Commit fd1e3f8

Added new speed comparison blog post (postgresml#1596)
1 parent b9021fb commit fd1e3f8

File tree: 4 files changed, +254 -0 lines changed

pgml-cms/blog/SUMMARY.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,6 +1,7 @@
 # Table of contents
 
 * [Home](README.md)
+* [A Speed Comparison of the Most Popular Retrieval Systems for RAG](a-speed-comparison-of-the-most-popular-retrieval-systems-for-rag.md)
 * [Korvus The All-in-One RAG Pipeline for PostgresML](introducing-korvus-the-all-in-one-rag-pipeline-for-postgresml.md)
 * [Semantic Search in Postgres in 15 Minutes](semantic-search-in-postgres-in-15-minutes.md)
 * [Unified RAG](unified-rag.md)
```
pgml-cms/blog/a-speed-comparison-of-the-most-popular-retrieval-systems-for-rag.md

Lines changed: 253 additions & 0 deletions
---
description: A hands-on test of the most popular retrieval systems for retrieval augmented generation (RAG).
featured: true
tags: [product]
image: ".gitbook/assets/Blog-Image_Evergreen-9.png"
---

# A Speed Comparison of the Most Popular Retrieval Systems for RAG

<div align="left">

<figure><img src=".gitbook/assets/silas.jpg" alt="Author" width="100"><figcaption></figcaption></figure>

</div>

Silas Marvin

July 30, 2024

<figure><img src=".gitbook/assets/Blog-Image_RAG-Retrieval-Speed@2x.png" alt=""><figcaption><p>The average retrieval speed for RAG in seconds.</p></figcaption></figure>

## Methodology

We tested a selection of the most popular retrieval systems for RAG:

- Pinecone + HuggingFace
- Qdrant + HuggingFace
- Weaviate + HuggingFace
- Zilliz + HuggingFace
- PostgresML via Korvus

!!! info

Where are LangChain and LlamaIndex? Both LangChain and LlamaIndex serve as orchestration layers. They aren't vector database providers or embedding providers, and they would only serve to make our Python script shorter (or longer, depending on which framework we chose).

!!!

Each retrieval system is a vector database + embeddings API pair. To stay consistent, we used HuggingFace as the embeddings API for each vector database, but we could easily switch it out for OpenAI or any other popular embeddings API. We first uploaded two documents to each database: one containing a hidden value we would query for later, and one filled with random text. We then tested a small RAG pipeline for each pair that simulated a user asking the question: "What is the hidden value", and getting a response generated by OpenAI.

Pinecone, Qdrant, and Zilliz are only vector databases, so we first embedded the query by manually making a request to HuggingFace's API. We then performed a search over our uploaded documents and passed the search result as context to OpenAI.

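
For these three, each trial is shaped roughly like the sketch below. This is illustrative only: the endpoint, collection name, and `gpt-4o` model choice are our assumptions here, and we show Qdrant's client as one example; the real script is linked at the end of this section.

```python
import time

import requests
from openai import OpenAI
from qdrant_client import QdrantClient

EMBED_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/mixedbread-ai/mxbai-embed-large-v1"


def rag(question: str) -> str:
    # 1. Embed the query with a manual request to HuggingFace's inference API
    start = time.time()
    embedding = requests.post(
        EMBED_URL,
        headers={"Authorization": "Bearer <HF_API_KEY>"},
        json={"inputs": question},
    ).json()  # assumed to be a flat list of floats for this model
    time_to_embed = time.time() - start

    # 2. Search the vector database with the embedded query
    start = time.time()
    client = QdrantClient(url="http://localhost:6333")
    hits = client.search(collection_name="documents", query_vector=embedding, limit=1)
    time_to_search = time.time() - start

    # 3. Pass the search result as context to OpenAI
    context = hits[0].payload["text"]
    response = OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    print(f"Time to embed: {time_to_embed:.4f}, Time to search: {time_to_search:.4f}")
    return response.choices[0].message.content
```
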
Weaviate is a bit different: it embeds and performs text generation for you. Note that we opted to use HuggingFace and OpenAI to stay consistent, which means Weaviate makes the API calls to HuggingFace and OpenAI on our behalf, essentially acting as a wrapper around what we did for Pinecone, Qdrant, and Zilliz.

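
With Weaviate, that whole flow collapses into a single generative search call, roughly like this (a sketch against the v3 Python client, assuming a `Document` class configured with the `text2vec-huggingface` vectorizer and the `generative-openai` module; names and URLs are illustrative):

```python
import weaviate

# Weaviate forwards these keys to HuggingFace (embedding) and OpenAI (generation)
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-HuggingFace-Api-Key": "<HF_API_KEY>",
        "X-OpenAI-Api-Key": "<OPENAI_API_KEY>",
    },
)

# Embed, search, and generate in one round trip
result = (
    client.query.get("Document", ["text"])
    .with_near_text({"concepts": ["What is the hidden value"]})
    .with_generate(single_prompt="Answer the question using this context: {text}")
    .with_limit(1)
    .do()
)
print(result)
```
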
PostgresML is unique: it's not just a vector database, but a full PostgreSQL database with machine learning infrastructure built in. We didn't need to embed the query using an external API; we embedded the user's question with SQL in our retrieval query, and passed the result of that search as context to OpenAI.

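
In other words, for PostgresML the entire retrieval step is one SQL round trip. From Python, that looks roughly like the sketch below (assuming psycopg and a simplified `documents` table; the full queries we used are in the script linked next, and the section at the end of this post walks through the SQL in detail):

```python
import psycopg

# Embed the question and search for the closest chunk in a single query
RETRIEVAL_QUERY = """
WITH query_embedding AS (
    SELECT pgml.embed(
        transformer => 'mixedbread-ai/mxbai-embed-large-v1',
        text => %s
    ) AS embedding
)
SELECT text
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM query_embedding)::vector
LIMIT 1;
"""

with psycopg.connect("postgres://user:pass@host:5432/dbname") as conn:
    context = conn.execute(RETRIEVAL_QUERY, ("What is the hidden value",)).fetchone()[0]
```
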
We used [a small Python script available here](https://github.com/postgresml/rag-timing-experiments) to test each RAG system.

## Benchmarks

This is the direct output from our [Python script, which you can run yourself here](https://github.com/postgresml/rag-timing-experiments). These results are averaged over 25 trials.

```txt
Done Doing RAG Test For: PostgresML
- Average `Time to Embed`: 0.0000
- Average `Time to Search`: 0.0643
- Average `Total Time for Retrieval`: 0.0643
- Average `Time for Chatbot Completion`: 0.6444
- Average `Total Time Taken`: 0.7087

Done Doing RAG Test For: Weaviate
- Average `Time to Embed`: 0.0000
- Average `Time to Search`: 0.0000
- Average `Total Time for Retrieval`: 0.0000
- Average `Time for Chatbot Completion`: 1.2539
- Average `Total Time Taken`: 1.2539

Done Doing RAG Test For: Zilliz
- Average `Time to Embed`: 0.2938
- Average `Time to Search`: 0.1565
- Average `Total Time for Retrieval`: 0.4503
- Average `Time for Chatbot Completion`: 0.5909
- Average `Total Time Taken`: 1.0412

Done Doing RAG Test For: Pinecone
- Average `Time to Embed`: 0.2907
- Average `Time to Search`: 0.2677
- Average `Total Time for Retrieval`: 0.5584
- Average `Time for Chatbot Completion`: 0.5949
- Average `Total Time Taken`: 1.1533

Done Doing RAG Test For: Qdrant
- Average `Time to Embed`: 0.2901
- Average `Time to Search`: 0.1674
- Average `Total Time for Retrieval`: 0.4575
- Average `Time for Chatbot Completion`: 0.6091
- Average `Total Time Taken`: 1.0667
```

There are 5 metrics listed:

1. `Time to Embed` is the time spent computing the query embedding. Note that it is zero for PostgresML and Weaviate. PostgresML does the embedding in the same SQL query it searches with, so there is no separate embedding time to measure; Weaviate does the embedding, search, and generation all at once, so it is zero there as well.
2. `Time to Search` is the time it takes to perform the search over our vector database. In the case of PostgresML, this is the time it takes to embed and search in one SQL query. It is zero for Weaviate for the reason mentioned above.
3. `Total Time for Retrieval` is the total time retrieval takes: the sum of `Time to Embed` and `Time to Search`.
4. `Time for Chatbot Completion` is the time it takes to get the response from OpenAI. In the case of Weaviate, this includes the time for retrieval.
5. `Total Time Taken` is the total time it takes to perform RAG.

## Results

There are a number of ways to interpret these results. First, let's sort them by `Total Time Taken` ASC:

1. PostgresML - 0.7087 `Total Time Taken`
2. Zilliz - 1.0412 `Total Time Taken`
3. Qdrant - 1.0667 `Total Time Taken`
4. Pinecone - 1.1533 `Total Time Taken`
5. Weaviate - 1.2539 `Total Time Taken`

Let's remember that every single RAG system we tested uses OpenAI to perform the augmented generation part of RAG. This consistently takes about 0.6 seconds and is part of the `Total Time Taken`. Because it is roughly constant, let's factor it out and focus on the `Total Time for Retrieval` (we omit Weaviate, as it doesn't report retrieval separately, but factoring the constant 0.6 seconds out of its total time would put it at 0.6539):

1. PostgresML - 0.0643 `Total Time for Retrieval`
2. Zilliz - 0.4503 `Total Time for Retrieval`
3. Qdrant - 0.4575 `Total Time for Retrieval`
4. Pinecone - 0.5584 `Total Time for Retrieval`

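
As a quick check of that arithmetic, each system's retrieval time is just the sum of its embed and search times from the benchmark output above:

```python
# `Total Time for Retrieval` = `Time to Embed` + `Time to Search`
# (numbers copied from the benchmark output above)
systems = {
    "PostgresML": (0.0000, 0.0643),
    "Zilliz": (0.2938, 0.1565),
    "Qdrant": (0.2901, 0.1674),
    "Pinecone": (0.2907, 0.2677),
}
for name, (embed, search) in systems.items():
    print(f"{name}: {embed + search:.4f}")

# Weaviate only reports a total (1.2539); subtracting the roughly
# constant 0.6s completion time leaves about 0.6539 for retrieval.
```
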
PostgresML is almost an order of magnitude faster at retrieval than any other system we tested, and it is clear why. Not only is the search itself fast (SQL queries with pgvector using an HNSW index are ridiculously quick), but PostgresML also avoids the extra API call to embed the user's query: because the embedding models run inside the database, the query is embedded right where the search happens.

## Embedding directly in the database

What does embedding look like in SQL? For those new to SQL, it can be as easy as using our Korvus SDK with Python or JavaScript.

{% tabs %}

{% tab title="Korvus Python SDK" %}

The Korvus Python SDK writes all the necessary SQL queries for us and gives us a high-level abstraction for creating `Collections` and `Pipelines`, and for searching and performing RAG.

```python
from korvus import Collection, Pipeline
import asyncio

# A pipeline that splits documents and embeds the `text` key for semantic search
collection = Collection("semantic-search-demo")
pipeline = Pipeline(
    "v1",
    {
        "text": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {
                "model": "mixedbread-ai/mxbai-embed-large-v1",
            },
        },
    },
)


async def main():
    await collection.add_pipeline(pipeline)

    # Upsert two documents: one with the hidden value, one with other text
    documents = [
        {
            "id": "1",
            "text": "The hidden value is 1000",
        },
        {
            "id": "2",
            "text": "Korvus is incredibly fast and easy to use.",
        },
    ]
    await collection.upsert_documents(documents)

    # Embed the query and search in a single call
    results = await collection.vector_search(
        {
            "query": {
                "fields": {
                    "text": {
                        "query": "What is the hidden value",
                        "parameters": {
                            "prompt": "Represent this sentence for searching relevant passages: ",
                        },
                    },
                },
            },
            "document": {"keys": ["id"]},
            "limit": 1,
        },
        pipeline,
    )
    print(results)


asyncio.run(main())
```

```txt
[{'chunk': 'The hidden value is 1000', 'document': {'id': '1'}, 'rerank_score': None, 'score': 0.7257088435203306}]
```

{% endtab %}

{% tab title="SQL" %}

```postgresql
SELECT pgml.embed(
    transformer => 'mixedbread-ai/mxbai-embed-large-v1',
    text => 'What is the hidden value'
) AS "embedding";
```

Using the `pgml.embed` function, we can build out whole retrieval pipelines:

```postgresql
-- Create a documents table
CREATE TABLE documents (
    id serial PRIMARY KEY,
    text text NOT NULL,
    embedding vector (1024) -- Uses the vector data type from pgvector; mxbai-embed-large-v1 produces 1024-dimensional embeddings
);

-- Create our HNSW index for super fast retrieval
CREATE INDEX documents_vector_idx ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert a few documents
INSERT INTO documents (text, embedding)
VALUES ('The hidden value is 1000', (
        SELECT pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'The hidden value is 1000'))),
    ('This is just some random text',
        (
            SELECT pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'This is just some random text')));

-- Do a query over it
WITH "query_embedding" AS (
    SELECT
        pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'What is the hidden value', kwargs => '{"prompt": "Represent this sentence for searching relevant passages: "}') AS "embedding"
)
SELECT
    "text",
    1 - (embedding <=> (
        SELECT embedding
        FROM "query_embedding")::vector) AS score
FROM
    documents
ORDER BY
    embedding <=> (
        SELECT embedding
        FROM "query_embedding")::vector ASC
LIMIT 1;
```

```txt
           text            |       score
---------------------------+--------------------
 The hidden value is 1000  | 0.9132997445285489
```

{% endtab %}

{% endtabs %}

Give it a spin, and let us know what you think. We're always here to geek out about databases and machine learning, so don't hesitate to reach out if you have any questions or ideas. We welcome you to:

- [Join our Discord server](https://discord.gg/DmyJP3qJ7U)
- [Follow us on Twitter](https://twitter.com/postgresml)
- [Contribute to the project on GitHub](https://github.com/postgresml/postgresml)

Here's to simpler architectures and more powerful queries!
