Commit fd1e3f8

Added new speed comparison blog post (postgresml#1596)
1 parent b9021fb commit fd1e3f8

File tree: 4 files changed, +254 -0 lines changed

pgml-cms/blog/SUMMARY.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,6 +1,7 @@
 # Table of contents
 
 * [Home](README.md)
+* [A Speed Comparison of the Most Popular Retrieval Systems for RAG](a-speed-comparison-of-the-most-popular-retrieval-systems-for-rag.md)
 * [Korvus The All-in-One RAG Pipeline for PostgresML](introducing-korvus-the-all-in-one-rag-pipeline-for-postgresml.md)
 * [Semantic Search in Postgres in 15 Minutes](semantic-search-in-postgres-in-15-minutes.md)
 * [Unified RAG](unified-rag.md)
```
pgml-cms/blog/a-speed-comparison-of-the-most-popular-retrieval-systems-for-rag.md

Lines changed: 253 additions & 0 deletions
---
description: A hands-on test of the most popular retrieval systems for retrieval augmented generation (RAG).
featured: true
tags: [product]
image: ".gitbook/assets/Blog-Image_Evergreen-9.png"
---

# A Speed Comparison of the Most Popular Retrieval Systems for RAG

<div align="left">

<figure><img src=".gitbook/assets/silas.jpg" alt="Author" width="100"><figcaption></figcaption></figure>

</div>

Silas Marvin

July 30, 2024

<figure><img src=".gitbook/assets/Blog-Image_RAG-Retrieval-Speed@2x.png" alt=""><figcaption><p>The average retrieval speed for RAG in seconds.</p></figcaption></figure>

## Methodology

We tested a selection of the most popular retrieval systems for RAG:

- Pinecone + HuggingFace
- Qdrant + HuggingFace
- Weaviate + HuggingFace
- Zilliz + HuggingFace
- PostgresML via Korvus

!!! info

Where are LangChain and LlamaIndex? Both LangChain and LlamaIndex serve as orchestration layers. They aren't vector database providers or embedding providers, and they would only serve to make our Python script shorter (or longer, depending on which framework we chose).

!!!

Each retrieval system is a vector database + embeddings API pair. To stay consistent, we used HuggingFace as the embeddings API for each vector database, but we could easily switch it out for OpenAI or any other popular embeddings API. We first uploaded two documents to each database: one containing a hidden value we would query for later, and one filled with random text. We then tested a small RAG pipeline for each pair that simulated a user asking the question: "What is the hidden value", and getting a response generated by OpenAI.

Pinecone, Qdrant, and Zilliz are only vector databases, so we first embedded the query by manually making a request to HuggingFace's API. We then performed a search over our uploaded documents and passed the search result as context to OpenAI.

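
For these three, each trial is shaped roughly like the sketch below. This is illustrative only: the endpoint, collection name, and `gpt-4o` model choice are our assumptions here, and we show Qdrant's client as one example; the real script is linked at the end of this section.

```python
import time

import requests
from openai import OpenAI
from qdrant_client import QdrantClient

EMBED_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/mixedbread-ai/mxbai-embed-large-v1"


def rag(question: str) -> str:
    # 1. Embed the query with a manual request to HuggingFace's inference API
    start = time.time()
    embedding = requests.post(
        EMBED_URL,
        headers={"Authorization": "Bearer <HF_API_KEY>"},
        json={"inputs": question},
    ).json()  # assumed to be a flat list of floats for this model
    time_to_embed = time.time() - start

    # 2. Search the vector database with the embedded query
    start = time.time()
    client = QdrantClient(url="http://localhost:6333")
    hits = client.search(collection_name="documents", query_vector=embedding, limit=1)
    time_to_search = time.time() - start

    # 3. Pass the search result as context to OpenAI
    context = hits[0].payload["text"]
    response = OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    print(f"Time to embed: {time_to_embed:.4f}, Time to search: {time_to_search:.4f}")
    return response.choices[0].message.content
```
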
Weaviate is a bit different: it embeds and performs text generation for you. Note that we opted to use HuggingFace and OpenAI to stay consistent, which means Weaviate makes the API calls to HuggingFace and OpenAI on our behalf, essentially acting as a wrapper around what we did for Pinecone, Qdrant, and Zilliz.

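
With Weaviate, that whole flow collapses into a single generative search call, roughly like this (a sketch against the v3 Python client, assuming a `Document` class configured with the `text2vec-huggingface` vectorizer and the `generative-openai` module; names and URLs are illustrative):

```python
import weaviate

# Weaviate forwards these keys to HuggingFace (embedding) and OpenAI (generation)
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-HuggingFace-Api-Key": "<HF_API_KEY>",
        "X-OpenAI-Api-Key": "<OPENAI_API_KEY>",
    },
)

# Embed, search, and generate in one round trip
result = (
    client.query.get("Document", ["text"])
    .with_near_text({"concepts": ["What is the hidden value"]})
    .with_generate(single_prompt="Answer the question using this context: {text}")
    .with_limit(1)
    .do()
)
print(result)
```
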
PostgresML is unique: it's not just a vector database, but a full PostgreSQL database with machine learning infrastructure built in. We didn't need to embed the query using an external API; we embedded the user's question with SQL in our retrieval query, and passed the result of that search as context to OpenAI.

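
In other words, for PostgresML the entire retrieval step is one SQL round trip. From Python, that looks roughly like the sketch below (assuming psycopg and a simplified `documents` table; the full queries we used are in the script linked next, and the section at the end of this post walks through the SQL in detail):

```python
import psycopg

# Embed the question and search for the closest chunk in a single query
RETRIEVAL_QUERY = """
WITH query_embedding AS (
    SELECT pgml.embed(
        transformer => 'mixedbread-ai/mxbai-embed-large-v1',
        text => %s
    ) AS embedding
)
SELECT text
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM query_embedding)::vector
LIMIT 1;
"""

with psycopg.connect("postgres://user:pass@host:5432/dbname") as conn:
    context = conn.execute(RETRIEVAL_QUERY, ("What is the hidden value",)).fetchone()[0]
```
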
We used [a small Python script available here](https://github.com/postgresml/rag-timing-experiments) to test each RAG system.

## Benchmarks

This is the direct output from our [Python script, which you can run yourself here](https://github.com/postgresml/rag-timing-experiments). These results are averaged over 25 trials.

```txt
Done Doing RAG Test For: PostgresML
- Average `Time to Embed`: 0.0000
- Average `Time to Search`: 0.0643
- Average `Total Time for Retrieval`: 0.0643
- Average `Time for Chatbot Completion`: 0.6444
- Average `Total Time Taken`: 0.7087

Done Doing RAG Test For: Weaviate
- Average `Time to Embed`: 0.0000
- Average `Time to Search`: 0.0000
- Average `Total Time for Retrieval`: 0.0000
- Average `Time for Chatbot Completion`: 1.2539
- Average `Total Time Taken`: 1.2539

Done Doing RAG Test For: Zilliz
- Average `Time to Embed`: 0.2938
- Average `Time to Search`: 0.1565
- Average `Total Time for Retrieval`: 0.4503
- Average `Time for Chatbot Completion`: 0.5909
- Average `Total Time Taken`: 1.0412

Done Doing RAG Test For: Pinecone
- Average `Time to Embed`: 0.2907
- Average `Time to Search`: 0.2677
- Average `Total Time for Retrieval`: 0.5584
- Average `Time for Chatbot Completion`: 0.5949
- Average `Total Time Taken`: 1.1533

Done Doing RAG Test For: Qdrant
- Average `Time to Embed`: 0.2901
- Average `Time to Search`: 0.1674
- Average `Total Time for Retrieval`: 0.4575
- Average `Time for Chatbot Completion`: 0.6091
- Average `Total Time Taken`: 1.0667
```

There are 5 metrics listed:

1. `Time to Embed` is the time spent computing the query embedding. Note that it is zero for PostgresML and Weaviate. PostgresML does the embedding in the same SQL query it searches with, so there is no separate embedding time to measure; Weaviate does the embedding, search, and generation all at once, so it is zero there as well.
2. `Time to Search` is the time it takes to perform the search over our vector database. In the case of PostgresML, this is the time it takes to embed and search in one SQL query. It is zero for Weaviate for the reason mentioned above.
3. `Total Time for Retrieval` is the total time retrieval takes: the sum of `Time to Embed` and `Time to Search`.
4. `Time for Chatbot Completion` is the time it takes to get the response from OpenAI. In the case of Weaviate, this includes the time for retrieval.
5. `Total Time Taken` is the total time it takes to perform RAG.

## Results

There are a number of ways to interpret these results. First, let's sort them by `Total Time Taken` ASC:

1. PostgresML - 0.7087 `Total Time Taken`
2. Zilliz - 1.0412 `Total Time Taken`
3. Qdrant - 1.0667 `Total Time Taken`
4. Pinecone - 1.1533 `Total Time Taken`
5. Weaviate - 1.2539 `Total Time Taken`

Let's remember that every single RAG system we tested uses OpenAI to perform the augmented generation part of RAG. This consistently takes about 0.6 seconds and is part of the `Total Time Taken`. Because it is roughly constant, let's factor it out and focus on the `Total Time for Retrieval` (we omit Weaviate, as it doesn't report retrieval separately, but factoring the constant 0.6 seconds out of its total time would put it at 0.6539):

1. PostgresML - 0.0643 `Total Time for Retrieval`
2. Zilliz - 0.4503 `Total Time for Retrieval`
3. Qdrant - 0.4575 `Total Time for Retrieval`
4. Pinecone - 0.5584 `Total Time for Retrieval`

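
As a quick check of that arithmetic, each system's retrieval time is just the sum of its embed and search times from the benchmark output above:

```python
# `Total Time for Retrieval` = `Time to Embed` + `Time to Search`
# (numbers copied from the benchmark output above)
systems = {
    "PostgresML": (0.0000, 0.0643),
    "Zilliz": (0.2938, 0.1565),
    "Qdrant": (0.2901, 0.1674),
    "Pinecone": (0.2907, 0.2677),
}
for name, (embed, search) in systems.items():
    print(f"{name}: {embed + search:.4f}")

# Weaviate only reports a total (1.2539); subtracting the roughly
# constant 0.6s completion time leaves about 0.6539 for retrieval.
```
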
PostgresML is almost an order of magnitude faster at retrieval than any other system we tested, and it is clear why. Not only is the search itself fast (SQL queries with pgvector using an HNSW index are ridiculously quick), but PostgresML also avoids the extra API call to embed the user's query: because the embedding models run inside the database, the query is embedded right where the search happens.

## Embedding directly in the database

What does embedding look like in SQL? For those new to SQL, it can be as easy as using our Korvus SDK with Python or JavaScript.

{% tabs %}

{% tab title="Korvus Python SDK" %}

The Korvus Python SDK writes all the necessary SQL queries for us and gives us a high-level abstraction for creating `Collections` and `Pipelines`, and for searching and performing RAG.

```python
from korvus import Collection, Pipeline
import asyncio

# A pipeline that splits documents and embeds the `text` key for semantic search
collection = Collection("semantic-search-demo")
pipeline = Pipeline(
    "v1",
    {
        "text": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {
                "model": "mixedbread-ai/mxbai-embed-large-v1",
            },
        },
    },
)


async def main():
    await collection.add_pipeline(pipeline)

    # Upsert two documents: one with the hidden value, one with other text
    documents = [
        {
            "id": "1",
            "text": "The hidden value is 1000",
        },
        {
            "id": "2",
            "text": "Korvus is incredibly fast and easy to use.",
        },
    ]
    await collection.upsert_documents(documents)

    # Embed the query and search in a single call
    results = await collection.vector_search(
        {
            "query": {
                "fields": {
                    "text": {
                        "query": "What is the hidden value",
                        "parameters": {
                            "prompt": "Represent this sentence for searching relevant passages: ",
                        },
                    },
                },
            },
            "document": {"keys": ["id"]},
            "limit": 1,
        },
        pipeline,
    )
    print(results)


asyncio.run(main())
```

```txt
[{'chunk': 'The hidden value is 1000', 'document': {'id': '1'}, 'rerank_score': None, 'score': 0.7257088435203306}]
```

{% endtab %}

{% tab title="SQL" %}

```postgresql
SELECT pgml.embed(
    transformer => 'mixedbread-ai/mxbai-embed-large-v1',
    text => 'What is the hidden value'
) AS "embedding";
```

Using the `pgml.embed` function, we can build out whole retrieval pipelines:

```postgresql
-- Create a documents table
CREATE TABLE documents (
    id serial PRIMARY KEY,
    text text NOT NULL,
    embedding vector (1024) -- Uses the vector data type from pgvector; mxbai-embed-large-v1 produces 1024-dimensional embeddings
);

-- Create our HNSW index for super fast retrieval
CREATE INDEX documents_vector_idx ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert a few documents
INSERT INTO documents (text, embedding)
VALUES ('The hidden value is 1000', (
        SELECT pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'The hidden value is 1000'))),
    ('This is just some random text',
        (
            SELECT pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'This is just some random text')));

-- Do a query over it
WITH "query_embedding" AS (
    SELECT
        pgml.embed (transformer => 'mixedbread-ai/mxbai-embed-large-v1', text => 'What is the hidden value', kwargs => '{"prompt": "Represent this sentence for searching relevant passages: "}') AS "embedding"
)
SELECT
    "text",
    1 - (embedding <=> (
        SELECT embedding
        FROM "query_embedding")::vector) AS score
FROM
    documents
ORDER BY
    embedding <=> (
        SELECT embedding
        FROM "query_embedding")::vector ASC
LIMIT 1;
```

```txt
           text            |       score
---------------------------+--------------------
 The hidden value is 1000  | 0.9132997445285489
```

{% endtab %}

{% endtabs %}

Give it a spin, and let us know what you think. We're always here to geek out about databases and machine learning, so don't hesitate to reach out if you have any questions or ideas. We welcome you to:

- [Join our Discord server](https://discord.gg/DmyJP3qJ7U)
- [Follow us on Twitter](https://twitter.com/postgresml)
- [Contribute to the project on GitHub](https://github.com/postgresml/postgresml)

Here's to simpler architectures and more powerful queries!
