
rename embedding and instruct models #1481

Open · wants to merge 2 commits into master
2 changes: 1 addition & 1 deletion packages/pgml-rds-proxy/README.md
@@ -76,7 +76,7 @@ SELECT
FROM
dblink(
'postgresml',
'SELECT * FROM pgml.embed(''intfloat/e5-small'', ''embed this text'') AS embedding'
'SELECT * FROM pgml.embed(''intfloat/e5-small-v2'', ''embed this text'') AS embedding'
) AS t1(embedding real[384]);
```

7 changes: 3 additions & 4 deletions pgml-apps/pgml-chat/pgml_chat/main.py
@@ -123,7 +123,7 @@ def handler(signum, frame):
"--chat_completion_model",
dest="chat_completion_model",
type=str,
default="HuggingFaceH4/zephyr-7b-beta",
default="meta-llama/Meta-Llama-3-8B-Instruct",
)

parser.add_argument(
@@ -195,9 +195,8 @@ def handler(signum, frame):
)

splitter = Splitter(splitter_name, splitter_params)
model_name = "hkunlp/instructor-xl"
model_embedding_instruction = "Represent the %s document for retrieval: " % (bot_topic)
model_params = {"instruction": model_embedding_instruction}
model_name = "intfloat/e5-small-v2"
model_params = {}

model = Model(model_name, "pgml", model_params)
pipeline = Pipeline(args.collection_name + "_pipeline", model, splitter)
@@ -122,14 +122,14 @@ LIMIT 5;

PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of good candidate models for generating embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

Since our corpus of documents (movie reviews) are all relatively short and similar in style, we don't need a large model. [`intfloat/e5-small`](https://huggingface.co/intfloat/e5-small) will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models.
Since the documents in our corpus (movie reviews) are all relatively short and similar in style, we don't need a large model. [`intfloat/e5-small-v2`](https://huggingface.co/intfloat/e5-small-v2) will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models.

It takes a couple of minutes to download and cache the `intfloat/e5-small` model to generate the first embedding. After that, it's pretty fast.
It takes a couple of minutes to download and cache the `intfloat/e5-small-v2` model to generate the first embedding. After that, it's pretty fast.

Note how we prefix the text we want to embed with either `passage:` or `query:`. The e5 model requires us to prefix our data with `passage:` if we're generating embeddings for our corpus, and `query:` if we want to find semantically similar content.

```postgresql
SELECT pgml.embed('intfloat/e5-small', 'passage: hi mom');
SELECT pgml.embed('intfloat/e5-small-v2', 'passage: hi mom');
```
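
The `query:` side works the same way from any Postgres client. Here's a minimal sketch, assuming a PostgresML connection string in the `DATABASE_URL` environment variable and the `psycopg2` driver:

```python
import os

import psycopg2

# Connect to a PostgresML database; DATABASE_URL is assumed to be set.
conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()

# Embed a search query with the `query:` prefix.
cur.execute(
    "SELECT pgml.embed('intfloat/e5-small-v2', %s)",
    ("query: hi mom",),
)
embedding = cur.fetchone()[0]
print(len(embedding))  # e5-small-v2 produces 384-dimensional embeddings
```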

This is a pretty powerful function, because we can pass any arbitrary text to any open source model, and it will generate an embedding for us. We can benchmark how long it takes to generate an embedding for a single review, using client-side timings in Postgres:
@@ -147,7 +147,7 @@ Aside from using this function with strings passed from a client, we can use it
```postgresql
SELECT
review_body,
pgml.embed('intfloat/e5-small', 'passage: ' || review_body)
pgml.embed('intfloat/e5-small-v2', 'passage: ' || review_body)
FROM pgml.amazon_us_reviews
LIMIT 1;
```
@@ -171,7 +171,7 @@ Time to generate an embedding increases with the length of the input text, and v
```postgresql
SELECT
review_body,
pgml.embed('intfloat/e5-small', 'passage: ' || review_body) AS embedding
pgml.embed('intfloat/e5-small-v2', 'passage: ' || review_body) AS embedding
FROM pgml.amazon_us_reviews
LIMIT 1000;
```
@@ -190,7 +190,7 @@ We can also do a quick sanity check to make sure we're really getting value out
SELECT
review_body,
pgml.embed(
'intfloat/e5-small',
'intfloat/e5-small-v2',
'passage: ' || review_body,
'{"device": "cpu"}'
) AS embedding
@@ -224,6 +224,12 @@ You can also find embedding models that outperform OpenAI's `text-embedding-ada-002`

The current leading model is `hkunlp/instructor-xl`. Instructor models take an additional `instruction` parameter which includes context for the embeddings use case, similar to prompts before text generation tasks.

!!! note

"intfloat/e5-small-v2" surpassed the quality of instructor-xl, and should be used instead, but we've left this documentation available for existing users

!!!

Instructions can provide a "classification" or "topic" for the text:

#### Classification
@@ -325,7 +331,7 @@ BEGIN

UPDATE pgml.amazon_us_reviews
SET review_embedding_e5_large = pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'passage: ' || review_body
)
WHERE id BETWEEN i AND i + 10
@@ -44,7 +44,7 @@ The Switch Kit is an open-source AI SDK that provides a drop in replacement for
const pgml = require("pgml");
const client = pgml.newOpenSourceAI();
const results = client.chat_completions_create(
"HuggingFaceH4/zephyr-7b-beta",
"meta-llama/Meta-Llama-3-8B-Instruct",
[
{
role: "system",
@@ -65,7 +65,7 @@ console.log(results);
import pgml
client = pgml.OpenSourceAI()
results = client.chat_completions_create(
"HuggingFaceH4/zephyr-7b-beta",
"meta-llama/Meta-Llama-3-8B-Instruct",
[
{
"role": "system",
@@ -96,7 +96,7 @@ print(results)
],
"created": 1701291672,
"id": "abf042d2-9159-49cb-9fd3-eef16feb246c",
"model": "HuggingFaceH4/zephyr-7b-beta",
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"object": "chat.completion",
"system_fingerprint": "eecec9d4-c28b-5a27-f90b-66c3fb6cee46",
"usage": {
@@ -113,7 +113,7 @@ We don't charge per token, so OpenAI “usage” metrics are not particularly re

!!!

The above is an example using our open-source AI SDK with zephyr-7b-beta, an incredibly popular and highly efficient 7 billion parameter model.
The above is an example using our open-source AI SDK with Meta-Llama-3-8B-Instruct, an incredibly popular and highly efficient 8 billion parameter model.

Notice there is near one to one relation between the parameters and return type of OpenAI’s `chat.completions.create` and our `chat_completion_create`.
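
To make the correspondence concrete, here's a minimal side-by-side sketch; the OpenAI call uses their standard Python SDK, and the message contents are purely illustrative:

```python
import pgml
from openai import OpenAI

# Illustrative messages; the exact content doesn't matter for the comparison.
messages = [
    {"role": "system", "content": "You are a friendly chatbot"},
    {"role": "user", "content": "What is a panda?"},
]

# OpenAI's client (reads OPENAI_API_KEY from the environment).
openai_results = OpenAI().chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)

# PostgresML's open-source AI SDK: same messages, an open-weights model.
pgml_results = pgml.OpenSourceAI().chat_completions_create(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    messages,
)
```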

@@ -119,7 +119,7 @@ vars:
splitter_name: "recursive_character"
splitter_parameters: {"chunk_size": 100, "chunk_overlap": 20}
task: "embedding"
model_name: "intfloat/e5-base"
model_name: "intfloat/e5-small-v2"
query_string: 'Lorem ipsum 3'
limit: 2
```
@@ -129,7 +129,7 @@ Here's a summary of the key parameters:
* `splitter_name`: Specifies the name of the splitter, set as "recursive\_character".
* `splitter_parameters`: Defines the parameters for the splitter, such as a chunk size of 100 and a chunk overlap of 20.
* `task`: Indicates the task being performed, specified as "embedding".
* `model_name`: Specifies the name of the model to be used, set as "intfloat/e5-base".
* `model_name`: Specifies the name of the model to be used, set as "intfloat/e5-small-v2".
* `query_string`: Provides a query string, set as 'Lorem ipsum 3'.
* `limit`: Specifies a limit of 2, indicating the maximum number of results to be processed.

@@ -137,7 +137,7 @@ We can find a customer that our embeddings model feels is close to the sentiment
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: I love all Star Wars, but Empire Strikes Back is particularly amazing'
)::vector(1024) AS embedding
)
@@ -214,7 +214,7 @@ Now we can write our personalized SQL query. It's nearly the same as our query f
-- create a request embedding on the fly
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
),
@@ -127,9 +127,7 @@ cp .env.template .env
```bash
OPENAI_API_KEY=<OPENAI_API_KEY>
DATABASE_URL=<POSTGRES_DATABASE_URL starts with postgres://>
MODEL=hkunlp/instructor-xl
MODEL_PARAMS={"instruction": "Represent the document for retrieval: "}
QUERY_PARAMS={"instruction": "Represent the question for retrieving supporting documents: "}
MODEL=intfloat/e5-small-v2
SYSTEM_PROMPT=<> # System prompt used for OpenAI chat completion
BASE_PROMPT=<> # Base prompt used for OpenAI chat completion for each turn
SLACK_BOT_TOKEN=<SLACK_BOT_TOKEN> # Slack bot token to run Slack chat service
@@ -332,7 +330,7 @@ Once the discord app is running, you can interact with the chatbot on Discord as

### PostgresML vs. Hugging Face + Pinecone

To evaluate query latency, we performed an experiment with 10,000 Wikipedia documents from the SQuAD dataset. Embeddings were generated using the intfloat/e5-large model.
To evaluate query latency, we performed an experiment with 10,000 Wikipedia documents from the SQuAD dataset. Embeddings were generated using the intfloat/e5-small-v2 model.

For PostgresML, we used a GPU-powered serverless database running on NVIDIA A10G GPUs, with the client in the us-west-2 region. For HuggingFace, we used their inference API endpoint running on NVIDIA A10G GPUs in the us-east-1 region, with the client in the same region. Pinecone was used as the vector search index for the HuggingFace embeddings.

4 changes: 2 additions & 2 deletions pgml-cms/blog/speeding-up-vector-recall-5x-with-hnsw.md
@@ -45,7 +45,7 @@ Let's run that query again:
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
@@ -100,7 +100,7 @@ Now let's try the query again utilizing the new HNSW index we created.
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
4 changes: 2 additions & 2 deletions pgml-cms/blog/the-1.0-sdk-is-here.md
@@ -50,7 +50,7 @@ const pipeline = pgml.newPipeline("my_pipeline", {
text: {
splitter: { model: "recursive_character" },
semantic_search: {
model: "intfloat/e5-small",
model: "intfloat/e5-small-v2",
},
},
});
@@ -90,7 +90,7 @@ pipeline = Pipeline(
"text": {
"splitter": {"model": "recursive_character"},
"semantic_search": {
"model": "intfloat/e5-small",
"model": "intfloat/e5-small-v2",
},
},
},
@@ -124,7 +124,7 @@ We'll start with semantic search. Given a user query, e.g. "Best 1980's scifi mo
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
@@ -171,7 +171,7 @@ Generating a query plan more quickly and only computing the values once, may mak
There's some good stuff happening in those query results, so let's break it down:

* **It's fast** - We're able to generate a request embedding on the fly with a state-of-the-art model, and search 5M reviews in 152ms, including fetching the results back to the client 😍. You can't even generate an embedding from OpenAI's API in that time, much less search 5M reviews in some other database with it.
* **It's good** - The `review_body` results are very similar to the "Best 1980's scifi movie" request text. We're using the `intfloat/e5-large` open source embedding model, which outperforms OpenAI's `text-embedding-ada-002` in most [quality benchmarks](https://huggingface.co/spaces/mteb/leaderboard).
* **It's good** - The `review_body` results are very similar to the "Best 1980's scifi movie" request text. We're using the `intfloat/e5-small-v2` open source embedding model, which outperforms OpenAI's `text-embedding-ada-002` in most [quality benchmarks](https://huggingface.co/spaces/mteb/leaderboard).
* Qualitatively: the embeddings understand our request for `scifi` being equivalent to `Sci-Fi`, `sci-fi`, `SciFi`, and `sci fi`, as well as `1980's` matching `80s` and `80's` and is close to `seventies` (last place). We didn't have to configure any of this and the most enthusiastic for "best" is at the top, the least enthusiastic is at the bottom, so the model has appropriately captured "sentiment".
* Quantitatively: the `cosine_similarity` of all results is high and tight, 0.90-0.95 on a scale from -1 to 1 (a short sketch of this metric follows this list). We can be confident we recalled very similar results from our 5M candidates, even though it would take 485 times as long to check all of them directly.
* **It's reliable** - The model is stored in the database, so we don't need to worry about managing a separate service. If you repeat this query over and over, the timings will be extremely consistent, because we don't have to deal with things like random network congestion.
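
Here's the short sketch of that similarity metric: compute a query and a passage embedding with `pgml.embed`, then take their cosine similarity. It assumes a PostgresML connection string in `DATABASE_URL`, plus the `psycopg2` and `numpy` packages:

```python
import os

import numpy as np
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()

def embed(text: str) -> np.ndarray:
    cur.execute("SELECT pgml.embed('intfloat/e5-small-v2', %s)", (text,))
    return np.array(cur.fetchone()[0])

query = embed("query: Best 1980's scifi movie")
passage = embed("passage: The Empire Strikes Back is the best 80s sci-fi movie")

# Cosine similarity ranges from -1 to 1; values near 1 mean "very similar".
similarity = query @ passage / (np.linalg.norm(query) * np.linalg.norm(passage))
print(round(float(similarity), 4))
```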
@@ -254,7 +254,7 @@ Now we can quickly search for movies by what people have said about them:
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
@@ -312,7 +312,7 @@ SET ivfflat.probes = 300;
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
@@ -401,7 +401,7 @@ SET ivfflat.probes = 1;
```postgresql
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
)
@@ -457,7 +457,7 @@ SQL is a very expressive language that can handle a lot of complexity. To keep t
-- create a request embedding on the fly
WITH request AS (
SELECT pgml.embed(
'intfloat/e5-large',
'intfloat/e5-small-v2',
'query: Best 1980''s scifi movie'
)::vector(1024) AS embedding
),
@@ -58,7 +58,7 @@ class EmbedSmallExpression(models.Expression):
self.embedding_field = field

def as_sql(self, compiler, connection, template=None):
return f"pgml.embed('intfloat/e5-small', {self.embedding_field})", None
return f"pgml.embed('intfloat/e5-small-v2', {self.embedding_field})", None
```

And that's it! In just a few lines of code, we're generating and storing high quality embeddings automatically in our database. No additional setup is required, and all the AI complexity is taken care of by PostgresML.
@@ -70,7 +70,7 @@ Django Rest Framework provides the bulk of the implementation. We just added a `M
```python
results = TodoItem.objects.annotate(
similarity=RawSQL(
"pgml.embed('intfloat/e5-small', %s)::vector(384) &#x3C;=> embedding",
"pgml.embed('intfloat/e5-small-v2', %s)::vector(384) &#x3C;=> embedding",
[query],
)
).order_by("similarity")
@@ -115,7 +115,7 @@ In return, you'll get your to-do item alongside the embedding of the `descriptio

The embedding contains 384 floating point numbers; we removed most of them in this blog post to make sure it fits on the page.

You can try creating multiple to-do items for fun and profit. If the description is changed, so will the embedding, demonstrating how the `intfloat/e5-small` model understands the semantic meaning of your text.
You can try creating multiple to-do items for fun and profit. If the description is changed, so will the embedding, demonstrating how the `intfloat/e5-small-v2` model understands the semantic meaning of your text.

### Searching

6 changes: 3 additions & 3 deletions pgml-cms/docs/api/client-sdk/README.md
@@ -80,7 +80,7 @@ const pipeline = pgml.newPipeline("sample_pipeline", {
text: {
splitter: { model: "recursive_character" },
semantic_search: {
model: "intfloat/e5-small",
model: "intfloat/e5-small-v2",
},
},
});
@@ -98,7 +98,7 @@ pipeline = Pipeline(
"text": {
"splitter": { "model": "recursive_character" },
"semantic_search": {
"model": "intfloat/e5-small",
"model": "intfloat/e5-small-v2",
},
},
},
Expand All @@ -111,7 +111,7 @@ await collection.add_pipeline(pipeline)

The pipeline configuration is a key/value object, where the key is the name of a column in a document, and the value is the action the SDK should perform on that column.

In this example, the documents contain a column called `text` which we are instructing the SDK to chunk the contents of using the recursive character splitter, and to embed those chunks using the Hugging Face `intfloat/e5-small` embeddings model.
In this example, the documents contain a column called `text`. We instruct the SDK to chunk its contents with the recursive character splitter and to embed those chunks using the Hugging Face `intfloat/e5-small-v2` embeddings model.
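
The same pattern extends to multiple columns. As an illustrative sketch (the `title` and `text` column names are hypothetical, and the `Pipeline` import is assumed to come from the `pgml` Python package), a pipeline could full-text index a short column while chunking and embedding a longer one:

```python
from pgml import Pipeline

pipeline = Pipeline(
    "sample_pipeline_v2",  # hypothetical pipeline name
    {
        "title": {
            # Full-text index the title without chunking or embedding it.
            "full_text_search": {"configuration": "english"},
        },
        "text": {
            # Chunk the text column, then embed each chunk.
            "splitter": {"model": "recursive_character"},
            "semantic_search": {"model": "intfloat/e5-small-v2"},
        },
    },
)
```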

### Add documents

14 changes: 4 additions & 10 deletions pgml-cms/docs/api/client-sdk/document-search.md
@@ -10,17 +10,14 @@ This section will assume we have previously run the following code:
const pipeline = pgml.newPipeline("test_pipeline", {
abstract: {
semantic_search: {
model: "intfloat/e5-small",
model: "intfloat/e5-small-v2",
},
full_text_search: { configuration: "english" },
},
body: {
splitter: { model: "recursive_character" },
semantic_search: {
model: "hkunlp/instructor-base",
parameters: {
instruction: "Represent the Wikipedia document for retrieval: ",
}
model: "intfloat/e5-small-v2",
},
},
});
@@ -36,17 +33,14 @@ pipeline = Pipeline(
{
"abstract": {
"semantic_search": {
"model": "intfloat/e5-small",
"model": "intfloat/e5-small-v2",
},
"full_text_search": {"configuration": "english"},
},
"body": {
"splitter": {"model": "recursive_character"},
"semantic_search": {
"model": "hkunlp/instructor-base",
"parameters": {
"instruction": "Represent the Wikipedia document for retrieval: ",
},
"model": "intfloat/e5-small-v2",
},
},
},