Commit 1d3fde6

Ready to ship
0 parents  commit 1d3fde6

5 files changed: +220 -0 lines changed

LICENSE

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
Copyright (c) 2024 PostgresML Team

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# postgresml-django

postgresml-django is a Python module that integrates PostgresML with the Django ORM, enabling automatic in-database embedding of Django models. It simplifies the process of creating and searching vector embeddings for your text data.

## Introduction

This module provides a seamless way to:
- Automatically generate in-database embeddings for specified fields in your Django models
- Perform vector similarity searches in-database

## Installation

1. Ensure you have [pgml](https://github.com/postgresml/postgresml) installed and configured in your database. The easiest way to do that is to sign up for a free serverless database at [postgresml.org](https://postgresml.org). You can also host it yourself.

2. Install the package using pip:

```
pip install postgresml-django
```

You are ready to go!

## Usage Examples

### Example 1: Using intfloat/e5-small-v2

This example demonstrates using the `intfloat/e5-small-v2` transformer, which has an embedding size of 384.

```python
from django.db import models
from postgresml_django import VectorField, Embed

class Document(Embed):
    text = models.TextField()
    text_embedding = VectorField(
        field_to_embed="text",
        dimensions=384,
        transformer="intfloat/e5-small-v2"
    )

# Searching
results = Document.vector_search("text_embedding", "some query to search against")
```

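To make the ordering concrete: `vector_search` ranks rows by a distance between each stored embedding and the query embedding, and the default distance function in this package's source is pgvector's cosine distance. The sketch below reproduces that ranking in plain Python over tiny hand-made vectors. The helper names and the 3-dimensional vectors are illustrative assumptions only; the real computation happens in-database on full-size embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(rows, query_vector):
    """Return (name, distance) pairs sorted from closest to farthest."""
    scored = [(name, cosine_distance(vec, query_vector)) for name, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
rows = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.0, 1.0, 0.0]),
    ("doc_c", [0.7, 0.7, 0.0]),
]
ranked = rank_by_distance(rows, [1.0, 0.1, 0.0])
```

Smaller distances mean closer matches, which is why the returned QuerySet is ordered by ascending `distance`.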
### Example 2: Using mixedbread-ai/mxbai-embed-large-v1

This example shows how to use the `mixedbread-ai/mxbai-embed-large-v1` transformer, which has an embedding size of 1024 and requires specific parameters for recall.

```python
from django.db import models
from postgresml_django import VectorField, Embed

class Article(Embed):
    content = models.TextField()
    content_embedding = VectorField(
        field_to_embed="content",
        dimensions=1024,
        transformer="mixedbread-ai/mxbai-embed-large-v1",
        transformer_recall_parameters={
            "query": "Represent this sentence for searching relevant passages: "
        }
    )

# Searching
results = Article.vector_search("content_embedding", "search query")
```

Note the differences between the two examples:
1. The `dimensions` parameter is set to 384 for `intfloat/e5-small-v2` and 1024 for `mixedbread-ai/mxbai-embed-large-v1`. It must match the output size of the chosen transformer.
2. The `mixedbread-ai/mxbai-embed-large-v1` transformer requires additional parameters for recall, which are specified in the `transformer_recall_parameters` argument.

Both examples will automatically generate embeddings when instances are saved and allow for vector similarity searches using the `vector_search` method.

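The rule for when an embedding is regenerated on save can be sketched as a small predicate. It mirrors the condition in this package's `Embed.save` (new instance, full save, or the vector field named in `update_fields`). The function name `should_embed` is hypothetical and not part of the package.

```python
def should_embed(pk, update_fields, vector_field_name):
    """Return True when a save should (re)generate the embedding."""
    is_new = not pk                        # unsaved instances have no primary key
    is_full_save = update_fields is None   # save() called without update_fields
    # `or` short-circuits, so the membership test only runs on a real list
    return is_new or is_full_save or vector_field_name in update_fields


# Existing row, partial save that names only the source text field
stale = should_embed(pk=1, update_fields=["text"], vector_field_name="text_embedding")

# Existing row, partial save that names the vector field itself
fresh = should_embed(pk=1, update_fields=["text_embedding"], vector_field_name="text_embedding")
```

One consequence worth knowing: on an existing row, `save(update_fields=["text"])` does not refresh the embedding under this rule. Include the vector field's name in `update_fields` when the source text has changed.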
## Contributing

We welcome contributions to postgresml-django! Whether it's bug reports, feature requests, documentation improvements, or code contributions, your input is valuable to us. Feel free to open issues or submit pull requests on our GitHub repository.

pyproject.toml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
[project]
name = "postgresml-django"
requires-python = ">=3.8"
version = "0.1.0"
description = "PostgresML Django integration that enables automatic embedding of specified fields."
authors = [
    {name = "PostgresML", email = "team@postgresml.org"},
]
readme = "README.md"
keywords = ["django","machine learning","vector databases","embeddings"]
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]
dependencies = [
    "Django",
    "pgvector"
]

[project.urls]
Homepage = "https://postgresml.org"
Repository = "https://github.com/postgresml/postgresml-django"
Documentation = "https://github.com/postgresml/postgresml-django"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

src/postgresml_django/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
from .main import *

src/postgresml_django/main.py

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
from django.db import models
from django.db.models import Func, Value, F
from django.db.models.functions import Cast
import pgvector.django
import json


class GenerateEmbedding(Func):
    """SQL expression that calls pgml.embed(transformer, text, parameters)."""

    function = "pgml.embed"
    template = "%(function)s('%(transformer)s', %(expressions)s, '%(parameters)s')"
    allowed_default = False

    def __init__(self, expression, transformer, parameters=None):
        self.transformer = transformer
        # Avoid a shared mutable default argument
        self.parameters = parameters if parameters is not None else {}
        super().__init__(expression)

    def as_sql(self, compiler, connection, **extra_context):
        # The transformer name and the JSON parameters are interpolated into
        # the SQL template as literals, so they must come from trusted code
        extra_context["transformer"] = self.transformer
        extra_context["parameters"] = json.dumps(self.parameters)
        return super().as_sql(compiler, connection, **extra_context)


class Embed(models.Model):
    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        update_fields = kwargs.get("update_fields")

        # Check for fields to embed
        for field in self._meta.get_fields():
            if isinstance(field, VectorField):
                if not hasattr(self, field.field_to_embed):
                    raise AttributeError(
                        f"Field to embed does not exist: `{field.field_to_embed}`"
                    )

                # Only embed if it's a new instance, full save, or this field is being updated
                if not self.pk or update_fields is None or field.name in update_fields:
                    value_to_embed = getattr(self, field.field_to_embed)
                    setattr(
                        self,
                        field.name,
                        GenerateEmbedding(
                            Value(value_to_embed),
                            field.transformer,
                            field.transformer_store_parameters,
                        ),
                    )

        super().save(*args, **kwargs)

    @classmethod
    def vector_search(
        cls, field, query_text, distance_function=pgvector.django.CosineDistance
    ):
        # Look up the VectorField instance by name
        field_instance = getattr(cls._meta.model, field).field

        # Embed the query text with the same transformer used to store the
        # field (previously hard-coded to "intfloat/e5-small-v2")
        query_embedding = GenerateEmbedding(
            Value(query_text),
            field_instance.transformer,
            field_instance.transformer_recall_parameters,
        )

        # Return the QuerySet, closest matches first
        return cls.objects.annotate(
            distance=distance_function(
                F(field),
                Cast(
                    query_embedding,
                    output_field=VectorField(dimensions=field_instance.dimensions),
                ),
            )
        ).order_by("distance")


class VectorField(pgvector.django.VectorField):
    def __init__(
        self,
        field_to_embed=None,
        dimensions=None,
        transformer=None,
        transformer_store_parameters=None,
        transformer_recall_parameters=None,
        *args,
        **kwargs,
    ):
        self.field_to_embed = field_to_embed
        self.transformer = transformer
        # Fall back to fresh dicts rather than shared mutable defaults
        self.transformer_store_parameters = (
            transformer_store_parameters
            if transformer_store_parameters is not None
            else {}
        )
        self.transformer_recall_parameters = (
            transformer_recall_parameters
            if transformer_recall_parameters is not None
            else {}
        )
        super().__init__(*args, dimensions=dimensions, **kwargs)
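A rough sketch of the SQL fragment that `GenerateEmbedding` produces may help when reading the class above. It applies the same template with plain Python string formatting and assumes that Django compiles the wrapped `Value(...)` to a bound `%s` placeholder. The helper name `render_embed_call` is hypothetical and exists only for illustration.

```python
import json

# Same template string as GenerateEmbedding.template
TEMPLATE = "%(function)s('%(transformer)s', %(expressions)s, '%(parameters)s')"


def render_embed_call(transformer, parameters=None):
    """Fill the template the way as_sql does; the text to embed stays a placeholder."""
    return TEMPLATE % {
        "function": "pgml.embed",
        "transformer": transformer,
        "expressions": "%s",  # assumption: Value(...) compiles to a bound parameter
        "parameters": json.dumps(parameters if parameters is not None else {}),
    }


store_sql = render_embed_call("intfloat/e5-small-v2")
recall_sql = render_embed_call(
    "mixedbread-ai/mxbai-embed-large-v1",
    {"query": "Represent this sentence for searching relevant passages: "},
)
```

Because the transformer name and the JSON parameters are spliced in as quoted literals, a single quote inside either value would break the statement, so both should come from trusted model definitions rather than user input.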
