Added new examples for JavaScript #953

Merged
merged 2 commits on Aug 25, 2023
14 changes: 10 additions & 4 deletions pgml-sdks/rust/pgml/javascript/examples/README.md
@@ -1,7 +1,13 @@
## Javascript Examples
## Examples

Here we have a set of examples of different use cases of the pgml javascript SDK.
### [Semantic Search](./semantic_search.js)
This is a basic example to perform semantic search on a collection of documents. Embeddings are created using the `intfloat/e5-small` model. The results are documents semantically similar to the query. Finally, the collection is archived.

## Examples:
### [Question Answering](./question_answering.js)
This is an example of finding documents relevant to a question within a collection of documents. The query is passed to vector search to retrieve documents that match closely in the embedding space. A score is returned with each search result.

1. [Getting Started](./getting-started/) - Simple project that uses the pgml SDK to create a collection, add a pipeline, upsert documents, and run a vector search on the collection.
### [Question Answering Using the Instructor Model](./question_answering_instructor.js)
In this example, we will use the `hkunlp/instructor-base` model to build text embeddings instead of the default `intfloat/e5-small` model.

### [Extractive Question Answering](./extractive_question_answering.js)
In this example, we will show how to use the `vector_recall` result as the `context` for a Hugging Face question-answering model. We will use `Builtins.transform()` to run the model inside the database.
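
All of these examples follow the same basic flow: create a collection, add a pipeline (a model plus a splitter), upsert documents, run `vector_recall`, and archive the collection. A condensed sketch of that shared skeleton is shown below; the collection and pipeline names are placeholders, and the default noted in the comments is taken from the descriptions above rather than guaranteed by the SDK.

```javascript
// Condensed sketch of the flow shared by the examples in this PR.
// Collection and pipeline names are placeholders, not names used by any example file.
const pgml = require("pgml");
require("dotenv").config(); // loads the database connection settings from a local .env file

const main = async () => {
  // 1. Create (or connect to) a collection.
  const collection = pgml.newCollection("example_collection");

  // 2. Attach a pipeline: a model for embeddings and a splitter for chunking.
  const model = pgml.newModel(); // default embedding model (intfloat/e5-small per the README)
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline("example_pipeline", model, splitter);
  await collection.add_pipeline(pipeline);

  // 3. Upsert documents; the pipeline chunks and embeds them automatically.
  await collection.upsert_documents([
    { id: "Document One", text: "PostgresML is the best tool for machine learning applications!" },
  ]);

  // 4. Vector search returns [similarity, text, metadata] tuples.
  const results = await collection
    .query()
    .vector_recall("What is the best tool for machine learning?", pipeline)
    .limit(1)
    .fetch_all();

  // 5. Archive the collection when done.
  await collection.archive();
  return results;
};

main().then((results) => console.log(results));
```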
62 changes: 62 additions & 0 deletions pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js
@@ -0,0 +1,62 @@
const pgml = require("pgml");
require("dotenv").config();

pgml.js_init_logger();

const main = async () => {
  // Initialize the collection
  const collection = pgml.newCollection("my_javascript_eqa_collection_2");

  // Add a pipeline
  const model = pgml.newModel();
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline(
    "my_javascript_eqa_pipeline_1",
    model,
    splitter,
  );
  await collection.add_pipeline(pipeline);

  // Upsert documents; these documents are automatically split into chunks and embedded by our pipeline
  const documents = [
    {
      id: "Document One",
      text: "PostgresML is the best tool for machine learning applications!",
    },
    {
      id: "Document Two",
      text: "PostgresML is open source and available to everyone!",
    },
  ];
  await collection.upsert_documents(documents);

  const query = "What is the best tool for machine learning?";

  // Perform vector search
  const queryResults = await collection
    .query()
    .vector_recall(query, pipeline)
    .limit(1)
    .fetch_all();

  // Construct context from results
  const context = queryResults
    .map((result) => {
      return result[1];
    })
    .join("\n");

  // Query for answer
  const builtins = pgml.newBuiltins();
  const answer = await builtins.transform("question-answering", [
    JSON.stringify({ question: query, context: context }),
  ]);

  // Archive the collection
  await collection.archive();
  return answer;
};

main().then((results) => {
  console.log("Question answer: \n", results);
});
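
The example above logs the transform result as-is. If the `question-answering` task returns the usual Hugging Face payload (an object, or an array of objects, with `answer`, `score`, `start`, and `end` fields; that shape is an assumption here, not something this PR documents), the final `.then` callback could pull out just the answer text:

```javascript
// Hypothetical replacement for the final .then callback above; assumes the transform
// result follows the standard Hugging Face question-answering shape
// ({ answer, score, start, end }). Verify against the actual SDK output.
main().then((results) => {
  const first = Array.isArray(results) ? results[0] : results;
  if (first && first.answer !== undefined) {
    console.log(`Answer: ${first.answer} (score: ${first.score})`);
  } else {
    console.log("Question answer: \n", results); // fall back to the raw payload
  }
});
```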
12 changes: 0 additions & 12 deletions pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md

This file was deleted.

55 changes: 55 additions & 0 deletions pgml-sdks/rust/pgml/javascript/examples/question_answering.js
@@ -0,0 +1,55 @@
const pgml = require("pgml");
require("dotenv").config();

const main = async () => {
  // Initialize the collection
  const collection = pgml.newCollection("my_javascript_qa_collection");

  // Add a pipeline
  const model = pgml.newModel();
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline(
    "my_javascript_qa_pipeline",
    model,
    splitter,
  );
  await collection.add_pipeline(pipeline);

  // Upsert documents; these documents are automatically split into chunks and embedded by our pipeline
  const documents = [
    {
      id: "Document One",
      text: "PostgresML is the best tool for machine learning applications!",
    },
    {
      id: "Document Two",
      text: "PostgresML is open source and available to everyone!",
    },
  ];
  await collection.upsert_documents(documents);

  // Perform vector search
  const queryResults = await collection
    .query()
    .vector_recall("What is the best tool for machine learning?", pipeline)
    .limit(1)
    .fetch_all();

  // Convert the results to an array of objects
  const results = queryResults.map((result) => {
    const [similarity, text, metadata] = result;
    return {
      similarity,
      text,
      metadata,
    };
  });

  // Archive the collection
  await collection.archive();
  return results;
};

main().then((results) => {
  console.log("Vector search Results: \n", results);
});
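
Each result carries a `similarity` score alongside the text and metadata, so weak matches can be filtered out before they are shown to a user. An illustrative post-processing helper (`filterWeakMatches` and the threshold value are introduced here, not part of the SDK or this PR):

```javascript
// Illustrative only: keep results whose similarity clears an arbitrary threshold.
// MIN_SIMILARITY is a made-up value; tune it for your own data.
const MIN_SIMILARITY = 0.7;

const filterWeakMatches = (results) =>
  results.filter((r) => r.similarity >= MIN_SIMILARITY);

// e.g. main().then((results) => console.log(filterWeakMatches(results)));
```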
60 changes: 60 additions & 0 deletions pgml-sdks/rust/pgml/javascript/examples/question_answering_instructor.js
@@ -0,0 +1,60 @@
const pgml = require("pgml");
require("dotenv").config();

const main = async () => {
  // Initialize the collection
  const collection = pgml.newCollection("my_javascript_qai_collection");

  // Add a pipeline
  const model = pgml.newModel("hkunlp/instructor-base", "pgml", {
    instruction: "Represent the Wikipedia document for retrieval: ",
  });
  const splitter = pgml.newSplitter();
  const pipeline = pgml.newPipeline(
    "my_javascript_qai_pipeline",
    model,
    splitter,
  );
  await collection.add_pipeline(pipeline);

  // Upsert documents; these documents are automatically split into chunks and embedded by our pipeline
  const documents = [
    {
      id: "Document One",
      text: "PostgresML is the best tool for machine learning applications!",
    },
    {
      id: "Document Two",
      text: "PostgresML is open source and available to everyone!",
    },
  ];
  await collection.upsert_documents(documents);

  // Perform vector search
  const queryResults = await collection
    .query()
    .vector_recall("What is the best tool for machine learning?", pipeline, {
      instruction:
        "Represent the Wikipedia question for retrieving supporting documents: ",
    })
    .limit(1)
    .fetch_all();

  // Convert the results to an array of objects
  const results = queryResults.map((result) => {
    const [similarity, text, metadata] = result;
    return {
      similarity,
      text,
      metadata,
    };
  });

  // Archive the collection
  await collection.archive();
  return results;
};

main().then((results) => {
  console.log("Vector search Results: \n", results);
});
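
Instructor models embed documents and queries with different instruction prefixes: the document-side instruction is set on the model, while the query-side instruction is passed to `vector_recall`, as the example above shows. A minimal sketch of just that pairing, reusing the strings from the example; the constant names are introduced here for clarity only:

```javascript
// The two instruction strings are a pair: one describes how documents are embedded,
// the other how queries are embedded. Both values are taken from the example above;
// the constant names are introduced here for clarity.
const pgml = require("pgml");

const DOC_INSTRUCTION = "Represent the Wikipedia document for retrieval: ";
const QUERY_INSTRUCTION =
  "Represent the Wikipedia question for retrieving supporting documents: ";

const model = pgml.newModel("hkunlp/instructor-base", "pgml", {
  instruction: DOC_INSTRUCTION,
});

// ...and later, at query time:
// .vector_recall(query, pipeline, { instruction: QUERY_INSTRUCTION })
```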
@@ -27,7 +27,10 @@ const main = async () => {
  // Perform vector search
  const queryResults = await collection
    .query()
    .vector_recall("Some user query that will match document one first", pipeline)
    .vector_recall(
      "Some user query that will match document one first",
      pipeline,
    )
    .limit(2)
    .fetch_all();

@@ -41,6 +44,7 @@ const main = async () => {
    };
  });

  // Archive the collection
  await collection.archive();
  return results;
};
2 changes: 1 addition & 1 deletion pgml-sdks/rust/pgml/python/examples/README.md
@@ -1,7 +1,7 @@
## Examples

### [Semantic Search](./semantic_search.py)
This is a basic example to perform semantic search on a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using `intfloat/e5-small` model. The results are are semantically similar documemts to the query. Finally, the collection is archived.
This is a basic example to perform semantic search on a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using the `intfloat/e5-small` model. The results are documents semantically similar to the query. Finally, the collection is archived.

### [Question Answering](./question_answering.py)
This is an example of finding documents relevant to a question within a collection of documents. It loads the Stanford Question Answering Dataset (SQuAD) into the database and generates chunks and embeddings. The query is passed to vector search to retrieve documents that match closely in the embedding space. A score is returned with each search result.