-For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/guides/developer-docs/quick-start-with-docker) documentation.
+For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/developer-docs/quick-start-with-docker) documentation.
# Getting Started
@@ -214,7 +214,7 @@ SELECT pgml.transform(
Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
Sentiment analysis is a type of natural language processing technique that involves analyzing a piece of text to determine the sentiment or emotion expressed within it. It can be used to classify a text as positive, negative, or neutral, and has a wide range of applications in fields such as marketing, customer service, and political analysis.
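For illustration, a sentiment-analysis call through `pgml.transform` might look like the sketch below. It assumes the PostgresML extension is installed and that the default Hugging Face model for the `text-classification` task can be downloaded; the two input sentences are made up.

```sql
-- Sketch: classify the sentiment of two sentences with the
-- default text-classification model (assumes PostgresML is installed).
SELECT pgml.transform(
    task   => 'text-classification',
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks.'
    ]
) AS positivity;
```

The result is a JSONB array with one `label`/`score` object per input sentence.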
@@ -383,7 +383,7 @@ SELECT pgml.transform(
## Zero-Shot Classification
Zero-shot classification is a task in which the model predicts a class it has not seen during training. It relies on a pre-trained language model and is a form of transfer learning, in which a model initially trained for one task is reused in a different application. Zero-shot classification is especially helpful when labeled data for the task at hand is scarce.
In the example provided below, we classify a given sentence into classes the model has not encountered before. To do so, we pass `candidate_labels` through the `args` parameter of the SQL query; you can customize these labels to suit the context of your task. We use the `facebook/bart-large-mnli` model.
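A minimal sketch of such a query follows, assuming PostgresML is installed and `facebook/bart-large-mnli` can be downloaded from the Hugging Face hub. The input sentence and the candidate labels are illustrative only.

```sql
-- Sketch: zero-shot classification; the model is named in the task JSON,
-- and the labels to choose from are passed through args.
SELECT pgml.transform(
    task   => '{
        "task": "zero-shot-classification",
        "model": "facebook/bart-large-mnli"
    }'::JSONB,
    inputs => ARRAY[
        'I have a problem with my phone that needs to be resolved asap!'
    ],
    args   => '{
        "candidate_labels": ["urgent", "not urgent", "phone", "tablet", "computer"]
    }'::JSONB
) AS zero_shot;
```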
@@ -417,7 +417,7 @@ SELECT pgml.transform(
## Token Classification
Token classification is a natural language understanding task in which labels are assigned to individual tokens in a text. Popular subtasks include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models can be trained to identify specific entities in a text, such as people, places, and dates. PoS tagging, on the other hand, identifies the part of speech of each token, such as nouns, verbs, and punctuation marks.
Named Entity Recognition (NER) is the task of identifying named entities in a text, such as the names of people, locations, or organizations. Each token is labeled with the class of the named entity it belongs to, or with the class "O" (outside) if it is not part of any entity. The input is text, and the output is the text annotated with its named entities.
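As a sketch, assuming PostgresML is installed and the default Hugging Face `token-classification` model can be downloaded, an NER query could look like this (the sentence is illustrative):

```sql
-- Sketch: named entity recognition with the default
-- token-classification model; each detected entity comes back
-- with its label, score, and position in the text.
SELECT pgml.transform(
    task   => 'token-classification',
    inputs => ARRAY['I am Omar and I live in New York City.']
) AS ner;
```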
@@ -467,7 +467,7 @@ select pgml.transform(
## Translation
Translation is the task of converting text written in one language into another language.
You can choose from over 2000 translation models on the Hugging Face <a href="https://huggingface.co/models?pipeline_tag=translation" target="_blank">hub</a>.
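A minimal sketch of selecting one of those hub models by name: here we assume `Helsinki-NLP/opus-mt-en-fr` (an English-to-French model) purely as an example, with PostgresML installed and the model downloadable.

```sql
-- Sketch: translate English to French by naming a specific hub model
-- in the task JSON (model choice is an illustrative assumption).
SELECT pgml.transform(
    task   => '{
        "task": "translation",
        "model": "Helsinki-NLP/opus-mt-en-fr"
    }'::JSONB,
    inputs => ARRAY['How are you?']
) AS french;
```

Swapping the `model` value is all it takes to target a different language pair, as long as the chosen model supports the `translation` pipeline.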
@@ -490,7 +490,7 @@ select pgml.transform(
## Summarization
Summarization involves creating a condensed version of a document that includes the important information while reducing its length. Different models can be used for this task, with some models extracting the most relevant text from the original document, while other models generate completely new text that captures the essence of the original content.
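For example, a summarization call might look like the sketch below. It assumes PostgresML is installed and uses `sshleifer/distilbart-cnn-12-6`, an abstractive model, as an illustrative choice; the input passage is made up.

```sql
-- Sketch: abstractive summarization of a short passage
-- (model choice is an illustrative assumption).
SELECT pgml.transform(
    task   => '{
        "task": "summarization",
        "model": "sshleifer/distilbart-cnn-12-6"
    }'::JSONB,
    inputs => ARRAY[
        'Paris is the capital and most populous city of France. It lies on the Seine River in the north of the country. The city is a major centre of finance, diplomacy, commerce, fashion and science.'
    ]
) AS summary;
```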
Question Answering models are designed to retrieve the answer to a question from a given text, which can be particularly useful for searching for information within a document. It's worth noting that some question answering models are capable of generating answers even without any contextual information.
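A sketch of extractive question answering follows, assuming PostgresML is installed and the default `question-answering` model can be downloaded. Each input is a JSON string holding a `question` and the `context` to search; both values here are illustrative.

```sql
-- Sketch: the answer is extracted as a span of the supplied context.
SELECT pgml.transform(
    task   => 'question-answering',
    inputs => ARRAY[
        '{
            "question": "Where do I live?",
            "context": "My name is Merve and I live in Istanbul."
        }'
    ]
) AS answer;
```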
Text generation is the task of producing new text, such as filling in incomplete sentences or paraphrasing existing text. It has various use cases, including code generation and story generation. Completion generation models can predict the next word in a text sequence, while text-to-text generation models are trained to learn the mapping between pairs of texts, such as translating between languages. Popular models for text generation include GPT-based models, T5, T0, and BART. These models can be trained to accomplish a wide range of tasks, including text classification, summarization, and translation.
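As an illustration of completion-style generation, the sketch below assumes PostgresML is installed and uses `gpt2` as an example model. `max_new_tokens` is a standard Hugging Face generation argument passed through `args`; the prompt is made up.

```sql
-- Sketch: continue a prompt with a GPT-style model, capping the
-- length of the generated continuation at 20 new tokens.
SELECT pgml.transform(
    task   => '{
        "task": "text-generation",
        "model": "gpt2"
    }'::JSONB,
    inputs => ARRAY['Three Rings for the Elven-kings under the sky,'],
    args   => '{"max_new_tokens": 20}'::JSONB
) AS completion;
```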
Text-to-text generation methods, such as T5, are neural network architectures designed to perform various natural language processing tasks, including summarization, translation, and question answering. T5 is a transformer-based architecture pre-trained on a large corpus of text data using denoising autoencoding. This pre-training lets the model learn general language patterns and relationships between different tasks; the pre-trained model can then be fine-tuned for specific downstream tasks. During fine-tuning, the T5 model is trained on a task-specific dataset to learn how to perform that task.
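Because T5 frames every task as text-to-text, the task is stated in the input itself. A sketch, assuming PostgresML is installed and using `t5-small` as an example model with T5's `translate English to French:` prefix:

```sql
-- Sketch: the task instruction is part of the input text itself.
SELECT pgml.transform(
    task   => '{
        "task": "text2text-generation",
        "model": "t5-small"
    }'::JSONB,
    inputs => ARRAY['translate English to French: I am very happy']
) AS answer;
```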
Fill-mask refers to a task where certain words in a sentence are hidden or "masked", and the objective is to predict what words should fill in those masked positions. Such models are valuable when we want to gain statistical insights about the language used to train the model.
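A sketch of a fill-mask query, assuming PostgresML is installed and using `distilroberta-base` as an example model. Note that the mask token depends on the model: RoBERTa-family models use `<mask>`, while BERT-family models use `[MASK]`.

```sql
-- Sketch: ask the model for the most likely words at the masked position.
SELECT pgml.transform(
    task   => '{
        "task": "fill-mask",
        "model": "distilroberta-base"
    }'::JSONB,
    inputs => ARRAY['Paris is the <mask> of France.']
) AS answer;
```

The result lists several candidate fillers, each with a score, which is what makes these models useful for probing the statistics of their training language.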
```sql
SELECT pgml.transform(
@@ -859,7 +859,7 @@ SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 5;
<!-- ## Sentence Similarity
Sentence similarity involves determining the degree of similarity between two texts. To accomplish this, sentence similarity models convert the input texts into vectors (embeddings) that capture semantic information, and then measure the proximity (or similarity) between those vectors. This is especially useful for information retrieval and clustering/grouping.
packages/README.md (1 addition & 1 deletion)
@@ -53,7 +53,7 @@ The version of PostgresML is set in many places, and all of them need to be upda
#### Documentation
-Additionally, we mention the version of the extension in our documentation. It would be very helpful to update it there as well, so our users are always instructed to install the latest and greatest version. Our documentation is located in `pgml-docs`. If you search it for the current version number, you should find all the places where we mention it.
+Additionally, we mention the version of the extension in our documentation. It would be very helpful to update it there as well, so our users are always instructed to install the latest and greatest version. Our documentation is located in `pgml-cms`. If you search it for the current version number, you should find all the places where we mention it.