Commit b29606c

Moloejoe authored and gitbook-bot committed
GITBOOK-98: No subject
1 parent 6fa3bb6 commit b29606c

87 files changed

+232
-331
lines changed

pgml-cms/docs/README.md

Lines changed: 1 addition & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,5 @@
 ---
-description: An introduction to key the concepts that power PostgresML.
-coverY: 0
-layout:
-  cover:
-    visible: false
-    size: full
-  title:
-    visible: true
-  description:
-    visible: true
-  tableOfContents:
-    visible: true
-  outline:
-    visible: true
-  pagination:
-    visible: true
+description: The key concepts that make up PostgresML.
 ---
 
 # Overview

pgml-cms/docs/SUMMARY.md

Lines changed: 92 additions & 82 deletions
@@ -1,85 +1,95 @@
 # Table of contents
 
+## Introduction
+
 * [Overview](README.md)
-* [Getting Started](getting-started/README.md)
-* [Create your database](getting-started/create-your-database.md)
-* [Connect your app](getting-started/connect-your-app.md)
-* [Import your data](getting-started/import-your-data.md)
-* [Machine Learning](machine-learning/README.md)
-* [Natural Language Processing](machine-learning/natural-language-processing/README.md)
-* [Embeddings](machine-learning/natural-language-processing/embeddings.md)
-* [Fill Mask](machine-learning/natural-language-processing/fill-mask.md)
-* [Question Answering](machine-learning/natural-language-processing/question-answering.md)
-* [Summarization](machine-learning/natural-language-processing/summarization.md)
-* [Text Classification](machine-learning/natural-language-processing/text-classification.md)
-* [Text Generation](machine-learning/natural-language-processing/text-generation.md)
-* [Text-to-Text Generation](machine-learning/natural-language-processing/text-to-text-generation.md)
-* [Token Classification](machine-learning/natural-language-processing/token-classification.md)
-* [Translation](machine-learning/natural-language-processing/translation.md)
-* [Zero-shot Classification](machine-learning/natural-language-processing/zero-shot-classification.md)
-* [Supervised Learning](machine-learning/supervised-learning/README.md)
-* [Data Pre-processing](machine-learning/supervised-learning/data-pre-processing.md)
-* [Regression](machine-learning/supervised-learning/regression.md)
-* [Classification](machine-learning/supervised-learning/classification.md)
-* [Hyperparameter Search](machine-learning/supervised-learning/hyperparameter-search.md)
-* [Joint Optimization](machine-learning/supervised-learning/joint-optimization.md)
-* [Unsupervised Learning](machine-learning/unsupervised-learning.md)
-* [Vector Database](vector-database.md)
-* [SDKs](sdks/README.md)
-* [Overview](sdks/overview.md)
-* [Getting Started](sdks/getting-started.md)
-* [OpenSourceAI](sdks/opensourceai.md)
-* [Collections](sdks/collections.md)
-* [Pipelines](sdks/pipelines.md)
-* [Search](sdks/search.md)
-* [Tutorials](sdks/tutorials/README.md)
-* [Semantic Search](sdks/tutorials/semantic-search.md)
-* [Semantic Search using Instructor model](sdks/tutorials/semantic-search-using-instructor-model.md)
-* [Extractive Question Answering](sdks/tutorials/extractive-question-answering.md)
-* [Summarizing Question Answering](sdks/tutorials/summarizing-question-answering.md)
-* [Apps](apps/README.md)
-* [Chatbots](apps/chatbots.md)
-* [Fraud Detection](apps/fraud-detection.md)
-* [Recommendation Engine](apps/recommendation-engine.md)
-* [Search](apps/search.md)
-* [Time-series Forecasting](apps/time-series-forecasting.md)
-* [Use cases](use-cases/README.md)
-* [Improve Search Results with Machine Learning](use-cases/improve-search-results-with-machine-learning.md)
-* [Generating LLM embeddings with open source models in PostgresML](use-cases/generating-llm-embeddings-with-open-source-models-in-postgresml.md)
-* [Tuning vector recall while generating query embeddings in the database](use-cases/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
-* [Personalize embedding results with application data in your database](use-cases/personalize-embedding-results-with-application-data-in-your-database.md)
-* [LLM based pipelines with PostgresML and dbt (data build tool)](use-cases/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
-* [Data Storage & Retrieval](data-storage-and-retrieval/README.md)
-* [Tabular data](data-storage-and-retrieval/tabular-data.md)
-* [Vectors](data-storage-and-retrieval/vectors.md)
-* [Documents](data-storage-and-retrieval/documents.md)
-* [Partitioning](data-storage-and-retrieval/partitioning.md)
-* [Deploying PostgresML](deploying-postgresml/README.md)
-* [PostgresML Cloud](deploying-postgresml/postgresml-cloud/README.md)
-* [Plans](deploying-postgresml/postgresml-cloud/plans/README.md)
-* [Serverless databases](deploying-postgresml/postgresml-cloud/plans/serverless-databases.md)
-* [Dedicated databases](deploying-postgresml/postgresml-cloud/plans/dedicated-databases.md)
-* [Self-hosting](deploying-postgresml/self-hosting/README.md)
-* [Pooler](deploying-postgresml/self-hosting/pooler.md)
-* [Building from source](deploying-postgresml/self-hosting/building-from-source.md)
-* [Replication](deploying-postgresml/self-hosting/replication.md)
-* [Backups](deploying-postgresml/self-hosting/backups.md)
-* [Running on EC2](deploying-postgresml/self-hosting/running-on-ec2.md)
-* [PgCat](pgcat/README.md)
-* [Features](pgcat/features.md)
-* [Installation](pgcat/installation.md)
-* [Configuration](pgcat/configuration.md)
-* [Benchmarks](benchmarks/README.md)
-* [PostgresML is 8-40x faster than Python HTTP microservices](benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
-* [Million Requests per Second](benchmarks/million-requests-per-second.md)
-* [MindsDB vs PostgresML](benchmarks/mindsdb-vs-postgresml.md)
-* [GGML Quantized LLM support for Huggingface Transformers](benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md)
-* [Making Postgres 30 Percent Faster in Production](benchmarks/making-postgres-30-percent-faster-in-production.md)
-* [Monitoring](monitoring.md)
-* [FAQs](faqs.md)
-* [Developer Docs](developer-docs/README.md)
-* [Local Docker Development](developer-docs/quick-start-with-docker.md)
-* [Installation](developer-docs/installation.md)
-* [Contributing](developer-docs/contributing.md)
-* [Distributed Training](developer-docs/distributed-training.md)
-* [GPU Support](developer-docs/gpu-support.md)
+* [Getting Started](introduction/getting-started/README.md)
+* [Create your database](introduction/getting-started/create-your-database.md)
+* [Connect your app](introduction/getting-started/connect-your-app.md)
+* [Import your data](introduction/getting-started/import-your-data/README.md)
+* [CSV](introduction/getting-started/import-your-data/csv.md)
+* [Foreign Data Wrapper](introduction/getting-started/import-your-data/foreign-data-wrapper.md)
+* [Machine Learning](introduction/machine-learning/README.md)
+* [Natural Language Processing](introduction/machine-learning/natural-language-processing/README.md)
+* [Embeddings](introduction/machine-learning/natural-language-processing/embeddings.md)
+* [Fill Mask](introduction/machine-learning/natural-language-processing/fill-mask.md)
+* [Question Answering](introduction/machine-learning/natural-language-processing/question-answering.md)
+* [Summarization](introduction/machine-learning/natural-language-processing/summarization.md)
+* [Text Classification](introduction/machine-learning/natural-language-processing/text-classification.md)
+* [Text Generation](introduction/machine-learning/natural-language-processing/text-generation.md)
+* [Text-to-Text Generation](introduction/machine-learning/natural-language-processing/text-to-text-generation.md)
+* [Token Classification](introduction/machine-learning/natural-language-processing/token-classification.md)
+* [Translation](introduction/machine-learning/natural-language-processing/translation.md)
+* [Zero-shot Classification](introduction/machine-learning/natural-language-processing/zero-shot-classification.md)
+* [Supervised Learning](introduction/machine-learning/supervised-learning/README.md)
+* [Data Pre-processing](introduction/machine-learning/supervised-learning/data-pre-processing.md)
+* [Regression](introduction/machine-learning/supervised-learning/regression.md)
+* [Classification](introduction/machine-learning/supervised-learning/classification.md)
+* [Hyperparameter Search](introduction/machine-learning/supervised-learning/hyperparameter-search.md)
+* [Joint Optimization](introduction/machine-learning/supervised-learning/joint-optimization.md)
+* [Unsupervised Learning](introduction/machine-learning/unsupervised-learning.md)
+* [SDKs](introduction/machine-learning/sdks/README.md)
+* [Overview](introduction/machine-learning/sdks/overview.md)
+* [Getting Started](introduction/machine-learning/sdks/getting-started.md)
+* [OpenSourceAI](introduction/machine-learning/sdks/opensourceai.md)
+* [Collections](introduction/machine-learning/sdks/collections.md)
+* [Pipelines](introduction/machine-learning/sdks/pipelines.md)
+* [Search](introduction/machine-learning/sdks/search.md)
+* [Tutorials](introduction/machine-learning/sdks/tutorials/README.md)
+* [Semantic Search](introduction/machine-learning/sdks/tutorials/semantic-search.md)
+* [Semantic Search using Instructor model](introduction/machine-learning/sdks/tutorials/semantic-search-using-instructor-model.md)
+* [Extractive Question Answering](introduction/machine-learning/sdks/tutorials/extractive-question-answering.md)
+* [Summarizing Question Answering](introduction/machine-learning/sdks/tutorials/summarizing-question-answering.md)
+
+## Product
+
+* [Cloud Database](product/cloud-database/README.md)
+* [Serverless databases](product/cloud-database/serverless-databases.md)
+* [Dedicated](product/cloud-database/dedicated.md)
+* [Enterprise](product/cloud-database/plans.md)
+* [Vector Database](product/vector-database.md)
+* [PgCat Proxy](product/pgcat/README.md)
+* [Features](product/pgcat/features.md)
+* [Installation](product/pgcat/installation.md)
+* [Configuration](product/pgcat/configuration.md)
+
+## Use Cases
+
+* [Chatbots](use-cases/chatbots.md)
+* [Search](use-cases/improve-search-results-with-machine-learning.md)
+* [Embeddings](use-cases/embeddings/README.md)
+* [Generating LLM embeddings with open source models in PostgresML](use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml/README.md)
+* [Tuning vector recall while generating query embeddings in the database](use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
+* [Personalize embedding results with application data in your database](use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md)
+* [Time-series Forecasting](use-cases/time-series-forecasting.md)
+* [Fraud Detection](use-cases/fraud-detection.md)
+* [Recommendation Engine](use-cases/recommendation-engine.md)
+
+## Resources
+
+* [FAQs](resources/faqs.md)
+* [Data Storage & Retrieval](resources/data-storage-and-retrieval/README.md)
+* [Tabular data](resources/data-storage-and-retrieval/tabular-data.md)
+* [Documents](resources/data-storage-and-retrieval/documents.md)
+* [Partitioning](resources/data-storage-and-retrieval/partitioning.md)
+* [LLM based pipelines with PostgresML and dbt (data build tool)](resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
+* [Benchmarks](resources/benchmarks/README.md)
+* [PostgresML is 8-40x faster than Python HTTP microservices](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
+* [Million Requests per Second](resources/benchmarks/million-requests-per-second.md)
+* [MindsDB vs PostgresML](resources/benchmarks/mindsdb-vs-postgresml.md)
+* [GGML Quantized LLM support for Huggingface Transformers](resources/benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md)
+* [Making Postgres 30 Percent Faster in Production](resources/benchmarks/making-postgres-30-percent-faster-in-production.md)
+* [Developer Docs](resources/developer-docs/README.md)
+* [Local Docker Development](resources/developer-docs/quick-start-with-docker.md)
+* [Installation](resources/developer-docs/installation.md)
+* [Contributing](resources/developer-docs/contributing.md)
+* [Distributed Training](resources/developer-docs/distributed-training.md)
+* [GPU Support](resources/developer-docs/gpu-support.md)
+* [Deploying PostgresML](resources/developer-docs/deploying-postgresml/README.md)
+* [Monitoring](resources/developer-docs/deploying-postgresml/monitoring.md)
+* [Self-hosting](resources/developer-docs/self-hosting/README.md)
+* [Pooler](resources/developer-docs/self-hosting/pooler.md)
+* [Building from source](resources/developer-docs/self-hosting/building-from-source.md)
+* [Replication](resources/developer-docs/self-hosting/replication.md)
+* [Backups](resources/developer-docs/self-hosting/backups.md)
+* [Running on EC2](resources/developer-docs/self-hosting/running-on-ec2.md)

pgml-cms/docs/apps/README.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

pgml-cms/docs/apps/search.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

pgml-cms/docs/deploying-postgresml/postgresml-cloud/README.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

pgml-cms/docs/getting-started/README.md renamed to pgml-cms/docs/introduction/getting-started/README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ A PostgresML deployment consists of multiple components working in concert to pr
 * A PgCat pooling proxy to provide secure access and model load balancing across tens of thousands of clients
 * A web application to manage deployed models and host SQL notebooks
 
-<figure><img src="../.gitbook/assets/architecture.png" alt=""><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/architecture.png" alt=""><figcaption></figcaption></figure>
 
 By building PostgresML on top of a mature database, we get reliable backups for model inputs and proven scalability without reinventing the wheel, so that we can focus on providing access to the latest developments in open source machine learning and artificial intelligence.
 
pgml-cms/docs/getting-started/create-your-database.md renamed to pgml-cms/docs/introduction/getting-started/create-your-database.md

Lines changed: 4 additions & 4 deletions
@@ -4,15 +4,15 @@ description: >-
 cloud.
 ---
 
-# Create a database
+# Create your database
 
 ## Sign up for an account
 
 Visit [https://postgresml.org/signup](https://postgresml.org/signup) to create a new account with your email, Google or Github authentication.
 
 <div align="center" data-full-width="false">
 
-<figure><img src="../.gitbook/assets/image (6).png" alt="Sign up" width="356"><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/image (6).png" alt="Sign up" width="356"><figcaption></figcaption></figure>
 
 </div>
 
@@ -25,10 +25,10 @@ Choose the type of GPU powered database deployment that is right for you.
 
 Click on **Get Started** under the plan of your choice.
 
-<figure><img src="../.gitbook/assets/image (7).png" alt=""><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/image (7).png" alt=""><figcaption></figcaption></figure>
 
 ### Your database credentials <a href="#create-a-new-account" id="create-a-new-account"></a>
 
 We'll automatically provision an initial set of database credentials and provide you with the connection string. You can connect to your database if you have `psql` installed on your machine, or any other PostgreSQL client.
 
-<figure><img src="../.gitbook/assets/Screenshot from 2023-11-27 23-21-36.png" alt=""><figcaption></figcaption></figure>
+<figure><img src="../../.gitbook/assets/Screenshot from 2023-11-27 23-21-36.png" alt=""><figcaption></figcaption></figure>
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Import your data
+
+Machine learning always depends on input data, whether it's generating text with pretrained LLMs, training a retention model on customer data, or predicting session abandonment in real time. Just like any PostgreSQL database, PostgresML can be configured as the authoritative application data store, a streaming replica from some other primary, or use foreign data wrappers to query another data host on demand. Depending on how frequently your data changes and where your authoritative data resides, different methodologies imply different tradeoffs.
+
+PostgresML can easily ingest data from your existing data stores.&#x20;
+
+### Static data
+
+Data that changes infrequently can be easily imported into PostgresML using `COPY`. All you have to do is export your data as a CSV file, create a table in Postgres to store it, and import it using the command line.
+
+{% content-ref url="csv.md" %}
+[csv.md](csv.md)
+{% endcontent-ref %}
+
+### Live data
+
+Importing data from online databases can be done with foreign data wrappers. Hosted PostgresML databases come with both `postgres_fdw` and `dblink` extensions pre-installed, so you can import data from any of your existing Postgres databases, and export machine learning artifacts from PostgresML using just a few lines of SQL.
+
+{% content-ref url="foreign-data-wrapper.md" %}
+[foreign-data-wrapper.md](foreign-data-wrapper.md)
+{% endcontent-ref %}
+
+####
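
The "Live data" path in the file above relies on `postgres_fdw`. As a rough illustration of what "a few lines of SQL" could look like, here is a minimal sketch that prints the setup statements and, only if a connection string is present, pipes them to `psql`. It is not taken from the commit or from the linked `foreign-data-wrapper.md`: the server name `live_db`, the schema `live_tables`, and the environment variables `PGML_DATABASE_URL`, `SOURCE_HOST`, `SOURCE_DB`, `SOURCE_USER`, and `SOURCE_PASSWORD` are all hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch: expose tables from an existing Postgres database inside PostgresML
# through postgres_fdw. Every name here (live_db, live_tables, the environment
# variables) is an illustrative assumption, not something the docs prescribe.
set -eu

# Print the SQL needed to register the remote server and import its tables.
# Arguments: host, database name, user, password.
# Note: values are interpolated as-is, so they must not contain single quotes.
fdw_setup_sql() {
  cat <<SQL
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER live_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '$1', port '5432', dbname '$2');
CREATE USER MAPPING FOR CURRENT_USER
  SERVER live_db
  OPTIONS (user '$3', password '$4');
CREATE SCHEMA IF NOT EXISTS live_tables;
IMPORT FOREIGN SCHEMA public FROM SERVER live_db INTO live_tables;
SQL
}

# Only contact a database when a connection string has been provided.
if [ -n "${PGML_DATABASE_URL:-}" ]; then
  fdw_setup_sql \
    "${SOURCE_HOST:?set SOURCE_HOST}" \
    "${SOURCE_DB:?set SOURCE_DB}" \
    "${SOURCE_USER:?set SOURCE_USER}" \
    "${SOURCE_PASSWORD:?set SOURCE_PASSWORD}" \
    | psql "$PGML_DATABASE_URL" -v ON_ERROR_STOP=1
fi
```

Once the schema is imported, the remote tables can be queried like local ones (for example `SELECT * FROM live_tables.your_table`), with each query fetching rows from the source database on demand.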
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# CSV
+
+### Static data
+
+Data that changes infrequently can be easily imported into PostgresML using `COPY`. All you have to do is export your data as a CSV file, create a table in Postgres to store it, and import it using the command line.
+
+Let's use a simple CSV file with 3 columns as an example:
+
+| Column | Data type | Example |
+| ---------------- | --------- | ------- |
+| name | text | John |
+| age | integer | 30 |
+| is\_paying\_user | boolean | true |
+
+#### Export data as CSV
+
+If you're using a Postgres database already, you can export any table as CSV with just one command:
+
+```bash
+psql -c "\copy your_table TO '~/Desktop/your_table.csv' CSV HEADER"
+```
+
+If you're using another data store, it should almost always provide a CSV export functionality, since CSV is the most commonly used data format in machine learning.
+
+#### Create table in Postgres
+
+Creating a table in Postgres with the correct schema is as easy as:
+
+```
+CREATE TABLE your_table (
+name TEXT,
+age INTEGER,
+is_paying_user BOOLEAN
+);
+```
+
+#### Import data using the command line
+
+Once you have a table and your data exported as CSV, importing it can also be done with just one command:
+
+```bash
+psql -c "\copy your_table FROM '~/Desktop/your_table.csv' CSV HEADER"
+```
+
+We took our export command and changed `TO` to `FROM`, and that's it. Make sure you're connecting to your PostgresML database when importing data.
+
+#### Refreshing data
+
+If your data changed, repeat this process again. To avoid duplicate entries in your table, you can truncate (or delete) all rows beforehand:
+
+```
+TRUNCATE your_table;
+```
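
The refresh step above runs `TRUNCATE` and the import as two separate commands, which leaves the table empty if the import fails. One possible way to combine them is sketched below, using `psql`'s `--single-transaction` option with two `-c` commands so that a failed import rolls the truncate back. This is an illustrative sketch rather than part of the commit: the function name `refresh_table` and the environment variable `PGML_DATABASE_URL` are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: truncate and re-import a table from CSV in one transaction.
# refresh_table and PGML_DATABASE_URL are hypothetical names used for
# illustration; substitute your own connection string handling.
set -eu

# Arguments: table name, path to the CSV file (with a header row).
refresh_table() {
  # --single-transaction wraps both -c commands in BEGIN/COMMIT, and
  # ON_ERROR_STOP makes psql stop (and roll back) on the first failure.
  psql "${PGML_DATABASE_URL:?set PGML_DATABASE_URL first}" \
    --single-transaction \
    -v ON_ERROR_STOP=1 \
    -c "TRUNCATE $1;" \
    -c "\\copy $1 FROM '$2' CSV HEADER"
}

# Run only when called with both arguments, e.g.:
#   ./refresh.sh your_table ~/Desktop/your_table.csv
if [ "$#" -eq 2 ]; then
  refresh_table "$1" "$2"
fi
```

Because `TRUNCATE` is transactional in Postgres, readers of the table keep seeing the old rows until the new data is committed.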
