Commit 1b263d4

Lev Kokotov authored and gitbook-bot committed
GITBOOK-28: Multicloud
1 parent 1882ca3 commit 1b263d4

4 files changed: +42 -2 lines changed

pgml-cms/blog/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 # Table of contents

 * [Home](README.md)
+* [PostgresML is going multicloud](postgresml-is-going-multicloud.md)
 * [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md)
 * [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md)
 * [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md)

pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ LIMIT 5;

 ## Generating embeddings from natural language text

-PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](/docs/introduction/apis/sql-extensions/pgml.embed) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
+PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

 Since our corpus of documents (movie reviews) are all relatively short and similar in style, we don't need a large model. [`intfloat/e5-small`](https://huggingface.co/intfloat/e5-small) will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models.

pgml-cms/blog/introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md

Lines changed: 1 addition & 1 deletion
@@ -210,7 +210,7 @@ We have truncated the output to two items

 !!!

-We also have asynchronous versions of the create and `create_stream` functions relatively named `create_async` and `create_stream_async`. Checkout [our documentation](/docs/introduction/machine-learning/sdks/opensourceai) for a complete guide of the open-source AI SDK including guides on how to specify custom models.
+We also have asynchronous versions of the create and `create_stream` functions relatively named `create_async` and `create_stream_async`. Checkout [our documentation](https://postgresml.org/docs/introduction/machine-learning/sdks/opensourceai) for a complete guide of the open-source AI SDK including guides on how to specify custom models.

 PostgresML is free and open source. To run the above examples yourself[ create an account](https://postgresml.org/signup), install pgml, and get running!

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# PostgresML is going multicloud
+
+We started PostgresML two years ago with the goal of making machine learning and AI accessible and easy for everyone. To make this a reality, we needed to deploy PostgresML as closely as possible to our end users. With that goal in mind, today we're proud to announce support for a new cloud provider: Azure.
+
+### How we got here
+
+When we first launched PostgresML Cloud, we knew that we needed to deploy our AI application database in many different environments. Since we used AWS at Instacart for over a decade, we started with AWS EC2. However, to ensure that we didn't have much trouble going multicloud in the future, we made some important architectural decisions.
+
+Our operating system of choice, Ubuntu 22.04, is widely available and supported by all major (and small) infrastructure hosting vendors. It's secure, regularly updated, and has support for NVIDIA GPUs, CUDA, and the latest and most performant hardware we needed to make machine learning performant at scale.
+
+So to get PostgresML working on multiple clouds, we first needed to make it work on Ubuntu.
+
+### apt-get install postgresml
+
+The best part about using a Linux distribution is its package manager. You can install any number of useful packages and tools with just a single command. PostgresML needn't be any different. To make it easy to install PostgresML on Ubuntu, we built a set of .deb packages, containing the PostgreSQL extension, Python dependencies, and configuration files, which we regularly publish to our own APT repository.
+
+Our cloud includes additional packages that install CPU-optimized pgvector, our custom configs, and various utilities we use to configure and monitor the hardware. We install and update those packages with just one command:
+
+```
+apt-get update && \
+apt-get upgrade
+```
+
+APT proved to be a great utility for distributing binaries and configuration files, and we use the same packages and repository as our community to power our Cloud.
+
+### Separating storage and compute
+
+Both Azure and AWS EC2 have the same philosophy when it comes to deploying virtual machines: separate the storage (disks & operating system) from the compute (CPUs, GPUs, memory). This allowed us to transplant our AWS deployment strategy into Azure without any modifications.
+
+Instead of creating EBS volumes, we create Azure volumes. Instead of launching EC2 compute instances, we launch Azure VMs. When creating backups, we create EBS snapshots on EC2 and Azure volume snapshots on Azure, all at the cost of a single if/else statement:
+
+```rust
+match cloud {
+    Cloud::Aws => launch_ec2_instance().await,
+    Cloud::Azure => launch_azure_vm().await,
+}
+```
+
+Azure is our first foray into multicloud, but certainly not our last. Stay tuned for more, and thanks for your continued support of PostgresML.
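
The `match cloud` dispatch shown in the new post is only a fragment, so here is a minimal, self-contained sketch of how the same idea could cover all three operations the post mentions (volumes, compute, snapshots). Only `Cloud::Aws` and `Cloud::Azure` come from the post; `Step`, `provider_call`, and `deployment_plan` are hypothetical names chosen for illustration, and the sketch returns descriptive strings synchronously instead of awaiting real cloud API calls.

```rust
// Hypothetical sketch: only the `Cloud` variants appear in the original post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cloud {
    Aws,
    Azure,
}

// Illustrative list of the provider-specific operations named in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    CreateVolume,
    LaunchCompute,
    SnapshotVolume,
}

// One exhaustive match maps each (cloud, step) pair to the provider action.
// The compiler rejects the build if a new cloud or step is left unhandled.
fn provider_call(cloud: Cloud, step: Step) -> &'static str {
    match (cloud, step) {
        (Cloud::Aws, Step::CreateVolume) => "create EBS volume",
        (Cloud::Aws, Step::LaunchCompute) => "launch EC2 instance",
        (Cloud::Aws, Step::SnapshotVolume) => "create EBS snapshot",
        (Cloud::Azure, Step::CreateVolume) => "create Azure volume",
        (Cloud::Azure, Step::LaunchCompute) => "launch Azure VM",
        (Cloud::Azure, Step::SnapshotVolume) => "create Azure volume snapshot",
    }
}

// The deployment sequence itself is identical for every cloud.
fn deployment_plan(cloud: Cloud) -> Vec<&'static str> {
    [Step::CreateVolume, Step::LaunchCompute, Step::SnapshotVolume]
        .iter()
        .map(|&step| provider_call(cloud, step))
        .collect()
}

fn main() {
    for cloud in [Cloud::Aws, Cloud::Azure] {
        println!("{:?}: {:?}", cloud, deployment_plan(cloud));
    }
}
```

Because the match is exhaustive, adding a third `Cloud` variant in this sketch would fail to compile until every step has a provider-specific arm, which is one plausible reason a `match` suits this kind of dispatch.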
