Commit 17f69d3

Moloejoe authored and gitbook-bot committed
GITBOOK-7: No subject
1 parent 961c1b1 commit 17f69d3

2 files changed: +21 additions, -17 deletions

pgml-docs/blog/how-we-generate-javascript-and-python-sdks-from-our-canonical-rust-sdk.md

Lines changed: 6 additions & 4 deletions
@@ -87,6 +87,8 @@ Here is the code augmented to work with [Pyo3](https://github.com/PyO3/pyo3) and
 
 \=== "Pyo3"
 
+{% tabs %}
+{% tab title="Pyo3" %}
 ```rust
 use pyo3::prelude::*;
 
@@ -119,9 +121,9 @@ fn pgml(_py: Python, m: &PyModule) -> PyResult<()> {
 Ok(())
 }
 ```
+{% endtab %}
 
-\=== "Neon"
-
+{% tab title="Neon" %}
 ```rust
 use neon::prelude::*;
 
@@ -193,8 +195,8 @@ fn main(mut cx: ModuleContext) -> NeonResult<()> {
 Ok(())
 }
 ```
-
-\===
+{% endtab %}
+{% endtabs %}
 
 ## Automatically Converting Vanilla Rust to py03 and Neon compatible Rust
 
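Every hunk in this file follows the same mechanical pattern: a `\=== "Title"` marker becomes `{% tab title="Title" %}`, the first tab of a group is preceded by `{% tabs %}`, each tab is closed with `{% endtab %}`, and the bare `\===` terminator becomes `{% endtab %}` followed by `{% endtabs %}`. The sketch below reproduces that rewrite in Rust. It is an illustration, not the tool GitBook used: `convert_tabs` is a hypothetical name, it assumes tab groups are never nested, and it does not try to match the exact blank-line placement seen in the diff.

```rust
/// Hypothetical helper: rewrites MkDocs-style tab markers into GitBook tab tags.
/// Assumes the convention visible in this diff: `=== "Title"` (optionally
/// escaped as `\=== "Title"`) opens a tab, and a bare `===` closes the group.
fn convert_tabs(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut in_group = false;

    for line in input.lines() {
        // Accept both the plain and the backslash-escaped marker.
        let trimmed = line.trim();
        let marker = trimmed.strip_prefix('\\').unwrap_or(trimmed);

        // A bare `===` only counts as a terminator while a group is open;
        // otherwise it may be a Setext heading underline and is kept as-is.
        if marker == "===" && in_group {
            out.push("{% endtab %}".to_string());
            out.push("{% endtabs %}".to_string());
            in_group = false;
            continue;
        }

        // `=== "Title"` opens a new tab.
        if let Some(title) = marker
            .strip_prefix("=== \"")
            .and_then(|rest| rest.strip_suffix('"'))
        {
            if in_group {
                // A new title closes the previous tab in the same group.
                out.push("{% endtab %}".to_string());
            } else {
                out.push("{% tabs %}".to_string());
                in_group = true;
            }
            out.push(format!("{{% tab title=\"{}\" %}}", title));
            continue;
        }

        out.push(line.to_string());
    }

    // Assumption: a group left open at end of input is closed there.
    if in_group {
        out.push("{% endtab %}".to_string());
        out.push("{% endtabs %}".to_string());
    }

    out.join("\n")
}

fn main() {
    // A condensed version of the old `=== "Pyo3"` / `=== "Neon"` markup.
    let old = "\\=== \"Pyo3\"\n```rust\nuse pyo3::prelude::*;\n```\n\\=== \"Neon\"\n```rust\nuse neon::prelude::*;\n```\n\\===";
    println!("{}", convert_tabs(old));
}
```

Treating a bare `===` as a terminator only inside an open group is a deliberate choice: the same line is a Setext heading underline in ordinary Markdown, and passing it through untouched avoids corrupting headings elsewhere in a page.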
pgml-docs/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md

Lines changed: 15 additions & 13 deletions
@@ -37,8 +37,8 @@ GROUP BY i % 10000;
 
 Spoiler alert: idiomatic Rust is about 10x faster than native SQL, embedded PL/pgSQL, and pure Python. Rust comes close to the hand-optimized assembly version of the Basic Linear Algebra Subroutines (BLAS) implementation. NumPy is supposed to provide optimizations in cases like this, but it's actually the worst performer. Data movement from Postgres to PL/Python is pretty good; it's even faster than the pure SQL equivalent, but adding the extra conversion from Python list to Numpy array takes almost as much time as everything else. Machine Learning systems that move relatively large quantities of data around can become dominated by these extraneous operations, rather than the ML algorithms that actually generate value.
 
-\=== "SQL"
-
+{% tabs %}
+{% tab title="SQL" %}
 ```sql
 CREATE OR REPLACE FUNCTION dot_product_sql(a FLOAT4[], b FLOAT4[])
 RETURNS FLOAT4
@@ -59,9 +59,9 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
+{% endtab %}
 
-\=== "PL/pgSQL"
-
+{% tab title="PL/pgSQL" %}
 ```sql
 CREATE OR REPLACE FUNCTION dot_product_plpgsql(a FLOAT4[], b FLOAT4[])
 RETURNS FLOAT4
@@ -84,9 +84,9 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
+{% endtab %}
 
-\=== "Python"
-
+{% tab title="Python" %}
 ```sql
 CREATE OR REPLACE FUNCTION dot_product_python(a FLOAT4[], b FLOAT4[])
 RETURNS FLOAT4
@@ -106,9 +106,9 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
+{% endtab %}
 
-\=== "NumPy"
-
+{% tab title="NumPy" %}
 ```sql
 CREATE OR REPLACE FUNCTION dot_product_numpy(a FLOAT4[], b FLOAT4[])
 RETURNS FLOAT4
@@ -129,9 +129,9 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
+{% endtab %}
 
-\=== "Rust"
-
+{% tab title="Rust" %}
 ```rust
 #[pg_extern(immutable, strict, parallel_safe)]
 fn dot_product_rust(vector: Vec<f32>, other: Vec<f32>) -> f32 {
@@ -154,8 +154,10 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
+{% endtab %}
+
+{% tab title="BLAS" %}
 
-\=== "BLAS"
 
 ```rust
 #[pg_extern(immutable, strict, parallel_safe)]
@@ -183,8 +185,8 @@ FROM embeddings, test
 ORDER BY 1
 LIMIT 1;
 ```
-
-\===
+{% endtab %}
+{% endtabs %}
 
 We're building with the Rust [pgrx](https://github.com/tcdi/pgrx/tree/master/pgrx) crate that makes our development cycle even nicer than the one we use to manage Python. It really streamlines creating an extension in Rust, so all we have to worry about is writing our functions. It took about an hour to port all of our vector operations to Rust with BLAS support, and another week to port all the "business logic" for maintaining model training and deployment. We've even gained some new capabilities for caching models across connections (independent processes), now that we have access to Postgres shared memory, without having to worry about Python's GIL and GC. This is the dream of Apache's Arrow project, realized for our applications, without having to change the world, just our implementations. 🤩 Single-copy end-to-end machine learning, with parallel processing and shared data access.
 
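The diff also shows a conversion gap: in the first hunk of `how-we-generate-javascript-and-python-sdks-from-our-canonical-rust-sdk.md`, the old `\=== "Pyo3"` marker (line 88) is an unchanged context line, so it stays in the file just above the new `{% tabs %}` and `{% tab title="Pyo3" %}` tags. A small lint pass could catch leftovers like that, along with unbalanced tags. The following is a minimal Rust sketch that assumes tab groups are never nested; `lint_tabs` is a hypothetical helper, not part of GitBook or PostgresML.

```rust
/// Hypothetical lint: reports leftover MkDocs-style `=== "Title"` markers and
/// unbalanced GitBook tab tags. Returns (1-based line number, message) pairs.
/// Assumes tab groups are never nested.
fn lint_tabs(input: &str) -> Vec<(usize, String)> {
    let mut problems: Vec<(usize, String)> = Vec::new();
    let mut in_group = false; // inside {% tabs %} ... {% endtabs %}
    let mut tab_open = false; // inside {% tab ... %} ... {% endtab %}
    let mut last_line = 0;

    for (idx, line) in input.lines().enumerate() {
        let n = idx + 1;
        last_line = n;
        let t = line.trim();
        // Treat `\=== "X"` (escaped) and `=== "X"` the same way.
        let marker = t.strip_prefix('\\').unwrap_or(t);

        if marker.starts_with("=== \"") {
            problems.push((n, "leftover MkDocs tab marker".to_string()));
        } else if t == "{% tabs %}" {
            if in_group {
                problems.push((n, "{% tabs %} opened again before {% endtabs %}".to_string()));
            }
            in_group = true;
        } else if t.starts_with("{% tab title=") {
            if !in_group {
                problems.push((n, "{% tab %} outside a {% tabs %} group".to_string()));
            }
            if tab_open {
                problems.push((n, "previous {% tab %} not closed".to_string()));
            }
            tab_open = true;
        } else if t == "{% endtab %}" {
            if !tab_open {
                problems.push((n, "{% endtab %} without an open {% tab %}".to_string()));
            }
            tab_open = false;
        } else if t == "{% endtabs %}" {
            if !in_group {
                problems.push((n, "{% endtabs %} without {% tabs %}".to_string()));
            }
            if tab_open {
                problems.push((n, "{% endtabs %} while a {% tab %} is still open".to_string()));
            }
            in_group = false;
            tab_open = false;
        }
    }

    // A group still open at end of input is reported on the last line.
    if in_group {
        problems.push((last_line, "{% tabs %} group never closed".to_string()));
    }
    problems
}

fn main() {
    // Mirrors the converted Pyo3 section: the stale marker survives above the new tags.
    let doc = "\\=== \"Pyo3\"\n\n{% tabs %}\n{% tab title=\"Pyo3\" %}\nuse pyo3::prelude::*;\n{% endtab %}\n{% endtabs %}";
    for (line, msg) in lint_tabs(doc) {
        println!("line {line}: {msg}");
    }
}
```

Run over the converted files, a check like this would flag the surviving `\=== "Pyo3"` line while accepting the balanced `{% tabs %}` blocks everywhere else.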