pgml-docs/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md
15 additions & 13 deletions
@@ -37,8 +37,8 @@ GROUP BY i % 10000;
Spoiler alert: idiomatic Rust is about 10x faster than native SQL, embedded PL/pgSQL, and pure Python. Rust comes close to the hand-optimized assembly version of the Basic Linear Algebra Subprograms (BLAS) implementation. NumPy is supposed to provide optimizations in cases like this, but it's actually the worst performer. Data movement from Postgres to PL/Python is pretty good; it's even faster than the pure SQL equivalent, but adding the extra conversion from Python list to NumPy array takes almost as much time as everything else. Machine learning systems that move relatively large quantities of data around can become dominated by these extraneous operations, rather than the ML algorithms that actually generate value.
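The Rust implementation itself is not part of this excerpt. As a minimal sketch of what an "idiomatic Rust" dot product could look like, assuming plain `f32` slices and a hypothetical function name `dot_product` (not the extension's actual code, and without any Postgres or BLAS integration):

```rust
/// Dot product of two equal-length `f32` slices.
/// Hypothetical helper for illustration only; not taken from the PostgresML source.
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    // Mismatched lengths would silently truncate with `zip`, so fail loudly instead.
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    // Pair up elements, multiply each pair, and accumulate the products.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

Calling `dot_product(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` multiplies the vectors element-wise and sums the products. The iterator chain avoids explicit indexing, which generally gives the compiler more room to elide bounds checks and vectorize the loop.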
-=== "SQL"
-
+{% tabs %}
+{% tab title="SQL" %}
```sql
CREATE OR REPLACE FUNCTION dot_product_sql(a FLOAT4[], b FLOAT4[])
RETURNS FLOAT4
@@ -59,9 +59,9 @@ FROM embeddings, test
ORDER BY 1
LIMIT 1;
```
+{% endtab %}

-=== "PL/pgSQL"
-
+{% tab title="PL/pgSQL" %}
```sql
CREATE OR REPLACE FUNCTION dot_product_plpgsql(a FLOAT4[], b FLOAT4[])
RETURNS FLOAT4
@@ -84,9 +84,9 @@ FROM embeddings, test
ORDER BY 1
LIMIT 1;
```
+{% endtab %}

-=== "Python"
-
+{% tab title="Python" %}
```sql
CREATE OR REPLACE FUNCTION dot_product_python(a FLOAT4[], b FLOAT4[])
RETURNS FLOAT4
@@ -106,9 +106,9 @@ FROM embeddings, test
ORDER BY 1
LIMIT 1;
```
+{% endtab %}

-=== "NumPy"
-
+{% tab title="NumPy" %}
```sql
CREATE OR REPLACE FUNCTION dot_product_numpy(a FLOAT4[], b FLOAT4[])
We're building with the Rust [pgrx](https://github.com/tcdi/pgrx/tree/master/pgrx) crate, which makes our development cycle even nicer than the one we use to manage Python. It really streamlines creating an extension in Rust, so all we have to worry about is writing our functions. It took about an hour to port all of our vector operations to Rust with BLAS support, and another week to port all the "business logic" for maintaining model training and deployment. We've even gained some new capabilities for caching models across connections (independent processes), now that we have access to Postgres shared memory, without having to worry about Python's GIL and GC. This is the dream of the Apache Arrow project, realized for our applications by changing only our implementations rather than the world. 🤩 Single-copy end-to-end machine learning, with parallel processing and shared data access.