
Commit 73f24dd

Lev Kokotov authored and gitbook-bot committed

GITBOOK-83: Importing data with COPY and FDWs

1 parent 3f5b94c commit 73f24dd

2 files changed: +146 -0 lines changed

pgml-cms/docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions

* [Select a plan](getting-started/select-a-plan.md)
* [Database Credentials](getting-started/database-credentials.md)
* [Connect to the Database](getting-started/connect-to-the-database.md)
* [Importing data](getting-started/importing-data.md) (added)
* [Machine Learning](machine-learning/README.md)
* [Natural Language Processing](machine-learning/natural-language-processing/README.md)
* [Embeddings](machine-learning/natural-language-processing/embeddings.md)
pgml-cms/docs/getting-started/importing-data.md (new file)

Lines changed: 145 additions & 0 deletions
# Importing data

PostgresML can easily ingest data from your existing data stores. Depending on how frequently your data changes, different methodologies are preferable.

### Static data

Data that changes infrequently can be easily imported into PostgresML using `COPY`. All you have to do is export your data as a CSV file, create a table in Postgres to store it, and import it using the command line.

Let's use a simple CSV file with 3 columns as an example:

| Column           | Data type | Example |
| ---------------- | --------- | ------- |
| name             | text      | John    |
| age              | integer   | 30      |
| is\_paying\_user | boolean   | true    |
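In CSV form, with a header row, this example data could look something like the following (hypothetical rows for illustration):

```
name,age,is_paying_user
John,30,true
Jane,25,false
```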
#### Export data as CSV

If you're using a Postgres database already, you can export any table as CSV with just one command:

```bash
psql -c "\copy your_table TO '~/Desktop/your_table.csv' CSV HEADER"
```

If you're using another data store, it should almost always provide a CSV export functionality, since CSV is the most commonly used data format in machine learning.

#### Create table in Postgres

Creating a table in Postgres with the correct schema is as easy as:

```
CREATE TABLE your_table (
    name TEXT,
    age INTEGER,
    is_paying_user BOOLEAN
);
```

#### Import data using the command line

Once you have a table and your data exported as CSV, importing it can also be done with just one command:

```bash
psql -c "\copy your_table FROM '~/Desktop/your_table.csv' CSV HEADER"
```

We took our export command and changed `TO` to `FROM`, and that's it. Make sure you're connecting to your PostgresML database when importing data.
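For example, you can pass your PostgresML connection string directly to `psql`; the values in angle brackets below are placeholders for your database credentials:

```bash
psql "postgres://<user>:<password>@<host>:<port>/<database>" \
    -c "\copy your_table FROM '~/Desktop/your_table.csv' CSV HEADER"
```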
#### Refreshing data

If your data changed, repeat this process again. To avoid duplicate entries in your table, you can truncate (or delete) all rows beforehand:

```
TRUNCATE your_table;
```
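A full refresh can then be a single `psql` invocation that truncates the table and re-imports the fresh export (a sketch reusing the table and file names from above; each `-c` command runs in its own transaction):

```bash
psql -c "TRUNCATE your_table" \
     -c "\copy your_table FROM '~/Desktop/your_table.csv' CSV HEADER"
```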
### Live data

Importing data from online databases can be done with foreign data wrappers. All PostgresML databases come with both the `postgres_fdw` and `dblink` extensions pre-installed, so you can import data from any of your existing Postgres databases, and export machine learning artifacts from PostgresML, using just a few lines of SQL.

#### Setting up

Before you get started with foreign data wrappers, log into your current database hosting provider and grab the following connection details:

* Host
* Port (typically `5432`)
* Database name
* Postgres user
* Postgres password

Once you have them, we can set up our live foreign data wrapper connection. All of the following commands should be executed on your PostgresML database; you don't need to perform any additional steps on your production database.
#### Connecting

To connect to your database from PostgresML, first create a corresponding `SERVER`:

```
CREATE SERVER live_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
    host 'Host',
    port 'Port',
    dbname 'Database name'
);
```

Replace `Host`, `Port` and `Database name` with the details you've collected in the previous step.
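With hypothetical connection details filled in, the statement could look like this:

```
CREATE SERVER live_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
    host 'db.example.com',
    port '5432',
    dbname 'production'
);
```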
Once you have a `SERVER`, let's authenticate to your database:

```
CREATE USER MAPPING
FOR CURRENT_USER
SERVER live_db
OPTIONS (
    user 'Postgres user',
    password 'Postgres password'
);
```

Replace `Postgres user` and `Postgres password` with the details collected in the previous step. If everything went well, you can validate that the connection is working with just one query:

```
SELECT * FROM dblink(
    'live_db',
    'SELECT 1 AS one'
) AS t1(one INTEGER);
```

You can now execute any query you want on your live database from inside your PostgresML database.
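For example, to pull a few rows from a hypothetical `users` table on the live database, describe its columns in the `AS` clause so Postgres knows the shape of the result:

```
SELECT * FROM dblink(
    'live_db',
    'SELECT id, email FROM users LIMIT 10'
) AS users(id BIGINT, email TEXT);
```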
#### Working with your tables

Instead of creating temporary tables for each query, you can import your entire schema into PostgresML using foreign data wrappers:

```
CREATE SCHEMA live_db_tables;

IMPORT FOREIGN SCHEMA public
FROM SERVER live_db
INTO live_db_tables;
```

All your tables from your `public` schema are now available in the `live_db_tables` schema. You can read and write to those tables as if they were hosted in PostgresML. For example, if you have a table called `users`, you could access it with:

```
SELECT * FROM live_db_tables.users LIMIT 1;
```

That's it: your PostgresML database is directly connected to your production database and you can start your machine learning journey.
#### Accelerating bulk access

To speed up access to your data, you can cache it in PostgresML by copying it from a foreign table into a regular table. Taking the example of the `users` table:

```
CREATE TABLE public.users (LIKE live_db_tables.users);
INSERT INTO public.users SELECT * FROM live_db_tables.users;
```

This will copy all rows from your `users` table into PostgresML. You'll be able to access them much quicker if you need to perform a batch job like generating embeddings or training a supervised model.
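As a sketch, a batch embedding job over the cached table could look like this, assuming a hypothetical `bio` text column on `users` and using the `pgml.embed()` function with a model available in your database:

```
-- Hypothetical columns "id" and "bio"; swap in a model you have available.
SELECT id, pgml.embed('intfloat/e5-small', bio) AS embedding
FROM public.users
LIMIT 100;
```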
#### Exporting ML artifacts

If you want to export some artifacts you've created with PostgresML to your live database, you can do so with foreign data wrappers as well. Simply copy them using the same mechanism as above, except instead of copying data from the foreign schema, copy data into the foreign schema from the regular table.
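For example, to copy a local table of results into the live database, you could insert into the corresponding foreign table (a sketch; `predictions` is a hypothetical table that already exists on the live database, and therefore appears in `live_db_tables`, with a matching schema):

```
INSERT INTO live_db_tables.predictions
SELECT * FROM public.predictions;
```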
