|
| 1 | +# Importing data |
| 2 | + |
| 3 | +PostgresML can easily ingest data from your existing data stores. Depending on how frequently your data changes, different methodologies are preferable. |
| 4 | + |
| 5 | +### Static data |
| 6 | + |
| 7 | +Data that changes infrequently can be easily imported into PostgresML using `COPY`. All you have to do is export your data as a CSV file, create a table in Postgres to store it, and import it using the command line. |
| 8 | + |
| 9 | +Let's use a simple CSV file with 3 columns as an example: |
| 10 | + |
| 11 | +| Column | Data type | Example | |
| 12 | +| ---------------- | --------- | ------- | |
| 13 | +| name | text | John | |
| 14 | +| age | integer | 30 | |
| 15 | +| is\_paying\_user | boolean | true | |
| 16 | + |
| 17 | +#### Export data as CSV |
| 18 | + |
| 19 | +If you're using a Postgres database already, you can export any table as CSV with just one command: |
| 20 | + |
| 21 | +```bash |
| 22 | +psql -c "\copy your_table TO '~/Desktop/your_table.csv' CSV HEADER" |
| 23 | +``` |
| 24 | + |
| 25 | +If you're using another data store, it should almost always provide a CSV export functionality, since CSV is the most commonly used data format in machine learning. |
| 26 | + |
| 27 | +#### Create table in Postgres |
| 28 | + |
| 29 | +Creating a table in Postgres with the correct schema is as easy as: |
| 30 | + |
| 31 | +``` |
| 32 | +CREATE TABLE your_table ( |
| 33 | + name TEXT, |
| 34 | + age INTEGER, |
| 35 | + is_paying_user BOOLEAN |
| 36 | +); |
| 37 | +``` |
| 38 | + |
| 39 | +#### Import data using the command line |
| 40 | + |
| 41 | +Once you have a table and your data exported as CSV, importing it can also be done with just one command: |
| 42 | + |
| 43 | +```bash |
| 44 | +psql -c "\copy your_table FROM '~/Desktop/your_table.csv' CSV HEADER" |
| 45 | +``` |
| 46 | + |
| 47 | +We took our export command and changed `TO` to `FROM`, and that's it. Make sure you're connecting to your PostgresML database when importing data. |
| 48 | + |
| 49 | +#### Refreshing data |
| 50 | + |
| 51 | +If your data changed, repeat this process again. To avoid duplicate entries in your table, you can truncate (or delete) all rows beforehand: |
| 52 | + |
| 53 | +``` |
| 54 | +TRUNCATE your_table; |
| 55 | +``` |
| 56 | + |
| 57 | +### Live data |
| 58 | + |
| 59 | +Importing data from online databases can be done with foreign data wrappers. All PostgresML databases come with both `postgres_fdw` and `dblink` extensions pre-installed, so you can import data from any of your existing Postgres databases, and export machine learning artifacts from PostgresML using just a few lines of SQL. |
| 60 | + |
| 61 | +#### Setting up |
| 62 | + |
| 63 | +Before you get started with foreign data wrappers, log into your current database hosting provider and grab the following connection details: |
| 64 | + |
| 65 | +* Host |
| 66 | +* Port (typically `5432`) |
| 67 | +* Database name |
| 68 | +* Postgres user |
| 69 | +* Postgres password |
| 70 | + |
| 71 | +Once you have them, we can setup our live foreign data wrapper connection. All following commands should be executed on your PostgesML database. You don't need to perform any additional steps on your production database. |
| 72 | + |
| 73 | +#### Connecting |
| 74 | + |
| 75 | +To connect to your database from PostgresML, first create a corresponding `SERVER`: |
| 76 | + |
| 77 | +``` |
| 78 | +CREATE SERVER live_db |
| 79 | +FOREIGN DATA WRAPPER postgres_fdw |
| 80 | +OPTIONS ( |
| 81 | + host 'Host' |
| 82 | + port 'Port' |
| 83 | + dbname 'Database name' |
| 84 | +); |
| 85 | +``` |
| 86 | + |
| 87 | +Replace `Host`, `Port` and `Database name` with details you've collected in the previous step. |
| 88 | + |
| 89 | +Once you have a `SERVER`, let's authenticate to your database: |
| 90 | + |
| 91 | +``` |
| 92 | +CREATE USER MAPPING |
| 93 | +FOR CURRENT_USER |
| 94 | +SERVER live_db |
| 95 | +OPTIONS ( |
| 96 | + user 'Postgres user' |
| 97 | + password 'Postgres password' |
| 98 | +); |
| 99 | +``` |
| 100 | + |
| 101 | +Replace `Postgres user` and `Postgres password` with details collected in the previous step. If everything went well, we'll be able to validate that everything is working with just one query: |
| 102 | + |
| 103 | +``` |
| 104 | +SELECT * FROM dblink( |
| 105 | + 'live_db', |
| 106 | + 'SELECT 1 AS one' |
| 107 | +) AS t1(one INTEGER); |
| 108 | +``` |
| 109 | + |
| 110 | +You can now execute any query you want on your live database from inside your PostgresML database. |
| 111 | + |
| 112 | +#### Working with your tables |
| 113 | + |
| 114 | +Instead of creating temporary tables for each query, you can import your entire schema into PostgresML using foreign data wrappers: |
| 115 | + |
| 116 | +``` |
| 117 | +CREATE SCHEMA live_db_tables; |
| 118 | +
|
| 119 | +IMPORT FOREIGN SCHEMA public |
| 120 | +FROM SERVER live_db |
| 121 | +INTO live_db_tables; |
| 122 | +``` |
| 123 | + |
| 124 | +All your tables from your `public` schema are now available in the `live_db_tables` schema. You can read and write to those tables as if they were hosted in PostgresML. For example, if you have a table called `users`, you could access it with: |
| 125 | + |
| 126 | +``` |
| 127 | +SELECT * FROM live_db_tables.users LIMIT 1; |
| 128 | +``` |
| 129 | + |
| 130 | +That's it, your PostgresML database is directly connected to your production database and you can start your machine learning journey. |
| 131 | + |
| 132 | +#### Accelerating bulk access |
| 133 | + |
| 134 | +To speed up access to your data, you can cache it in PostgresML by copying it from a foreign table into a regular table. Taking the example of the `users` table: |
| 135 | + |
| 136 | +``` |
| 137 | +CREATE TABLE public.users (LIKE live_db_tables.users); |
| 138 | +INSERT INTO public.users SELECT * FROM live_db_tables.users; |
| 139 | +``` |
| 140 | + |
| 141 | +This will copy all rows from your `users` table into PostgresML. You'll be able to access them much quicker if you need to perform a batch job like generating embeddings or training a supervised model. |
| 142 | + |
| 143 | +#### Exporting ML artifacts |
| 144 | + |
| 145 | +If you want to export some artifacts you've created with PostresML to your live database, you can do so with foreign data wrappers as well. Simply copy them using the same mechanism as above, except instead of copying data from the foreign schema, copy data into the foreign schema from the regular table. |
0 commit comments