Description
In my original database, after running pgml.dump_all, I clear the table data with the following commands:
TRUNCATE TABLE pgml.models cascade;
TRUNCATE TABLE pgml.deployments cascade;
TRUNCATE TABLE pgml.projects cascade;
TRUNCATE TABLE pgml.snapshots cascade;
TRUNCATE TABLE pgml.files cascade;
After that, running pgml.load_all restores the data and training proceeds as normal.
However, when I create a new database with createdb -O postgresml pgml_backup, run create extension pgml; in it, and then run pgml.load_all to restore the data into pgml_backup, the data is restored but pgml.train fails with: ERROR: duplicate key value violates unique constraint "projects_pkey".
Steps to reproduce:
In the postgresml database:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK
\q # Exit the database
createdb -O postgresml pgml_backup # Create a new database
Connect to the pgml_backup database and create the pgml extension:
CREATE EXTENSION pgml; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- ERROR: duplicate key value violates unique constraint "projects_pkey"
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm', materialize_snapshot => true); -- ERROR: duplicate key value violates unique constraint "models_pkey"
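A quick check that might help narrow this down is to compare the highest restored id with the current position of the sequence behind the id column. This is only a sketch: it assumes pgml.projects.id is a serial column backed by a sequence, which is not confirmed here, and it has not been run against this setup.

```sql
-- Assumption: pgml.projects.id is a serial column backed by a sequence.
-- pg_sequences.last_value is NULL when the sequence has never been used.
SELECT 'pgml.projects' AS table_name,
       (SELECT max(id) FROM pgml.projects) AS max_id,
       s.last_value AS sequence_last_value
FROM pg_sequences AS s
WHERE format('%I.%I', s.schemaname, s.sequencename)
      = pg_get_serial_sequence('pgml.projects', 'id');
```

If sequence_last_value comes back NULL or lower than max_id, the next generated id would collide with a restored row, which would be consistent with the "projects_pkey" error. The same query can be pointed at pgml.models for the "models_pkey" error.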
By contrast, when the same database (postgresml) is used throughout, every step succeeds:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK
TRUNCATE TABLE pgml.models CASCADE; -- OK
TRUNCATE TABLE pgml.deployments CASCADE; -- OK
TRUNCATE TABLE pgml.projects CASCADE; -- OK
TRUNCATE TABLE pgml.snapshots CASCADE; -- OK
TRUNCATE TABLE pgml.files CASCADE; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm'); -- OK
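One plausible explanation for the difference between the two runs (an assumption on my part, not something confirmed here) is that pgml.load_all restores rows with their original id values without advancing the sequences that generate new ids. In the same-database case, TRUNCATE without RESTART IDENTITY leaves the sequences where they were, so new ids stay above the restored ones. In a fresh database the sequences start from the beginning and collide with the restored rows. If that is the cause, a possible workaround after pgml.load_all would be to move each sequence past the highest restored id. The sketch below assumes each of the five tables has a serial id column, and it has not been tested against this setup.

```sql
-- Assumption: each table has a serial "id" column backed by a sequence.
-- setval(seq, value, is_called): with is_called = false (empty table),
-- the next nextval() returns value itself; otherwise it returns value + 1.
SELECT setval(pg_get_serial_sequence('pgml.projects', 'id'),
              COALESCE(max(id), 1), max(id) IS NOT NULL)
FROM pgml.projects;

SELECT setval(pg_get_serial_sequence('pgml.snapshots', 'id'),
              COALESCE(max(id), 1), max(id) IS NOT NULL)
FROM pgml.snapshots;

SELECT setval(pg_get_serial_sequence('pgml.models', 'id'),
              COALESCE(max(id), 1), max(id) IS NOT NULL)
FROM pgml.models;

SELECT setval(pg_get_serial_sequence('pgml.deployments', 'id'),
              COALESCE(max(id), 1), max(id) IS NOT NULL)
FROM pgml.deployments;

SELECT setval(pg_get_serial_sequence('pgml.files', 'id'),
              COALESCE(max(id), 1), max(id) IS NOT NULL)
FROM pgml.files;
```

After running these in pgml_backup, the pgml.train calls could be retried to see whether the duplicate key errors go away.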