After using pgml.dump_all and pgml.load_all for data backup and migration, an error occurs when trying to train on the new database. #1307

Open
@HJH0924

In my original database, after running pgml.dump_all, I clear the table data with the following commands:
TRUNCATE TABLE pgml.models cascade;
TRUNCATE TABLE pgml.deployments cascade;
TRUNCATE TABLE pgml.projects cascade;
TRUNCATE TABLE pgml.snapshots cascade;
TRUNCATE TABLE pgml.files cascade;

Running pgml.load_all at this point restores the data, and training proceeds as normal.

However, the result is different when I restore into a new database. I run createdb -O postgresml pgml_backup, run create extension pgml; in the new pgml_backup database, and then run pgml.load_all to restore the data there. The data is restored, but pgml.train then fails with: ERROR: duplicate key value violates unique constraint "projects_pkey".

Steps to reproduce:
In the postgresml database:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK

Exit psql with \q, then create a new database from the shell:
createdb -O postgresml pgml_backup

Connect to the pgml_backup database and create the pgml extension:
create extension pgml; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- ERROR: duplicate key value violates unique constraint "projects_pkey"
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm', materialize_snapshot => true); -- ERROR: duplicate key value violates unique constraint "models_pkey"
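
One possible explanation, not confirmed in this report, is that pgml.load_all restores rows with their original id values but does not advance the sequences behind those id columns, so the next insert in a fresh database reuses an existing id. The queries below are a minimal diagnostic sketch for checking this in pgml_backup. They assume that pgml.projects and pgml.models each have an id column backed by a sequence. pg_get_serial_sequence and the pg_sequences view are standard PostgreSQL (pg_sequences requires version 10 or later).

```sql
-- Assumption: pgml.projects and pgml.models each have an "id" column
-- backed by a sequence (serial or identity).
SELECT 'projects' AS table_name,
       pg_get_serial_sequence('pgml.projects', 'id') AS sequence_name,
       (SELECT max(id) FROM pgml.projects) AS max_id
UNION ALL
SELECT 'models',
       pg_get_serial_sequence('pgml.models', 'id'),
       (SELECT max(id) FROM pgml.models);

-- Current state of every sequence in the pgml schema.
-- last_value is NULL for a sequence that has never been used.
SELECT schemaname, sequencename, last_value
FROM pg_sequences
WHERE schemaname = 'pgml';
```

If last_value is NULL or lower than max_id for a table, the next generated id would collide with a restored row, which would match the "projects_pkey" and "models_pkey" errors above.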

By contrast, the same sequence succeeds when everything happens in a single database, such as postgresml:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK
TRUNCATE TABLE pgml.models cascade; -- OK
TRUNCATE TABLE pgml.deployments cascade; -- OK
TRUNCATE TABLE pgml.projects cascade; -- OK
TRUNCATE TABLE pgml.snapshots cascade; -- OK
TRUNCATE TABLE pgml.files cascade; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm'); -- OK
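
The single-database case may succeed because TRUNCATE without RESTART IDENTITY leaves sequences untouched, so they keep counting past the restored ids. If the new database fails because its sequences still start at 1, a workaround along the following lines might help after pgml.load_all. This is an untested sketch, not a confirmed fix. It assumes each of the five tables has an id column backed by a sequence; setval and pg_get_serial_sequence are standard PostgreSQL functions.

```sql
-- Assumption: each table below has an "id" column backed by a sequence.
-- With is_called = false, the next nextval() returns exactly max(id) + 1
-- (or 1 if the table is empty).
SELECT setval(pg_get_serial_sequence('pgml.projects', 'id')::regclass,
              COALESCE((SELECT max(id) FROM pgml.projects), 0) + 1, false);
SELECT setval(pg_get_serial_sequence('pgml.snapshots', 'id')::regclass,
              COALESCE((SELECT max(id) FROM pgml.snapshots), 0) + 1, false);
SELECT setval(pg_get_serial_sequence('pgml.models', 'id')::regclass,
              COALESCE((SELECT max(id) FROM pgml.models), 0) + 1, false);
SELECT setval(pg_get_serial_sequence('pgml.deployments', 'id')::regclass,
              COALESCE((SELECT max(id) FROM pgml.deployments), 0) + 1, false);
SELECT setval(pg_get_serial_sequence('pgml.files', 'id')::regclass,
              COALESCE((SELECT max(id) FROM pgml.files), 0) + 1, false);
```

If a table's id column is not sequence-backed, pg_get_serial_sequence returns NULL and the corresponding setval call returns NULL without changing anything.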

Labels: bug