
Text datasets should support numeric targets for classification tasks #660

Open
@montanalow


The fine-tuning example for text classification models (https://postgresml.org/docs/guides/transformers/fine_tuning) no longer works after the Rust port. The dataset combines a text input column with a numeric `label` target, and mixed-type vectors are not implemented in the snapshot code: https://github.com/postgresml/postgresml/blob/master/pgml-extension/src/orm/snapshot.rs#L824

Running the example fails with:

```
pgml_development.public> SELECT pgml.tune(
    'IMDB Review Sentiment',
    task => 'text-classification',
    relation_name => 'pgml.imdb',
    y_column_name => 'label',
    model_name => 'distilbert-base-uncased',
    hyperparams => '{
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,
        "num_train_epochs": 1,
        "weight_decay": 0.01
    }',
    test_size => 0.5,
    test_sampling => 'last'
)
[2023-05-30 09:20:30] [XX000] ERROR: only text type columns are supported
```
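The error message suggests that the snapshot currently requires every column to be text once a text feature is present. The following minimal Rust sketch shows one way mixed-type columns could be supported. The `ColumnData` enum, `TextClassificationData` struct, and `build_text_classification` helper are hypothetical illustrations, not pgml's actual types or API. Under this approach, the feature column must still be text, while a numeric target is normalized to string class labels before fine-tuning.

```rust
/// HYPOTHETICAL per-column storage (not pgml's real types): each column keeps
/// its own element type, so a text feature and a numeric target can coexist
/// instead of forcing a single element type on every column.
#[derive(Debug, Clone, PartialEq)]
enum ColumnData {
    Text(Vec<String>),
    Int(Vec<i64>),
    Float(Vec<f64>),
}

/// HYPOTHETICAL fine-tuning input: text inputs plus class labels as strings.
#[derive(Debug, PartialEq)]
struct TextClassificationData {
    inputs: Vec<String>,
    labels: Vec<String>,
}

/// Normalize a target column to string class labels.
/// Integers are formatted directly. Floats are accepted only when they hold a
/// whole number (e.g. 1.0); fractional, NaN, or infinite values are rejected.
fn target_to_labels(target: &ColumnData) -> Result<Vec<String>, String> {
    match target {
        ColumnData::Text(values) => Ok(values.clone()),
        ColumnData::Int(values) => Ok(values.iter().map(|v| v.to_string()).collect()),
        ColumnData::Float(values) => values
            .iter()
            .map(|v| {
                if v.fract() == 0.0 {
                    Ok(format!("{}", *v as i64))
                } else {
                    Err(format!("non-integral class label: {v}"))
                }
            })
            // Collecting into Result stops at the first bad label.
            .collect(),
    }
}

/// Build the fine-tuning input. The feature must be text, the target may be
/// text or numeric, and both columns must have the same number of rows.
fn build_text_classification(
    feature: &ColumnData,
    target: &ColumnData,
) -> Result<TextClassificationData, String> {
    let inputs = match feature {
        ColumnData::Text(values) => values.clone(),
        _ => return Err("text-classification features must be text columns".to_string()),
    };
    let labels = target_to_labels(target)?;
    if inputs.len() != labels.len() {
        return Err(format!(
            "row count mismatch: {} inputs vs {} labels",
            inputs.len(),
            labels.len()
        ));
    }
    Ok(TextClassificationData { inputs, labels })
}

fn main() {
    // Made-up rows: a text review column next to an integer `label` target.
    let reviews = ColumnData::Text(vec!["great movie".into(), "dull plot".into()]);
    let labels = ColumnData::Int(vec![1, 0]);
    let data = build_text_classification(&reviews, &labels).unwrap();
    println!("{:?}", data.inputs);
    println!("{:?}", data.labels); // ["1", "0"]
}
```

Converting numeric labels to strings lets the existing text-only path handle them unchanged. Another option would be to pass integer class ids straight through to the model.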


Labels: bug, ml, rust
