ONNX native runtime support #678

Open
@montanalow

Description

We have anecdotal evidence that the Python layer between libtorch and Postgres calls can increase cost by as much as 4x on large batches. In the end state, I think we should boil these models down to a pure ONNX format and call torch directly from Rust, completely bypassing the HuggingFace Python dependencies during inference. That will be a bigger project, though, and will likely require work for many of the popular models to support each of their idiosyncrasies.
