Description
We have anecdotal evidence that the Python layer sitting between libtorch and the Postgres calls can increase cost by as much as 4x on large batches. In the end state, I think we should boil these models down to a pure ONNX format and call torch directly from Rust, completely bypassing the HuggingFace Python dependencies during inference. That will be a bigger project, though, and will likely require extra work for many of the popular models to support each of their idiosyncrasies.
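
For reference, a minimal sketch of what the "boil down to ONNX" step could look like, assuming a BERT-style encoder from HuggingFace. The model name, output path, and opset version are placeholders, and the actual export tooling (plain `torch.onnx.export` vs. `optimum`) is still an open choice:

```python
# Sketch: export a HuggingFace encoder to a standalone ONNX file.
# Model name, file path, and opset are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so the traced graph emits plain tuples instead of ModelOutput objects.
model = AutoModel.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("export example", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # Mark batch and sequence dims dynamic so batch size isn't baked into the graph.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)
```

The resulting `.onnx` file could then be loaded on the Rust side by an ONNX-capable runtime (e.g. the `ort` or `tract` crates), which is the point at which the Python layer would drop out of the inference path. Models with non-standard inputs or generation loops will each need their own export recipe, which is where most of the per-model work mentioned above would go.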