Description
We have anecdotal evidence that the Python layer sitting between libtorch and the Postgres calls can increase cost by as much as 4x on large batches. In the end state, I think we should boil these models down to a pure ONNX format and call torch directly from Rust, completely bypassing the HuggingFace Python dependencies during inference. That will be a bigger project, though, and will likely require extra work for many of the popular models to support each of their idiosyncrasies.
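
For reference, a minimal sketch of what the "boil down to ONNX" step could look like, assuming a BERT-style encoder from HuggingFace. The model name, output path, and opset version are placeholders, and the actual export tooling (plain `torch.onnx.export` vs. `optimum`) is still an open choice:

```python
# Sketch: export a HuggingFace encoder to a standalone ONNX file.
# Model name, file path, and opset are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so the traced graph emits plain tuples instead of ModelOutput objects.
model = AutoModel.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("export example", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # Mark batch and sequence dims dynamic so batch size isn't baked into the graph.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)
```

The resulting `.onnx` file could then be loaded on the Rust side by an ONNX-capable runtime (e.g. the `ort` or `tract` crates), which is the point at which the Python layer would drop out of the inference path. Models with non-standard inputs or generation loops will each need their own export recipe, which is where most of the per-model work mentioned above would go.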