by marcyb5st
110 days ago
You can do that with ONNX: graft the preprocessing layers onto the actual model [1] and then serve the combined graph. Honestly, I thought ONNX (on CPU at least) was already low-level, heavily optimized code.

@Author - if you see this, is it possible to add comparisons (i.e. "vanilla" inference latencies vs. timber)?

[1] https://gist.github.com/msteiner-google/5f03534b0df58d32abcc... <-- A gist I put together in the past that goes from PyTorch to ONNX and grafts the preprocessing layers onto the model, so you can pass the raw input.