by zaptrem 942 days ago
We've been building with Modal over the past few months (though no prod-scale tests yet) and were slightly disappointed by the very long (10-20 second) cold start times. In the long term we're more interested in inference servers that use compiled/optimized models instead of running plain old PyTorch (which adds another few seconds to cold start on its own).
1 comment

We are adding support for inference servers to Pipeless. We started with ONNX Runtime and the OpenVINO, CoreML, CUDA and TensorRT execution providers. Some people have suggested that I also integrate with the Triton server, but I still need to dig into that and check its license. The good part is that there is no cold start right now, at the cost of keeping some resources allocated from node start.
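
To make the trade-off concrete, here is a rough sketch of eager versus lazy model loading. It is not how Pipeless is implemented; `ModelServer`, `slow_load` and `first_request_latency` are hypothetical names, and the "model" is a stand-in function whose load cost is simulated with a short sleep.

```python
import time
from typing import Callable, Optional


class ModelServer:
    """Toy server that loads a model eagerly (at start) or lazily (on first request)."""

    def __init__(self, load_model: Callable[[], Callable[[float], float]], eager: bool):
        self._load_model = load_model
        self._model: Optional[Callable[[float], float]] = None
        if eager:
            # Pay the load cost at node start; the model stays resident from now on.
            self._model = load_model()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def infer(self, x: float) -> float:
        if self._model is None:
            # Cold start: the first request pays the load cost.
            self._model = self._load_model()
        return self._model(x)


def slow_load(delay_s: float = 0.05) -> Callable[[float], float]:
    """Hypothetical stand-in for loading and optimizing a real model."""
    time.sleep(delay_s)
    return lambda x: 2.0 * x


def first_request_latency(server: ModelServer) -> float:
    """Time a single inference call in seconds."""
    start = time.perf_counter()
    server.infer(1.0)
    return time.perf_counter() - start


if __name__ == "__main__":
    eager = ModelServer(slow_load, eager=True)
    lazy = ModelServer(slow_load, eager=False)
    print(f"eager first request: {first_request_latency(eager):.4f}s")
    print(f"lazy first request:  {first_request_latency(lazy):.4f}s")
```

The eager server answers its first request without the load delay, but it holds the model in memory even when no requests arrive, which is the resource cost mentioned above.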