Hacker News new | ask | show | jobs
by ackbar03 1809 days ago
So is this mainly focused on deployment for applications with high-speed inference requirements? I didn't dive into product in detail. I run my own deep-learning based web-app and inference speed optimization is pretty non-trivial. As far as I know production level speed requirements require use of tensorrt which is definitely not hot-start and requires more than a few minutes to load (i'm not too sure what's going on under the hood, not an expert) but has inference speeds of up to x2 or more, so not quite sure what your targeting or if you've actually managed to solve that problem which would be highly impressive
2 comments

I suspect the audience is more about the GPU hosting aspect, effectively making GPU-based applications “serverless”.

To me, adding GPUs into the devops mix typically increases the complexity significantly, and I would definitely pay money to someone who can just take my model, host it, and let them deal with the complexities around it.

We don't use TensorRT at the moment, but it is something that we are exploring.