|
|
|
|
|
by ackbar03
1809 days ago
|
|
So is this mainly focused on deployment for applications with high-speed inference requirements? I didn't dive into product in detail. I run my own deep-learning based web-app and inference speed optimization is pretty non-trivial. As far as I know production level speed requirements require use of tensorrt which is definitely not hot-start and requires more than a few minutes to load (i'm not too sure what's going on under the hood, not an expert) but has inference speeds of up to x2 or more, so not quite sure what your targeting or if you've actually managed to solve that problem which would be highly impressive |
|
To me, adding GPUs into the devops mix typically increases the complexity significantly, and I would definitely pay money to someone who can just take my model, host it, and let them deal with the complexities around it.