by mtthtlt
1696 days ago
I think TRTorch[1] can be a quick way to get inference models from PyTorch that are both fast and easy to use. It compiles your model ahead of time using TensorRT, and you then load the compiled model in your application with torch.jit.load("your_trtorch_model.ts").
Once compiled, you no longer need to keep your model's code in the application (as with regular JIT models). Inference time is on par with TensorRT, and it does the optimizations for you as well.
You can also quantize your model to FP16 or INT8 using PTQ (post-training quantization), which should give you an additional inference speedup. Here is a tutorial[2] on how to use TRTorch.
[1] https://github.com/NVIDIA/TRTorch/tree/master/core
[2] https://www.photoroom.com/tech/faster-image-segmentation-trt...
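To give a feel for what the FP16 and INT8 options trade away, here is a rough, pure-Python sketch of the arithmetic behind reduced precision. It is not TRTorch or TensorRT code: the helper names (quantize_int8, dequantize_int8, to_fp16) and the sample values are made up for illustration, and it assumes the simplest possible calibration (symmetric, per-tensor, range taken from the largest absolute value seen). Real calibrators may choose the range differently.

```python
import struct


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization using max calibration (illustrative only)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


def to_fp16(values):
    """Round each value to IEEE 754 half precision and back to a Python float."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


# Hypothetical activation samples standing in for a calibration batch.
calibration = [-1.5, -0.25, 0.0, 0.1, 0.75, 2.0]

codes, scale = quantize_int8(calibration)
restored = dequantize_int8(codes, scale)

int8_error = max(abs(a - b) for a, b in zip(calibration, restored))
fp16_error = max(abs(a - b) for a, b in zip(calibration, to_fp16(calibration)))

print("scale:", scale)
print("int8 codes:", codes)
print("max int8 round-trip error:", int8_error)
print("max fp16 round-trip error:", fp16_error)
```

With this scheme the INT8 round-trip error for in-range values is bounded by half the scale, which is why the calibration data should look like the inputs you expect at inference time.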
https://imgur.com/OKRbUNw