I've written an open-source driver for the decoding side of the nvjpg module found in the Tegra X1 (i.e. an earlier hardware revision than the one in the A100).
Probably quite a bit, I don't know. The typical use case is to load thousands of JPEGs at once, so that you get good throughput despite the copy overhead. You can see the benchmark against libjpeg-turbo here: https://developer.nvidia.com/blog/leveraging-hardware-jpeg-d...
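To make the batching point concrete, here is a rough back-of-the-envelope sketch. It assumes each submission pays some fixed cost (transfer setup, synchronization) plus a per-image cost; the function name and all the numbers are hypothetical and not measured on any hardware.

```python
def images_per_second(batch_size, fixed_overhead_s, per_image_s):
    """Throughput when each batch pays a fixed cost plus a per-image cost."""
    return batch_size / (fixed_overhead_s + batch_size * per_image_s)


# Made-up costs for illustration: 5 ms fixed per batch, 0.5 ms per image.
FIXED_OVERHEAD_S = 5e-3
PER_IMAGE_S = 0.5e-3

for n in (1, 10, 100, 1000):
    rate = images_per_second(n, FIXED_OVERHEAD_S, PER_IMAGE_S)
    print(f"batch of {n:4d}: {rate:7.0f} images/s")
```

Under this simple model the throughput climbs with batch size and levels off near 1 / per_image_s, which is why large batches hide the fixed overhead.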
I did some quick benchmarks against libjpeg-turbo, if that gives you an idea. I expect encoding performance would be similar.
https://github.com/averne/oss-nvjpg#performance