

by inoop 627 days ago

As always, it depends a lot on what you're doing, and a lot of people are using Python for AI.

One of the drawbacks of multi-processing versus multi-threading is that you cannot share memory (easily, cheaply) between processes. During model training, and even during inference, this becomes a problem.

For example, imagine a high-volume, low-latency, synchronous computer vision inference service. If you handle each request in a separate process, you'll have to jump through a bunch of hoops to make it performant. You'll need shared memory to move data around, because images are large and sockets are slow. Each process will also need its own copy of the model in GPU memory, which is a problem in a world where GPU memory is at a premium. You could of course run a single process for the GPU part of your model and automatically batch inputs into it, etc. (and people do), but all of this is just working around the lack of proper threading support in Python.
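A minimal sketch of the shared-memory workaround, using only the standard library's `multiprocessing.shared_memory` (Python 3.8+): the parent copies the image bytes into a named block once, and a child process attaches by name and reads them in place, so only the block's name and size get pickled. The helper names (`write_image`, `checksum_from_shared`, `worker`) are hypothetical, and summing the bytes is a stand-in for real preprocessing or inference.

```python
from multiprocessing import Process, Queue, shared_memory


def write_image(data: bytes) -> shared_memory.SharedMemory:
    # Copy the image into a new named shared-memory block (the only copy made).
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[: len(data)] = data
    return shm


def checksum_from_shared(name: str, size: int) -> int:
    # Attach to an existing block by name and read it in place, without pickling.
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Hypothetical stand-in for "preprocess / run the model on the image".
        return sum(shm.buf[:size])
    finally:
        shm.close()  # detach this mapping; the creating process unlinks the block


def worker(name: str, size: int, results) -> None:
    # Runs in the child process: only `name` and `size` crossed the process boundary.
    results.put(checksum_from_shared(name, size))


def main() -> None:
    image = bytes(range(256)) * 4096  # ~1 MiB fake "image"
    shm = write_image(image)
    try:
        results = Queue()
        child = Process(target=worker, args=(shm.name, len(image), results))
        child.start()
        print("checksum computed in child:", results.get())  # read before join
        child.join()
    finally:
        shm.close()
        shm.unlink()  # free the block once every reader is done


if __name__ == "__main__":
    main()
```

This only addresses the data-movement half of the problem; each process would still load its own copy of the model.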

By the way, if anyone is struggling with these challenges today, I recommend taking a peek at NVIDIA's Triton Inference Server (https://github.com/triton-inference-server/server), which handles a lot of these details for you. It supports things like zero-copy sharing of tensors between parts of your model running in different processes/threads, and it also does auto-batching across requests. Auto-batching in particular gave us a big throughput increase with only a minor latency penalty!
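Triton's dynamic batching is enabled in the model's configuration rather than written by hand, so the following is not its implementation. It is a rough sketch of the same idea in plain asyncio: requests are queued, a single loop collects whatever arrives within a short window (up to a maximum batch size), and the model runs once per batch. `fake_batched_model`, `MicroBatcher`, `max_batch` and `max_wait` are all hypothetical names; the wait window is the latency you trade for throughput.

```python
import asyncio
import contextlib
from typing import Callable, List, Tuple


def fake_batched_model(inputs: List[int]) -> List[int]:
    # Hypothetical model: one call handles a whole batch (think one GPU launch).
    return [x * 2 for x in inputs]


class MicroBatcher:
    """Groups concurrent requests into batches of at most `max_batch` items."""

    def __init__(self, model: Callable[[List[int]], List[int]],
                 max_batch: int = 8, max_wait: float = 0.005) -> None:
        self.model = model
        self.max_batch = max_batch
        self.max_wait = max_wait  # seconds to wait for more requests after the first
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so the batching is observable

    async def infer(self, x: int) -> int:
        # Each caller enqueues its input with a future and awaits its own result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self) -> None:
        while True:
            batch = [await self.queue.get()]    # block until one request arrives
            await asyncio.sleep(self.max_wait)   # let concurrent requests pile up
            while len(batch) < self.max_batch and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            self.batch_sizes.append(len(batch))
            # One model call per batch. A real service would run this off the event
            # loop (a thread or a dedicated GPU process) instead of blocking it.
            outputs = self.model([x for x, _ in batch])
            for (_, fut), y in zip(batch, outputs):
                if not fut.done():  # skip callers that gave up (cancelled)
                    fut.set_result(y)


async def demo(n: int = 20, max_batch: int = 8) -> Tuple[List[int], List[int]]:
    batcher = MicroBatcher(fake_batched_model, max_batch=max_batch)
    runner = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.infer(i) for i in range(n)))
    runner.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await runner
    return list(results), batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(results)
    print("batch sizes:", sizes)
```

Waiting a fixed `max_wait` after the first request keeps the sketch simple; a production batcher would also dispatch early as soon as a batch fills up.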


jgraettinger1 627 days ago

> For example, imagine a high volume, low latency, synchronous computer vision inference service.

I'm not in this space and this is probably too simplistic, but I would think pairing asyncio for all I/O (reading/decoding requests and preparing them for inference) with asyncio.to_thread'd calls to do_inference_in_C_with_the_GIL_released(my_prepared_request) would get you nearly all of the performance benefit with current Python.
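A minimal sketch of that suggestion, assuming Python 3.9+ for `asyncio.to_thread`. `do_inference_in_C_with_the_GIL_released` is the commenter's placeholder; here it is stood in for by `hashlib.sha256`, which in CPython releases the GIL while hashing large buffers. `prepare` and `serve` are hypothetical helpers.

```python
import asyncio
import hashlib
from typing import List


def prepare(raw: bytes) -> bytes:
    # Hypothetical "decode/preprocess" step; cheap, so it stays on the event loop.
    return raw * 1000


def do_inference_in_C_with_the_GIL_released(prepared: bytes) -> str:
    # Stand-in for the C inference call: CPython's hashlib drops the GIL while
    # hashing large buffers, so several of these can run in parallel threads.
    return hashlib.sha256(prepared).hexdigest()


async def handle_request(raw: bytes) -> str:
    prepared = prepare(raw)
    # Offload the blocking call to the default thread pool so the event loop
    # stays free to read and decode other requests in the meantime.
    return await asyncio.to_thread(do_inference_in_C_with_the_GIL_released, prepared)


async def serve(requests: List[bytes]) -> List[str]:
    # Handle all requests concurrently; results come back in request order.
    return await asyncio.gather(*(handle_request(r) for r in requests))


if __name__ == "__main__":
    print(asyncio.run(serve([b"cat", b"dog", b"bird"])))
```

How close this gets to "nearly all" of the benefit depends on how much per-request work actually runs with the GIL released; Python-level preprocessing still serializes on the GIL.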


saagarjha 627 days ago

Machine learning people not call their thing Triton challenge (IMPOSSIBLE)


buildbot 627 days ago

This (NVIDIA's) Triton predates OpenAI's by a few years.
