Almost every data structure access in Python, even reading a dictionary item, currently depends on the GIL at some level for correctness. It is very deeply embedded in CPython.
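A rough sketch of what that reliance looks like from the Python side (an illustration only, not taken from CPython's source; the names `shared`, `writer`, and `run` are made up for the example): several threads store into one plain dict with no explicit lock, and the dict still ends up intact because each individual store is serialized by the interpreter.

```python
import threading

shared = {}  # plain dict, no explicit lock anywhere in this example


def writer(worker_id, count):
    # Each thread stores its own distinct keys into the same dict.
    for i in range(count):
        shared[(worker_id, i)] = i


def run(num_threads=4, count=10_000):
    # Start the writers, wait for all of them, and report how many keys landed.
    shared.clear()
    threads = [
        threading.Thread(target=writer, args=(n, count))
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared)
```

Note that this only covers single operations. A compound update such as `shared[key] += 1` is a read followed by a write, and it can still lose updates between threads unless you add your own lock.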
I think it’s the other way around? I haven’t run into any ML code that could be multithreaded that wasn’t written in C++, but have often run into server tasks that could use a polling thread, etc.
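For reference, a minimal sketch of the kind of polling thread meant here, assuming a hypothetical `Poller` helper and a caller-supplied `check` callback (neither comes from any particular server framework). The thread spends almost all of its time waiting, and the GIL is released while it waits.

```python
import threading


class Poller:
    """Hypothetical helper: calls `check` every `interval` seconds in the background."""

    def __init__(self, check, interval=0.01):
        self._check = check
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True
        # as soon as stop() sets the event, which ends the loop.
        while not self._stop.wait(self._interval):
            self._check()

    def start(self):
        self._thread.start()

    def stop(self):
        # Signal the loop to exit, then wait for the thread to finish.
        self._stop.set()
        self._thread.join()
```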
All the ML code is written in lower-level languages, and that's very unlikely to change, GIL or no GIL.
Yeah, you're right - even though CUDA is async, doing any preprocessing (in Python) can be harder if you don't have shared memory (the start-up latency hit of multiprocessing is not a problem in this context). I've only ever encountered "embarrassingly parallel" data-feeding problems, where the memory overhead of multiprocessing was small, but I could see other situations. Comment retracted.
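A minimal sketch of that "embarrassingly parallel" data-feeding case, assuming a made-up `preprocess` step and `feed` wrapper: each sample is handled independently in a worker process, so nothing needs to be shared, but every input and result is pickled across the process boundary.

```python
from multiprocessing import Pool


def preprocess(sample):
    # Stand-in for real per-sample work (decoding, augmentation, tokenizing).
    return [x * 2 for x in sample]


def feed(samples, workers=2):
    # Each sample is pickled, sent to a worker process, and the result is
    # pickled back. No memory is shared between the workers.
    with Pool(processes=workers) as pool:
        return pool.map(preprocess, samples)


if __name__ == "__main__":
    print(feed([[1, 2], [3, 4], [5, 6]]))
```

This works well when samples are small and independent. The copying cost is what starts to hurt once workers need access to one large in-memory structure, which is the situation where threads with shared memory would be easier.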
The entire debate seems to be about code that strikes me as far removed from what the typical Python programmer writes.