Hacker News
by marvinalone 3042 days ago
Real threads.

For better or for worse, Python is the language of deep learning. We're going through all sorts of contortions to make it scale to large datasets, and the biggest problem is that Python is single-threaded for practical purposes.

I know all about the GIL and how difficult it is, but as a user, I don't care about any of that. The moment a similarly usable language comes along that does have working threads, I'll use it. I hope that language is also Python.

5 comments

C++ is the language of deep learning. Python is the scripting language on top of it.
Real threads don't scale easily beyond a single machine, and they add a lot of hazards that should be abstracted away. We don't want a dataset interface that makes a developer worry about race conditions. The Pythonic way is to use one of those "contortions", because they are actually useful, production-grade abstractions that let you scale beyond a single machine. Dask ( https://dask.pydata.org/en/latest/ ) is superior to threads for handling large datasets IMHO.

And if you want something more primitive, use cooperative threads like gevent, asyncio, or Twisted.
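
To make the cooperative option concrete, here is a minimal sketch using asyncio from the standard library. The names `fetch_record`, `load_all`, and `run_demo` are made up for illustration, and `asyncio.sleep` stands in for real I/O such as a disk or network read. It only helps with I/O-bound waiting; CPU-bound work still runs on one core.

```python
import asyncio


async def fetch_record(record_id, delay=0.01):
    # Hypothetical loader: asyncio.sleep stands in for a real I/O wait.
    # While this coroutine waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return record_id * 2


async def load_all(record_ids):
    # Schedule every coroutine at once; gather returns the results
    # in the same order as its arguments.
    return await asyncio.gather(*(fetch_record(r) for r in record_ids))


def run_demo():
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(load_all([1, 2, 3]))


if __name__ == "__main__":
    print(run_demo())
```

Because a coroutine only gives up control at an `await`, there is no preemption in the middle of an ordinary statement, which is where most of the race-condition hazards of real threads come from.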

Cython has “with nogil:”, though I'm not sure it's appropriate here.
Unfortunately, Julia doesn't have a good multi-threading story as of yet. But `Base.Threads.@threads` works pretty well in many cases, so perhaps look into that and the Knet.jl package?
Why can't you use multiprocessing?
Copy on write for large datasets = bad.
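
One way to sidestep the copying, sketched under some assumptions: put the data in a named shared-memory block and let each worker attach to it by name, so no process receives its own copy of the dataset. This uses `multiprocessing.shared_memory`, which was added in Python 3.8, so it is not something the comments above could have relied on. The helper names `partial_sum` and `shared_sum` are hypothetical, the data is assumed to be 64-bit integers, and the "fork" start method is assumed (Unix only).

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def partial_sum(shm_name, start, stop, results):
    """Worker: attach to the existing block by name and sum one slice."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        with shm.buf.cast("q") as view:      # reinterpret the bytes as int64
            with view[start:stop] as chunk:  # slicing a memoryview copies nothing
                results.put(sum(chunk))
    finally:
        shm.close()  # detach only; the parent owns the block


def shared_sum(values, workers=2):
    """Sum a list of ints across processes without copying it per worker."""
    ctx = mp.get_context("fork")  # assumption: Unix with fork available
    n = len(values)
    shm = shared_memory.SharedMemory(create=True, size=max(8 * n, 8))
    try:
        with shm.buf.cast("q") as view:
            for i, v in enumerate(values):
                view[i] = v  # written once, into the shared block
        step = max(1, (n + workers - 1) // workers)
        results = ctx.Queue()
        procs = [
            ctx.Process(target=partial_sum,
                        args=(shm.name, lo, min(lo + step, n), results))
            for lo in range(0, n, step)
        ]
        for p in procs:
            p.start()
        # Collect before joining so no worker blocks on a full queue.
        total = sum(results.get(timeout=30) for _ in procs)
        for p in procs:
            p.join()
        return total
    finally:
        shm.close()
        shm.unlink()  # free the block once every worker is done


if __name__ == "__main__":
    print(shared_sum(list(range(1000)), workers=4))
```

Only the block's name and the slice bounds are passed to each worker, so the cost per process stays small no matter how large the dataset is. Raw bytes in shared memory also avoid the reference-count writes that cause ordinary Python objects to be copied page by page after a fork.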