Hacker News
by iooi 2898 days ago
I'm assuming you're talking about Python, where it also takes "4-5 lines" to use multithreading or multiprocessing. Can you explain what's wrong with GIL languages?

Now that I think about it, it's even less than 4 lines:

from multiprocessing.pool import Pool  # or ThreadPool

pool = Pool()

pool.map(scrape, urls)
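
Spelled out a bit more, as a rough sketch rather than a real scraper: `scrape` and `urls` below are hypothetical stand-ins (nothing touches the network), and ThreadPool is used because the work is meant to be I/O-bound.

```python
from multiprocessing.pool import ThreadPool


def scrape(url):
    # Hypothetical placeholder: a real version would fetch and parse the page.
    return url, len(url)


# Placeholder URLs, assumed for illustration only.
urls = ["https://example.com/a", "https://example.com/bb"]

# Four worker threads; map() blocks until every URL has been processed
# and returns the results in the same order as the input list.
with ThreadPool(4) as pool:
    results = pool.map(scrape, urls)
```

Swapping ThreadPool for Pool gives the multiprocessing variant; on platforms that start workers by spawning a fresh interpreter, that version should run under an `if __name__ == "__main__":` guard.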


When the pooled functions are I/O-bound, the GIL is not a problem. Any GIL language will do.

However, when generating reports, for example, try using the same tool to serialize 4 pages of DB records into 4 pieces of a big CSV file, each on its own CPU core. That is where languages without a GIL truly shine, while languages like Python and Ruby struggle unless their GIL implementations compromise and yield without waiting for an I/O operation to complete.

I'm not sure you understand how the GIL works in Python. If you're using multiprocessing, there's no locking across the code executing on each core. Also, if you're writing to the same file from four processes, you're going to need locking.
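
A minimal sketch of that report case, assuming each page goes to its own piece file so the workers never share a file and need no locking. The names `write_piece`, `export_pages` and the `part-N.csv` layout are made up for illustration.

```python
import csv
import os
from multiprocessing.pool import Pool


def write_piece(job):
    """Serialize one page of rows to its own CSV piece and return the path."""
    path, rows = job
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def export_pages(pages, out_dir, pool_cls=Pool):
    """Write each page to out_dir/part-N.csv, one task per page."""
    jobs = [
        (os.path.join(out_dir, f"part-{i}.csv"), rows)
        for i, rows in enumerate(pages)
    ]
    # At most one worker per page, capped at the number of cores.
    workers = min(len(jobs), os.cpu_count() or 1) or 1
    with pool_cls(workers) as pool:
        return pool.map(write_piece, jobs)
```

With the default `Pool`, each worker is a separate process with its own interpreter, so the serialization can use several cores. Call `export_pages` from under an `if __name__ == "__main__":` guard on platforms that spawn workers, and concatenate the pieces afterwards if a single big file is needed.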

Last I knew, GIL languages work well in multicore scenarios as long as all N tasks make I/O calls that serve as yield points for the interpreter, and they do not use preemptive scheduling the way the BEAM VM (Erlang, Elixir, LFE, Alpaca) does.

Am I mistaken?

As far as Python goes, yes. Multicore implies multiple processes, which means that each process will have its own Python interpreter, each with its own GIL.

If you were to use multithreading instead, you would generally have a problem if you were doing non-I/O work.
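
One rough way to see the difference, as a sketch only: run the same CPU-bound function on both pool types and look at which process IDs did the work. `cpu_task` and `worker_pids` are hypothetical helpers, not part of any library.

```python
import os
from multiprocessing.pool import Pool, ThreadPool


def cpu_task(n):
    """Pure-Python CPU work; returns the worker's PID and the result."""
    total = 0
    for i in range(n):
        total += i * i
    return os.getpid(), total


def worker_pids(pool_cls, jobs=4, n=200_000):
    """Run `jobs` CPU-bound tasks on the given pool class and collect the PIDs used."""
    with pool_cls(jobs) as pool:
        results = pool.map(cpu_task, [n] * jobs)
    return {pid for pid, _ in results}
```

`worker_pids(ThreadPool)` returns only the parent's PID, since every thread lives in one interpreter and shares one GIL. `worker_pids(Pool)` should return PIDs of separate worker processes, each with its own interpreter and GIL; run it under an `if __name__ == "__main__":` guard on platforms that spawn workers.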

Then I think we have a misunderstanding of terms. To me "multicore" == "single process, many threads". Apologies for the confusion.

It seems that now we are both on the same page. Single process & many threads are problematic for GIL languages and that's why I gave up using Ruby for scrapers. GIL languages can work very well for the URL downloading part though.