by chatmasta 3402 days ago
2 processes = 2 GIL

There is no avoiding the GIL within a single Python process (even with asyncio IIRC, though I've been using JS lately). Multiprocessing is usually the most efficient way to execute I/O-intensive, independent parallel operations. Of course, you can also run threads within each process.
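A minimal sketch of that layout (several processes, each with its own GIL and its own thread pool). The process and thread counts are assumptions, and `fake_io_task` is a hypothetical stand-in for a real network request:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

PROCESSES = 2             # assumed: one process per core on a dual-core machine
THREADS_PER_PROCESS = 20  # assumed thread count inside each process


def fake_io_task(task_id):
    """Hypothetical stand-in for a network call; sleep releases the GIL much as blocking I/O does."""
    time.sleep(0.05)
    return os.getpid(), task_id


def run_thread_pool(task_ids):
    """Run one chunk of tasks on a thread pool inside the current process."""
    with ThreadPoolExecutor(max_workers=THREADS_PER_PROCESS) as pool:
        return list(pool.map(fake_io_task, task_ids))


def split(items, n):
    """Deal items round-robin into n chunks, one chunk per process."""
    return [items[i::n] for i in range(n)]


def main():
    chunks = split(list(range(PROCESSES * THREADS_PER_PROCESS)), PROCESSES)
    # Each pool worker is a separate interpreter with its own GIL.
    with Pool(processes=PROCESSES) as pool:
        per_process = pool.map(run_thread_pool, chunks)
    results = [item for chunk in per_process for item in chunk]
    pids = {pid for pid, _ in results}
    print(f"{len(results)} tasks handled by {len(pids)} process(es)")


if __name__ == "__main__":
    main()
```

Threads inside one process still take turns holding that process's GIL, but blocking I/O releases it, so the threads mostly overlap while waiting on the network.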

I do wonder where the 300 MB of memory is coming from. Surely it can't all be the Python interpreter? It doesn't look like he's importing 300 MB of modules, unless MongoClient really is that big. In that case he could create a separate worker process for persisting data, so that only that worker process needs to load the MongoClient module.
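One rough way to check whether imports account for the memory is to compare peak RSS before and after importing a module. This is only a sketch: it relies on the Unix-only `resource` module, the module names in the loop are placeholders, and peak RSS is a coarse measure (it never decreases and covers the whole process):

```python
import importlib
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def import_cost(module_name):
    """Return how much peak RSS grew while importing module_name.

    Reports 0 if the module was already imported or if the import
    stayed below an earlier memory peak.
    """
    before = peak_rss()
    importlib.import_module(module_name)
    return peak_rss() - before


if __name__ == "__main__":
    # Placeholder module names; swap in whatever the real program imports.
    for name in ("json", "decimal"):
        print(f"{name}: peak RSS grew by {import_cost(name)} (platform units)")
```

Running it in a fresh interpreter for each module gives the cleanest numbers, since earlier imports raise the baseline.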

One explanation for the memory overhead might be conntrack tables within the network namespace of the container. However, I would expect that conntrack table to be on the host, where SNAT is performed. As an aside, the default Docker networking configuration is really not well suited to concurrent network requests, whether inbound or outbound. If you can avoid NAT (and therefore a conntrack table), that is preferable.
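To see where conntrack entries actually live, one option is to read the kernel's counters in both the container and the host. A small sketch, assuming a Linux system with the conntrack module loaded (the files under `/proc/sys/net/netfilter` are absent otherwise, in which case the function returns `None` values):

```python
from pathlib import Path

CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def read_int(path):
    """Read a single integer from a file, or return None if it is missing or malformed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def conntrack_usage(base=CONNTRACK_DIR):
    """Return (current entries, table limit) for the current network namespace."""
    base = Path(base)
    count = read_int(base / "nf_conntrack_count")
    limit = read_int(base / "nf_conntrack_max")
    return count, limit


if __name__ == "__main__":
    count, limit = conntrack_usage()
    if count is None:
        print("conntrack counters not available here")
    else:
        print(f"conntrack entries: {count} of {limit}")
```

If the count stays near zero inside the container but climbs on the host under load, the table is being filled on the host side.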

This stack could also benefit from tuning some kernel parameters, both within the containers and on the host. Great blog post with details: https://blog.packagecloud.io/eng/2017/02/06/monitoring-tunin...

2 comments

Thanks for your comment. Great article on networking. I'm not very familiar with NAT on Linux, so could you post more details about how it interacts with Python or Docker, in terms of performance or anything else?

Thanks again.

> 2 processes = 2 GIL

What's your point? 20 threads will still run per GIL, and assuming a dual-core CPU, 2 processes x 20 threads each will still run 40 workers.