by zepolen
3402 days ago
I didn't really mean that Docker was the cause of the memory usage. It might add a little overhead, but afaict the article's memory usage comes from the fact that he's using a bunch of heavy Python libraries, which brings each process to about 300 MB, and running 40 workers. You could get the same performance within 600 MB by using 2 processes, each running 20 threads. But I guess hardware is cheap.
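For what it's worth, a rough sketch of the "2 processes, each running 20 threads" layout, using only the standard library. `fetch` is a hypothetical stand-in for whatever I/O-bound work each worker in the article does, and the defaults are just the numbers from the comment above, not measured values.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(item):
    # Hypothetical placeholder for the real I/O-bound job
    # (HTTP request, DB write, ...). Here it just doubles the input.
    return item * 2


def run_chunk(chunk, threads=20):
    # Runs inside one process: fan the chunk out over a thread pool.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, chunk))


def run_all(items, processes=2, threads=20):
    # Split the work into one interleaved chunk per process, so the heavy
    # libraries are loaded `processes` times instead of once per worker.
    chunks = [items[i::processes] for i in range(processes)]
    with ProcessPoolExecutor(max_workers=processes) as pool:
        results = pool.map(run_chunk, chunks, [threads] * processes)
        # Results come back grouped by chunk, not in the original input order.
        return [value for chunk in results for value in chunk]
```

Call `run_all(...)` from under an `if __name__ == "__main__":` guard so it also works on platforms that spawn rather than fork worker processes.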
|
There is no avoiding the GIL within a single Python process (even with asyncio, IIRC, though I've been using JS lately). Multiprocessing is usually the most efficient way to execute I/O-intensive, independent parallel operations. Of course, you can also run threads within each process.
I do wonder where the 300 MB of memory is coming from. Surely it can't all be the Python interpreter? It doesn't look like he's importing 300 MB of modules, unless MongoClient really is that big. In that case, he could create a separate worker process for persisting data, so that only that worker process needs to load the MongoClient module.
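A minimal sketch of that separate persistence worker, assuming pymongo is the library behind MongoClient. The database and collection names (`scraper`, `pages`) and the helper names are made up for illustration, and `None` is used as an assumed shutdown signal. The import lives inside the sink factory, so only the writer process pays for it.

```python
import multiprocessing as mp


def mongo_sink():
    # Imported here so only the writer process loads pymongo.
    from pymongo import MongoClient
    # Hypothetical database and collection names.
    collection = MongoClient()["scraper"]["pages"]
    return collection.insert_one


def writer_main(queue, make_sink):
    # Build the sink once, then persist documents until a None arrives.
    save = make_sink()
    while True:
        doc = queue.get()
        if doc is None:
            break
        save(doc)


def start_writer(make_sink=mongo_sink):
    # Workers put dicts on the returned queue; put None to stop the writer.
    queue = mp.Queue()
    proc = mp.Process(target=writer_main, args=(queue, make_sink))
    proc.start()
    return queue, proc
```

The other workers then only need `queue.put(doc)` and never import the client themselves. Whether that actually recovers most of the 300 MB depends on where the memory really goes, which I haven't measured.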
One explanation for the memory overhead might be conntrack tables within the container's network namespace. However, I would expect the conntrack table to be on the host, where SNAT is performed. As an aside, the default Docker networking configuration is really not well suited to concurrent network requests, whether inbound or outbound. If you can avoid NAT (and therefore a conntrack table), that is preferable.
This stack could also benefit from tuning some kernel parameters, both within the containers and on the host. Great blog post with details: https://blog.packagecloud.io/eng/2017/02/06/monitoring-tunin...