by KaiserPro 831 days ago
> If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.

As you know, that's mostly true of threads in general. Any optimisation has a drawback, so you need to choose wisely.

I once made a horror of a thing that synced S3 with another S3-like, but not quite S3, object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.

So I started with async (pro tip: it's never a good idea to use async. It's basically gotos with two dimensions of surprise: 1) when the function returns, 2) when you get an exception). I then moved to threads, which got a tiny bit of extra performance but were much easier to debug. Then I moved to multiprocess pools of threads (fuck yeah, super fast), but then I started hitting network IO limits.

So then I busted out to an Airflow-like system, with operators spawning 10 processes with 500 threads.

It wasn't very memory efficient, but it moved many thousands of files a second.
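
For anyone wondering what "multiprocess pools of threads" looks like in practice, here is a rough sketch of the shape, not the actual code. It assumes Python's `concurrent.futures`; `copy_object`, `copy_batch`, `chunked` and `sync_all` are made-up names, the sleep stands in for the slow metadata call, and the pool sizes are small illustrative defaults rather than the 10 and 500 mentioned above.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time


def copy_object(key: str) -> str:
    """Hypothetical stand-in for one slow copy (metadata call plus transfer)."""
    time.sleep(0.01)  # pretend latency; a real call would hit the object store
    return key


def copy_batch(keys: list[str], threads: int = 50) -> int:
    """Runs inside one worker process: fan the batch out over a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Threads spend nearly all their time waiting on IO, so many can overlap.
        return sum(1 for _ in pool.map(copy_object, keys))


def chunked(keys: list[str], n: int) -> list[list[str]]:
    """Split keys into n interleaved batches, one per process."""
    return [keys[i::n] for i in range(n)]


def sync_all(keys: list[str], processes: int = 4, threads: int = 50) -> int:
    """Outer layer: one process per batch, each running its own thread pool."""
    batches = chunked(keys, processes)
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return sum(pool.map(copy_batch, batches, [threads] * len(batches)))
```

Call `sync_all(keys)` from under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly. Every process carries its own interpreter and its own thread stacks, which is where the memory goes.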