by KaiserPro
831 days ago
> If you want to spawn 10-20+ threads in a process, it can quickly become way slower than running a single thread.

As you know, that's mostly true of threads in general. Any optimisation has a drawback, so you need to choose wisely.

I once made a horror of a thing that synced S3 with another S3-like (but not quite S3) object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.

So I started with async (pro tip: it's never a good idea to use async. It's basically gotos with two dimensions of surprise: 1) when the function returns, and 2) when you get an exception). I then moved to threads, which got a tiny bit of extra performance but much easier debuggability. Then I moved to multiprocess pools of threads (fuck yeah, super fast), but then I started hitting network I/O limits. So then I busted out to an Airflow-like system, with operators spawning 10 processes with 500 threads. It wasn't very memory efficient, but it moved many thousands of files a second.
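A minimal Python sketch of the "multiprocess pools of threads" layout, assuming a hypothetical `copy_object` whose slow, network-bound call is faked with `time.sleep`. Each worker process takes one batch of keys and fans it out over its own thread pool; because a thread blocked in I/O (or `sleep`) releases the GIL, a few processes times many threads can overlap thousands of waits. The names, latency, and process/thread counts are illustrative assumptions, not the setup from the comment.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical per-object latency standing in for a slow S3-like metadata call.
FAKE_LATENCY_S = 0.05


def copy_object(key: str) -> str:
    """Hypothetical stand-in for copying one object between stores."""
    time.sleep(FAKE_LATENCY_S)  # blocked threads release the GIL, so waits overlap
    return key


def copy_batch(keys: list[str], threads_per_process: int) -> list[str]:
    """Runs inside one worker process: fan the batch out over a thread pool."""
    with ThreadPoolExecutor(max_workers=threads_per_process) as pool:
        return list(pool.map(copy_object, keys))  # map preserves input order


def chunk(items: list[str], n: int) -> list[list[str]]:
    """Split items into n contiguous batches whose sizes differ by at most one."""
    k, r = divmod(len(items), n)
    batches, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        batches.append(items[start:end])
        start = end
    return batches


def sync_all(keys: list[str], processes: int = 4, threads_per_process: int = 50) -> list[str]:
    """Give each process one batch; each process runs its own thread pool."""
    batches = chunk(keys, processes)
    with ProcessPoolExecutor(max_workers=processes) as procs:
        results = procs.map(copy_batch, batches, [threads_per_process] * processes)
        return [key for batch in results for key in batch]


if __name__ == "__main__":
    keys = [f"bucket/obj-{i:05d}" for i in range(2000)]
    start = time.perf_counter()
    done = sync_all(keys)
    print(f"copied {len(done)} objects in {time.perf_counter() - start:.2f}s")
```

With the fake 50 ms latency, a serial loop over 2000 keys would take about 100 s, while 4 processes with 50 threads each wait through roughly 10 rounds per process. The trade-off from the comment still applies: every extra process costs memory, and past some point the network, not the pool size, becomes the limit.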