Hacker News
by andor 3901 days ago
Basically, their Python version ("3 thread pools, 175 threads") is synchronous and effectively single-threaded: CPython threads are real OS threads, but the GIL lets only one of them run Python code at a time. The Go rewrite uses goroutines spread over multiple OS threads. The fact that their Python version takes "minutes to startup" suggests that a rewrite was necessary anyway.

Go is a good tool for the job; Python threads are not. asyncio or one of the event-based IO frameworks should work much better.
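For comparison, here is a minimal asyncio sketch: 175 concurrent requests handled as coroutines on a single OS thread. `fetch` and `handle_all` are hypothetical names. The non-blocking network call is simulated with `asyncio.sleep`, which is an assumption. Each coroutine returns the id of the thread it ran on, so you can see that everything shares one thread.

```python
import asyncio
import threading


async def fetch(i: int) -> int:
    # Hypothetical stand-in for a non-blocking network call
    await asyncio.sleep(0.1)
    # Report which OS thread this coroutine ran on
    return threading.get_ident()


async def handle_all(n: int = 175) -> list:
    # Run n requests concurrently on the event loop (a single OS thread)
    return await asyncio.gather(*(fetch(i) for i in range(n)))
```

With `asyncio.run(handle_all())`, all 175 waits overlap, so the batch takes roughly one `fetch` delay rather than 175 of them.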

As for the problem of sharing data between processes (slide 5): it appears that this service is read-only? If that's true, what do you need to share? Every process can have its own connection pool. You don't even need multiprocessing; just set SO_REUSEPORT and start your application multiple times.
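Here is a minimal sketch of the SO_REUSEPORT approach. It assumes Linux, or another platform where Python exposes `socket.SO_REUSEPORT`. Each copy of the application opens its own listening socket on the same port, and on Linux the kernel distributes incoming connections among them. `make_listener` and `serve_forever` are hypothetical names used for illustration. A real service would also create its own per-process connection pool.

```python
import os
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open a TCP listening socket with SO_REUSEPORT set (Unix-only option)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() so that other processes can bind the same port
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def serve_forever(sock: socket.socket) -> None:
    """Accept loop for one worker process; nothing is shared with other workers."""
    while True:
        conn, _addr = sock.accept()
        with conn:
            conn.sendall(f"handled by pid {os.getpid()}\n".encode())
```

You could start the script several times, each instance calling `serve_forever(make_listener(port=8080))`. Port 8080 is an arbitrary example. That gives several independent processes behind one port, with no shared state between them.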


You could probably get decent performance for a similar application written in another language (than Python) using 175 threads. 175 threads is not that big a deal; the OS can manage that pretty well. It's only when you start talking about thousands of individual connections and thousands of threads that you need to worry. Python sucks at this even with a small number of threads (GIL).
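This minimal sketch shows why 175 threads can be fine even in Python when they mostly wait. `time.sleep` stands in for a blocking network or database call, which is an assumption. CPython releases the GIL while a thread is blocked like that, so all 175 waits overlap. `run_blocking_threads` is a hypothetical helper. The GIL does bite for CPU-bound work: only one thread executes Python bytecode at a time, so replacing the sleep with a compute loop would give no parallel speedup.

```python
import threading
import time


def run_blocking_threads(num_threads: int = 175, wait: float = 0.2) -> float:
    """Start num_threads OS threads that each block for `wait` seconds; return wall time."""

    def handle_request() -> None:
        # Stand-in for a blocking IO call; CPython releases the GIL while sleeping
        time.sleep(wait)

    threads = [threading.Thread(target=handle_request) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With the defaults, the call should take a little over 0.2 s rather than the 35 s that 175 sequential 0.2 s waits would take.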
175 threads use a lot of memory and cause a lot of context switching. I would never write an application so that it needs 175 OS threads, because if it needs that many, how many am I going to need down the road? It's an ominous sign for scalability in my view, even if it works for a while.

[Edit] I'm assuming a CPU with 8 cores, not some 64-core monster.

175 threads really don't use that much RAM. I know userspace thread stacks are large by default, but most apps never touch most of that space, so those pages are never materialized. So even if each thread really used 1 MB of stack, that's only 175 MB, which easily fits on even the smallest AWS instance.
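The following sketch checks the arithmetic above and shows a quick way to cap the per-thread stack reservation. Python's `threading.stack_size()` sets the stack size used for threads created afterwards. The 1 MiB worst case and the 256 KiB cap are assumptions for illustration. `worst_case_stack_mib` and `run_with_stack_size` are hypothetical helpers.

```python
import threading


def worst_case_stack_mib(threads: int, stack_bytes: int) -> float:
    """Upper bound on stack memory if every thread used its full stack."""
    return threads * stack_bytes / (1 << 20)


def run_with_stack_size(threads: int, stack_bytes: int) -> int:
    """Start `threads` threads with a reduced stack size; return how many ran."""
    old = threading.stack_size(stack_bytes)  # returns the previous setting (0 = default)
    done = []
    lock = threading.Lock()

    def work() -> None:
        with lock:
            done.append(1)

    try:
        pool = [threading.Thread(target=work) for _ in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
    finally:
        threading.stack_size(old)  # restore the previous setting for later threads
    return len(done)
```

`worst_case_stack_mib(175, 1 << 20)` gives 175.0, which matches the 175 MB figure above.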

I imagine that context switching between 175 OS threads, all in the same process, wouldn't really be that big a deal: https://www.quora.com/How-does-thread-switching-differ-from-...

Additionally, there are many legitimate cases for a lot of threads, like disk IO. Say you have to push a lot of bytes to or from a high-IOPS drive such as an SSD or NVMe drive. Unless you're doing large sequential transfers that you can do in one large call, you will need to submit many concurrent requests (via threads) to saturate the drive. Disk IO is not network IO.
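Here is a sketch of that pattern for disk reads. It assumes a Unix system, because it uses `os.pread`. A thread pool keeps many small positioned reads in flight at once. CPython releases the GIL during the blocking read, so the threads really do wait on the drive in parallel. The 4 KiB block size and 32 workers are arbitrary. On a real SSD you would tune the worker count to the queue depth the drive needs. A small temp file like the one in the hypothetical `demo` helper won't show any speedup.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # assumed block size


def read_blocks_concurrently(path: str, block_size: int = BLOCK, workers: int = 32) -> bytes:
    """Read a file as many independent positioned reads issued from a thread pool."""
    size = os.path.getsize(path)
    offsets = range(0, size, block_size)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map keeps results in offset order; each read is independent
            chunks = list(pool.map(lambda off: os.pread(fd, block_size, off), offsets))
    finally:
        os.close(fd)
    return b"".join(chunks)


def demo(num_bytes: int = 100_000) -> bool:
    """Write random bytes to a temp file, read them back concurrently, and compare."""
    data = os.urandom(num_bytes)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        return read_blocks_concurrently(path) == data
    finally:
        os.unlink(path)
```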

To be honest, I don't really have a good intuition or hard data on where the context switching overhead (or other limits) starts to bite, because I have always avoided architectures that go into the hundreds or thousands of threads.

Maybe you are right and it's one of those urban myths that we sometimes carry over from times long past based on assumptions that are no longer true.

I would love to have more hard information on that one, because I think that the currently fashionable async/event-based way of doing a lot of things makes programs much harder to understand and write.
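One way to get hard numbers for your own machine is sketched below, under a few assumptions. The threads mostly block, which is simulated with short `time.sleep` calls. `resource.getrusage` must be available, so this is Unix only; on Linux, `RUSAGE_SELF` sums over all threads of the process. `measure_switches` is a hypothetical helper. It reports wall time and voluntary/involuntary context switches for a given thread count.

```python
import resource
import threading
import time


def measure_switches(num_threads: int, iterations: int = 20, nap: float = 0.001) -> dict:
    """Run num_threads threads that repeatedly block briefly; report wall time
    and context switches for this process (ru_nvcsw / ru_nivcsw)."""

    def worker() -> None:
        for _ in range(iterations):
            time.sleep(nap)  # blocking call that yields the CPU, like waiting on IO

    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "threads": num_threads,
        "seconds": elapsed,
        "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
    }
```

Calling it with, say, 10, 175 and 1000 threads and comparing the results shows how the overhead grows on that particular machine. The numbers depend heavily on the kernel, the hardware and what the threads actually do.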

A good rule of thumb for a modern kernel on server-class hardware: hundreds of (native) threads are OK, and thousands will probably be OK. Tens of thousands is where you will start to see trouble, and hundreds of thousands will most likely make you pull your hair out. So the comment above about 175 threads being too many is incorrect.