Flask's built-in web server is meant for debugging and development. In production, run your Flask apps under gunicorn, Twisted Web, or any of the other supported servers.
Yeah, I was running it under gunicorn. But that still leaves you with a small number of concurrent connections. If you want to have long connections, for something like long polling or websockets, then being limited to one request per CPU core seems a little sketchy.
Mind you, it could just be that I take high concurrency for granted. I build most web stuff on Node or Clojure, but now that I think about it, apps that require long quiet connections are actually not the norm.
Are you imagining this scenario, or are you actually using Flask for long polling? What does your Flask websocket setup look like?
FYI, Flask's internals do not stop you from using thousands of threads or greenlets to handle concurrent requests. And the web request-response model is embarrassingly parallel across cores: just spawn one worker per core.
For a simple API service, if you cannot handle 3K rps per Flask instance, you are doing it wrong.
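To make the "one worker per core, plus greenlets" suggestion concrete, here is a rough sketch of a gunicorn config file. It assumes gevent is installed and that the app lives at a hypothetical `app:app`; the bind address and connection count are illustrative values, not recommendations.

```python
# gunicorn.conf.py -- a minimal sketch, not a tuned production config.
import multiprocessing

# Address to listen on (illustrative value).
bind = "127.0.0.1:8000"

# One worker process per CPU core, as suggested above.
workers = multiprocessing.cpu_count()

# Greenlet-based worker, so each process can hold many idle
# connections (long polling and the like). Requires gevent.
worker_class = "gevent"

# Maximum simultaneous clients per worker; only applies to
# async/threaded worker types such as gevent.
worker_connections = 1000
```

You would start it with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands in for your own module and Flask object. With this setup the concurrent-connection ceiling is roughly `workers * worker_connections` rather than one request per core.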