Hacker News
by hansvm 286 days ago
Yeah, we do 100k ML inferences per second. It's not a single server, but the architecture isn't much more complicated than that.

With today's computers, indexing the entire internet and serving 100k QPS also isn't really that demanding architecturally. The vast majority of current implementation complexity exists for reasons other than necessity.