Y
Hacker News
new
|
ask
|
show
|
jobs
by
iliane5
977 days ago
I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster allows you to process requests much faster and for a lower marginal cost.