Hacker News
by iliane5 977 days ago
I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster allows you to process requests much faster and for a lower marginal cost.
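To put rough numbers on the marginal-cost point, here is a toy model in Python. It assumes each batched GPU pass pays a fixed overhead plus a small per-request cost. The function names and the `fixed_ms` / `per_item_ms` values are made up for illustration, not measurements from any real cluster.

```python
def cost_per_request(batch_size, fixed_ms=20.0, per_item_ms=1.0):
    """GPU milliseconds attributed to each request in one batched pass.

    fixed_ms and per_item_ms are hypothetical illustration values.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch_ms = fixed_ms + per_item_ms * batch_size
    return batch_ms / batch_size


def throughput_rps(batch_size, fixed_ms=20.0, per_item_ms=1.0):
    """Requests per second one GPU sustains at a given batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch_ms = fixed_ms + per_item_ms * batch_size
    return batch_size / batch_ms * 1000.0


if __name__ == "__main__":
    # Larger batches spread the fixed overhead over more requests.
    for b in (1, 8, 64):
        print(b, round(cost_per_request(b), 2), round(throughput_rps(b), 1))
```

In this model the fixed overhead is amortized over the batch, so per-request cost falls and throughput rises as the batch grows. You can only fill large batches if enough requests arrive at once, which is where the consistent user base comes in.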