by Traster 199 days ago
Training is taking an enormous problem, breaking it into lots of pieces, and managing the data dependencies between those pieces. It's solving one really hard problem. Inference is the opposite: it's lots of small independent problems. All of this "we have X many widgets connected to Y many high bandwidth optical telescopes" is a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh and I don't care about latency".
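A toy sketch of that contrast, assuming a one-parameter linear model fit with plain SGD. The names `train` and `infer` are made up for illustration and don't come from any real framework; the only point is where the data dependency sits.

```python
# Toy model y = w * x. Everything here is an illustrative assumption.

def train(w, batches, lr=0.1):
    """Sequential SGD: step k cannot start until step k-1 has produced w."""
    for x, y in batches:
        grad = 2 * (w * x - y) * x   # gradient of the squared error (w*x - y)**2
        w = w - lr * grad            # the next step reads this updated w
    return w

def infer(w, requests):
    """Each request only reads w, so requests can run in any order or in parallel."""
    return [w * x for x in requests]

w = train(0.0, [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(infer(w, [1.0, 2.0, 3.0]))
```

In `train` every iteration waits on the previous one, which is the dependency that has to be managed when the work is split across machines. In `infer` nothing is shared except the read-only weight.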
1 comment

I can't think of any case where inference doesn't care about latency.
I can't think of any reason training isn't going to become real time with a significant CPU budget.