by unsatchmo 464 days ago
You're correct about the weights: each machine could in fact store all of them. However, I think you still have to transfer the activations and the KV cache between machines while performing inference.
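
For a rough sense of the sizes involved, here's a back-of-the-envelope sketch. The model dimensions are made up for illustration (32 layers, 32 KV heads, head dim 128, hidden size 4096, fp16), and whether the KV cache actually has to cross the network depends on how the work is split across machines, so treat this as an estimate under those assumptions rather than a statement about any particular setup.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache that one token adds across all layers."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def activation_bytes_per_token(hidden_size, bytes_per_elem):
    """Bytes of one token's hidden-state vector at a single layer boundary."""
    return hidden_size * bytes_per_elem


# Hypothetical model dimensions, chosen only for illustration.
N_LAYERS = 32
N_KV_HEADS = 32
HEAD_DIM = 128
HIDDEN_SIZE = 4096
BYTES_PER_ELEM = 2  # fp16

kv = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_ELEM)
act = activation_bytes_per_token(HIDDEN_SIZE, BYTES_PER_ELEM)

print(f"KV cache per token:   {kv / 1024:.0f} KiB")
print(f"Activation per token: {act / 1024:.0f} KiB")
```

With these numbers the per-token hidden state is small compared to the per-token KV cache, which is why it matters a lot which of the two actually has to move between machines.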