by DesaiAshu 85 days ago
Data bandwidth limits distributed training under current architectures. Really interesting implications if we can make progress on that.

Limits but doesn't prohibit. See https://www.primeintellect.ai/blog/intellect-3: it's still useful and can scale enormously. It takes a particular shape and relies heavily on RL, but it's still big.
What bandwidth limits? I'm assuming the forward and backward passes have to be done sequentially?
Yes, and also passing data within each layer.
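A rough, bandwidth-only sketch of why link speed matters, assuming plain data-parallel training where every worker all-reduces its full gradient once per step using a ring all-reduce (a standard scheme, not necessarily what any system mentioned here uses). In a ring all-reduce each worker sends about 2(N-1)/N times the payload. The model size, bf16 precision, worker count, and link speeds below are hypothetical illustrative numbers; latency and compute/communication overlap are ignored.

```python
def ring_allreduce_bytes_per_worker(payload_bytes: float, num_workers: int) -> float:
    """Bytes each worker sends in a ring all-reduce (reduce-scatter + all-gather)."""
    if num_workers < 2:
        return 0.0  # nothing to exchange with a single worker
    return 2 * (num_workers - 1) / num_workers * payload_bytes


def allreduce_seconds(payload_bytes: float, num_workers: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound on sync time; ignores latency and overlap."""
    return ring_allreduce_bytes_per_worker(payload_bytes, num_workers) / link_bytes_per_s


if __name__ == "__main__":
    # Hypothetical setup: 7e9 parameters, bf16 gradients (2 bytes each), 8 workers.
    grad_bytes = 7e9 * 2
    links = [
        ("datacenter link, ~50 GB/s (assumed)", 50e9),
        ("internet link, ~1 Gbit/s (assumed)", 1e9 / 8),
    ]
    for name, bandwidth in links:
        seconds = allreduce_seconds(grad_bytes, 8, bandwidth)
        print(f"{name}: {seconds:.1f} s per gradient sync")
```

The same per-worker formula applies to the activation all-reduces that tensor parallelism performs inside each layer, except those happen many times per step instead of once, which is why splitting layers across slow links is much harder than splitting batches.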