Hacker News
by teravor 5 days ago
is that actually how they train them in the datacenter? the trillion-parameter weight vector gets cloned, sent off to groups of GPUs, and averaged afterwards?
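To make the question concrete, here is a toy sketch of the scheme as described: clone the weights to each worker, let each worker take a few local gradient steps on its own data shard, then average the copies. Everything in it is an assumption made for illustration (the names `local_sgd_round` and `train`, the two-parameter linear model, the learning rate, the shard layout), and it says nothing about whether any real datacenter run works this way.

```python
import random

def local_sgd_round(weights, shards, lr=0.1, local_steps=5):
    """One round: clone weights per worker, train locally, average the copies."""
    replicas = []
    for shard in shards:
        w = list(weights)  # clone the weight vector for this (hypothetical) worker
        for _ in range(local_steps):
            # Gradient of mean squared error for y ~ w[0]*x + w[1] on this shard.
            g0 = sum(2 * (w[0] * x + w[1] - y) * x for x, y in shard) / len(shard)
            g1 = sum(2 * (w[0] * x + w[1] - y) for x, y in shard) / len(shard)
            w[0] -= lr * g0
            w[1] -= lr * g1
        replicas.append(w)
    # Element-wise average of all worker copies.
    n = len(replicas)
    return [sum(r[i] for r in replicas) / n for i in range(len(weights))]

def train(num_workers=4, rounds=200, seed=0):
    rng = random.Random(seed)
    # Noiseless toy data from y = 3x + 1, split round-robin across workers.
    data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(64))]
    shards = [data[i::num_workers] for i in range(num_workers)]
    weights = [0.0, 0.0]
    for _ in range(rounds):
        weights = local_sgd_round(weights, shards)
    return weights

if __name__ == "__main__":
    print(train())
```

With `local_steps=1` the averaged weights equal what you would get by averaging the per-worker gradients and taking one shared step, so the two views only diverge when workers run several steps between averages.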