Hacker News
by teravor 5 days ago
Is that actually how they train them in the datacenter? The trillion-parameter weight vector gets cloned, sent off to groups of GPUs, and then averaged afterward?
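A toy sketch of the scheme the question describes may help, shown in its common synchronous form, where each replica's gradient (rather than its final weights) is averaged after every step. Everything here is a hypothetical illustration: the linear model and the names `grad_mse`, `data_parallel_step`, and `single_device_step` are made up, and a plain-Python average stands in for the all-reduce a real cluster would perform across GPUs. It also ignores how one copy of a very large model is split across the GPUs inside a group.

```python
# Hypothetical toy simulation of synchronous data parallelism: every replica
# holds an identical copy of the weights, computes a gradient on its own data
# shard, and the gradients are averaged before one shared update.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the linear model y ~ w . x."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / n
    return g


def data_parallel_step(w, shards, lr):
    """Clone weights, compute per-shard gradients, average them, update once."""
    replicas = [list(w) for _ in shards]  # each "GPU group" gets its own copy
    grads = [grad_mse(r, xs, ys) for r, (xs, ys) in zip(replicas, shards)]
    # Element-wise mean over replicas; a stand-in for a real all-reduce.
    avg = [sum(gs) / len(grads) for gs in zip(*grads)]
    return [wi - lr * gi for wi, gi in zip(w, avg)]


def single_device_step(w, xs, ys, lr):
    """Reference: one gradient step on the full batch on a single device."""
    g = grad_mse(w, xs, ys)
    return [wi - lr * gi for wi, gi in zip(w, g)]


if __name__ == "__main__":
    xs = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.0], [3.0, 1.0]]
    ys = [5.0, 3.0, 2.0, 7.0]
    shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]  # two equal-size shards
    w = [0.0, 0.0]
    print(data_parallel_step(w, shards, 0.01))
    print(single_device_step(w, xs, ys, 0.01))
```

With equal-size shards and a mean loss, the averaged update matches the single-device full-batch update up to floating-point rounding, which is why averaging is a sound way to combine the replicas in this toy setting.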