by gwern
2493 days ago
Looking at the whitepaper, I'm a little surprised how little RAM there is for such an enormous chip. Is the overall paradigm here that you still have relatively small minibatches during training, but each minibatch is now vastly faster?