Hacker News
by
riku_iki
2681 days ago
I doubt a 1.5B-parameter model will fit on any single GPU. I think they split the model across multiple GPUs/TPUs, similar to Mesh-TensorFlow:
https://arxiv.org/abs/1811.02084
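
A rough back-of-the-envelope sketch of the memory arithmetic behind that doubt. It assumes fp32 values and an Adam-style optimizer that keeps two extra values per parameter; both are my assumptions, not anything known about the actual training setup. Activations are ignored, so a real training footprint would be higher. The function names are made up for illustration.

```python
GIB = 2**30  # bytes in one GiB


def weights_memory_gib(n_params, bytes_per_value=4):
    """Memory for the weights alone (assumes fp32, 4 bytes per value)."""
    return n_params * bytes_per_value / GIB


def training_memory_gib(n_params, bytes_per_value=4, optimizer_slots=2):
    """Weights + gradients + optimizer state, ignoring activations.

    optimizer_slots=2 is an assumption matching Adam-style optimizers,
    which keep two running statistics per parameter.
    """
    copies = 1 + 1 + optimizer_slots  # weights, gradients, optimizer slots
    return n_params * bytes_per_value * copies / GIB


N_PARAMS = 1_500_000_000  # the 1.5B figure from the comment

weights = weights_memory_gib(N_PARAMS)
training = training_memory_gib(N_PARAMS)

print(f"weights only:                 {weights:.1f} GiB")
print(f"weights + grads + Adam state: {training:.1f} GiB")
```

Under these assumptions the weights alone come to roughly 5.6 GiB, while the training state is roughly 22 GiB before any activations, so whether it fits depends on the card and on whether you mean inference or training.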