Hacker News
p1esk, 2170 days ago
You contradict yourself. Memory pooling is precisely what would allow you to train BERT-Large on two 2080 Tis.
sabalaba, 2169 days ago
No, my comment says that two 2080 Tis would be better for convnets, or for situations where you don’t need to train BERT-Large. If you’re sure memory pooling works for DL, please share code and examples; we would love to see one.