by p1esk 2170 days ago
You contradict yourself. Memory pooling is precisely what would allow you to train BERT-Large on two 2080 Tis.
1 comment

No, my comment says that the two 2080 Tis would be better for convnets and other situations where you don’t need to train BERT-Large. If you’re sure that memory pooling works for DL, please share code and examples; we would love to see them.