Hacker News
by sabalaba 2170 days ago
Memory pooling is irrelevant for DL training. 24 GB is enough to run a batch size of 1 for BERT-Large, so honestly this is a good choice. Some folks are saying that 2x 2080 Tis would have been better, and that's true if you're doing convnets, but for any large-scale language model fine-tuning you'll want at least 24 GB of VRAM.
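For a rough sense of the numbers, here is a back-of-the-envelope sketch, not a measurement. It assumes fp32 training with Adam, about 340M parameters for BERT-Large (the figure from the BERT paper), and 11 GB on a 2080 Ti. The function name is made up for illustration. It only counts weights, gradients, and optimizer state. Activations, which grow with sequence length and batch size, plus framework overhead, come on top and are not estimated here.

```python
# Rough static-memory estimate for fp32 training with Adam.
# Assumption: ~340M parameters for BERT-Large (figure from the BERT paper).
BERT_LARGE_PARAMS = 340_000_000
BYTES_PER_FP32 = 4


def static_training_bytes(n_params: int) -> int:
    """Weights + gradients + two Adam moment buffers, all stored in fp32."""
    weights = n_params * BYTES_PER_FP32
    grads = n_params * BYTES_PER_FP32
    adam_moments = 2 * n_params * BYTES_PER_FP32
    return weights + grads + adam_moments


static_gb = static_training_bytes(BERT_LARGE_PARAMS) / 1e9

# Whatever is left on a single card has to hold activations and overhead.
for name, vram_gb in [("2080 Ti", 11), ("24 GB card", 24)]:
    print(f"{name}: {vram_gb - static_gb:.2f} GB left for activations")
```

Under those assumptions the static state alone is about 5.4 GB, so the real question is how much room the activations need at your sequence length, which this sketch does not answer.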

You contradict yourself. Memory pooling is precisely what would allow you to train BERT-Large on two 2080 Tis.
No, my comment says that the two 2080 Tis would be better for convnets / situations where you don’t need to train BERT-Large. If you’re sure about memory pooling working for DL, please share code and examples; we would love to see them.