| HN Mirror



	by sabalaba 2170 days ago
	Memory pooling is irrelevant for DL training. 24 GB is enough to run a batch size of 1 for BERT-Large, so honestly this is a good choice. Some folks are saying that 2x 2080 Tis would have been better, and that's true if you're doing convnets, but for any large-scale language model fine-tuning you'll want at least 24 GB of VRAM.
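For a rough sense of the numbers behind that claim, here is a back-of-envelope sketch. It assumes roughly 340M parameters for BERT-Large, plain fp32 training with Adam (weights, gradients, and two moment buffers), and 11 GB on a 2080 Ti. The function names are hypothetical, and activation memory, which grows with batch size and sequence length and is often the larger part, is deliberately left out.

```python
# Back-of-envelope estimate only. The constants are assumptions for illustration.
BERT_LARGE_PARAMS = 340_000_000  # approximate parameter count for BERT-Large
BYTES_PER_FP32 = 4               # one fp32 value


def static_training_bytes(num_params, bytes_per_value=BYTES_PER_FP32):
    """Bytes for weights + gradients + two Adam moment buffers.

    Activations and framework workspace are NOT included.
    """
    copies = 4  # weights, gradients, Adam first moment, Adam second moment
    return num_params * bytes_per_value * copies


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


static_gb = to_gb(static_training_bytes(BERT_LARGE_PARAMS))
print(f"static training state: {static_gb:.2f} GB")

# Compare against an 11 GB card (2080 Ti) and a 24 GB card.
for card_gb in (11, 24):
    headroom = card_gb - static_gb
    print(f"{card_gb} GB card: about {headroom:.2f} GB left for activations and workspace")
```

Under these assumptions the fixed state alone takes about half of an 11 GB card, which is why the remaining room for activations becomes the deciding factor.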

1 comments

p1esk 2170 days ago

You contradict yourself. Memory pooling is precisely what would allow you to train your BERT-Large on two 2080 Tis.


sabalaba 2168 days ago

No, my comment says that the two 2080 Tis would be better for convnets / situations where you don’t need to train BERT-Large. If you’re sure about memory pooling working for DL, please share code and examples; we would love to see them.
