by Tenoke 2318 days ago
Any plans to train other (non-NLP) huge models using ZeRO?

Specifically for Transformers - any plans to train a big model with a bigger context window?

Not that this one isn't very impressive, of course.

1 comment

Thanks for your kind words. Yes, we would like to train a language representation model next. Our hunch is that a mixture of language representation and language generation would get the best of both worlds.