| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jncraton 852 days ago

Google released the T5 paper about 5 years ago:

https://arxiv.org/abs/1910.10683

This included full model weights along with a detailed description of the dataset, training process, and ablations that led them to that architecture. T5 was state-of-the-art on many benchmarks when it was released, but it was of course quickly eclipsed by GPT-3.

It was common practice from Google (BERT, T5), Meta (BART), OpenAI (GPT1, GPT2) and others to release full training details and model weights. Following GPT-3, it became much more common for labs to not release full details or model weights.