by robbedpeter 1537 days ago
Smaller models with better performance are beginning to arrive. Things like RETRO, better training data, longer training time, and scale optimization will have these models on phones and desktops doing crazy things in the near future.

They are, but performance decreases. In many cases, transformers are encoding vast amounts of training data within their insane number of parameters.