by ein0p 685 days ago
Yep. Where I work there are _a lot_ of efforts underway in that direction. Put simply, standard transformers are great, but they’re very expensive, both to train and to run inference with. They also need enormous datasets. We need architectures that are compute- and sample-efficient, and friendly to hardware. A standard transformer ticks none of these boxes, and research is needed to actually make money with these models. And because profits depend on this research, it’s going to bear fruit. The field is vast and relatively unplowed.