by gmaster1440 685 days ago
> Scale beats all else. The best performance improvements come from increasing scale, rather than incremental insights in novel architectures.

...until the next novel architecture is discovered, which won't happen without said AI research.

2 comments

Yep. Where I work there are _a lot_ of efforts underway in that direction. Put simply, standard transformers are great, but they’re very expensive, both to train and to run inference with. They also need enormous datasets. We need architectures that are compute- and sample-efficient, and hardware-friendly. A standard transformer ticks none of these boxes, and research is needed to actually make money with these models. And because profits depend on this research, it’s going to bear fruit. The field is vast and relatively unplowed.
Exactly! Quality can matter a lot too, as a series of surprisingly capable small models has shown.