by mdale 810 days ago
This creates opportunities for smaller companies to innovate and experiment, offering solutions (or becoming acquisition targets) where tight inference compute requirements make or break the experience but a higher training cost is less of a concern, such as embedded or local runtime use cases.

Before those opportunities are available to you, someone would need to spend a few million dollars and train a competitive model with this, and then release it under a license that allows commercial use. This is out of reach for the vast majority of smaller companies. These models only excel at large parameter counts, even for narrow problems. This is especially true in the case of MoE, which is a way to push the overall parameter count even larger without lighting up the whole thing for every token.
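
To put rough numbers on the MoE point, here is a back-of-the-envelope sketch. It assumes a simple top-k routed design in which each token uses the shared weights plus k experts; the function name and all the sizes are made up for illustration and do not describe any particular model.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a top-k routed MoE.

    Assumes every token uses all shared weights plus exactly top_k experts.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical sizes, chosen only to illustrate the gap.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    num_experts=8,
    top_k=2,
)
print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
# prints "total: 42B, active per token: 12B"
```

Per-token compute tracks the active count, but all of the weights still have to be trained and stored.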