by ein0p 808 days ago
Before those opportunities are available to you, someone would need to spend a few million dollars to train a competitive model with this, and then release it under a license that allows commercial use. That is out of reach for the vast majority of smaller companies. These models only excel at large parameter counts, even for narrow problems. This is especially true of MoE, which is a way to push the overall parameter count even higher without activating all of those parameters for every token.
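A toy sketch of the MoE point, assuming a standard top-k gating scheme. The function names and the numbers in the usage example are hypothetical and not taken from any particular model. Every expert's weights count toward the total parameter count, but each token only passes through the k experts the gate selects, so the per-token compute tracks the "active" count rather than the total.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with the weights renormalized
    over only the chosen experts. The other experts do no work for this token.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_param_counts(num_experts, params_per_expert, shared_params, k):
    """Total parameters stored vs. parameters actually used per token.

    shared_params covers everything outside the expert layers (attention,
    embeddings, etc.), which every token uses.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return total, active


if __name__ == "__main__":
    # Hypothetical config: 8 experts of 6 units each, 2 units shared, top-2 routing.
    print(moe_param_counts(num_experts=8, params_per_expert=6, shared_params=2, k=2))  # (50, 14)
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))  # experts 1 and 3 are selected
```

Under these made-up numbers the model stores 50 units of parameters but touches only 14 per token, which is the sense in which MoE lets the total grow without a matching growth in per-token cost. You still have to train and hold all 50, which is part of why it stays expensive.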