by danielmarkbruce
851 days ago
There is a decent (<50%, >20%) chance that frontier foundation models are less oligopoly-like than they seem. The reason is that there are so many levers to pull and so much low-hanging fruit. For example:
* Read the BloombergGPT paper: they create their own tokenizer. For specialized domains (finance, law, medicine, etc.) the vocabulary is very different, and there is likely a lot to do here. Individual tokens really need to map to specific concepts, and having a concept spread across several tokens makes it too hard to learn from limited domain data.
* Data - so many ways to vary the data: more or less of it, cleaner, or "better" on some dimension.
* Read the recent papers on different decoding strategies - there seems to be a lot to do here.
* Model architecture (SSMs etc.). If you speak to people who aren't even researchers, they have 10 ideas around architecture, and some of them are decent-sounding ideas - lots of low-hanging fruit.
* System architecture - i.e. we are likely to see more and more "models" served via API that are actually systems of several model calls, and there is a lot to do here.
* Hardware, lower precision, etc. are likely to make training much cheaper.

It's reasonably likely (again, guessing <50%, >20%) that this large set of levers becomes a source of constant leap-frogging for years and years. Or, at least, they become choices/trade-offs rather than strictly "better".
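To make the tokenizer point above a bit more concrete, here is a toy sketch of how a domain term can fragment under a general vocabulary but stay whole under a domain one. Everything here is an assumption for illustration: the two vocabularies are invented, and the greedy longest-match segmenter is a simple stand-in, not the tokenizer from the BloombergGPT paper or from any real library.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation; falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabularies, invented purely for this illustration.
general_vocab = {"EB", "IT", "DA", "margin", " "}
domain_vocab = general_vocab | {"EBITDA"}

text = "EBITDA margin"
general_tokens = greedy_tokenize(text, general_vocab)
domain_tokens = greedy_tokenize(text, domain_vocab)

print(general_tokens)  # ['EB', 'IT', 'DA', ' ', 'margin']
print(domain_tokens)   # ['EBITDA', ' ', 'margin']
```

With the general vocabulary the model has to learn that three unrelated-looking pieces together mean one concept. With the domain vocabulary the concept gets its own token, which is roughly the argument for a custom tokenizer when domain data is limited.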
Right now, at least, people seem to decouple some measures of how smart the model is from its knowledge base, and at least for now the really big models seem smartest. So part of the question is how insightful / synthesis-centric the model needs to be versus effectively doing regressions....