by icyfox
253 days ago
All of the products you mention already had research teams behind them (in the case of ChatGPT and Claude, those teams actually predated most of the engineers). So knowing how to build small language models was always in their wheelhouse. Scaling up to larger LLMs required a few algorithmic advancements, but for the most part it was a question of sourcing more data and more compute. The remarkable part of transformers is their scaling laws, which let us get much better models without having to invent a new architecture.