|
|
|
|
|
by segmondy
784 days ago
|
|
I do argue that the IT is the architecture. We have pretty much had all the data that these LLMs were trained on for a long time. The game changer was the architecture not the data. Unless of course you are on the code is data camp ;). |
|
It sounds pretty obvious to say that the difference is whatever is different, but isn't that literally what both sides of this argument are saying?
edit: I do think that what the original linked essay is saying is slightly subtler than that, which is that _given_ that everyone is using the same transformer architecture, the exact hyperparameters and fine tuning that is done matters a lot less than the data set does.