by valine
1195 days ago
I think for now, the data requirements to train a SOTA LLM are so extreme that we don’t have the luxury of being picky with the training data. We are getting close to the point where there isn’t enough human-written text in existence to continue scaling these models. Model refinement seemingly has lower training requirements, putting it within reach of smaller organizations or wealthy individuals. If you don’t like the refinement dataset, it will likely be feasible to bootstrap your own off someone else’s LLM. See what Stanford did with Alpaca.