by nerdponx
757 days ago
I'd like to see more focus on the input embeddings. It's basically the same as feature engineering in pre-deep-learning machine learning: constructing features with high information content can significantly reduce the amount of data and computation needed to fit a useful model. And sometimes it's impossible to fit a useful model without careful feature engineering, either because the model itself is constrained in some way, because there isn't enough data, or both. It's analogous to choosing an inductive bias within the model itself. We literally could not do LLMs without the carefully constructed transformer architecture. Why should we expect to make further progress without paying more attention to the embeddings?