Hacker News new | ask | show | jobs
by nerdponx 757 days ago
I like to see more focus on the input embeddings.

It's basically the same as feature engineering in pre-deep machine learning: constructing features with high information content can significantly reduce the amount of data and computation needed to fit a useful model. And sometimes it's impossible to fit a useful model without careful feature engineering, either because the model itself is constrained in some way or because there isn't enough data or both.

It's analogous to making a choice of inductive bias within the model itself. We literally could not do LLMs without the carefully-constructed transformer architecture. Why should we expect to make further progress without paying more attention to the embeddings?