| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by karterk 869 days ago
	It's interesting how all focus is now primarily on decoder-only next-token-prediction models. Encoders (BERT, encoder of T5) are still useful for generating embedding for tasks like retrieval or classification. While there is a lot of work on fine-tuning BERT and T5 for such tasks, it would be nice to see more research on better pre-training architectures for embedding use cases.

1 comments

jeremycochoy 868 days ago

I believe RWKV is actually an architecture that can be used for encoding: given a LSTM/GRU, you can simply take the last state as an encoding of your sequence. The same should be possible with RWKV, right?

link