by whimsicalism 1062 days ago
> however they do not show that this approach produces similar accuracy as large LLMs.

I think they have demonstrated their case pretty well, unless scaling degrades seriously beyond the sizes they tested. 7B is pretty big.


Interestingly, I did see this tweet [0] mentioning a phase shift that occurs in transformers at exactly the scale RetNet stopped at. It's probably just a coincidence, but I was previously unaware of this phenomenon at such a scale in transformers.

[0] https://twitter.com/gordic_aleksa/status/1682479676910870529

Tim Dettmers is such a resource, cheers for this.