by turingfeel 1059 days ago
Interestingly, I did see this tweet [0] mentioning a phase shift that occurs in transformers at exactly the scale RetNet stopped at. Probably just a coincidence, but I was previously unaware of this phenomenon occurring at such a scale in transformers.

[0] https://twitter.com/gordic_aleksa/status/1682479676910870529


Tim Dettmers is such a resource, cheers for this.