by bpiche 990 days ago
This appears to be an open-source implementation of the "Efficient Streaming Language Models with Attention Sinks" paper from MIT, which was linked here 7 days ago. The paper was published on Sept 29, 2023.

https://news.ycombinator.com/item?id=37740932

1 comment

That is exactly correct