by ssijak 80 days ago
For my grug brain can somebody translate this to ELIgrug terms?

Does this mean I would be able to run a 500B model on my 48GB MacBook without losing quality?

2 comments

It's KV cache compression, i.e. it reduces how much memory the model needs as its context grows. It does not affect the size of the weights.
I wrote this more intuitive explanation. I think you might find it helpful!

https://prabal.ca/posts/google-long-context-cheaper/
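
To put rough numbers on the KV cache vs. weights distinction above, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the layer/head counts are a made-up config (not any specific model), the 4x compression ratio is hypothetical and not taken from the linked work, and the formula is the usual one for a plain transformer KV cache (one K and one V tensor per layer).

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Memory for the KV cache: K and V for every layer and every token."""
    # Leading 2 = one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def weight_bytes(n_params, bits_per_param):
    """Memory for the weights alone; independent of context length."""
    return n_params * bits_per_param // 8


# Hypothetical model config, chosen only for illustration.
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                    seq_len=128 * 1024, bytes_per_elem=2)

# Hypothetical 4x KV cache compression ratio (an assumption, not a measured result).
kv_compressed = kv // 4

# 500B parameters quantized to 4 bits each.
weights_4bit = weight_bytes(500_000_000_000, bits_per_param=4)

print(f"KV cache, 128k context, fp16: {kv / GIB:.1f} GiB")
print(f"KV cache after 4x compression: {kv_compressed / GIB:.1f} GiB")
print(f"500B weights at 4 bits:        {weights_4bit / GIB:.1f} GiB")
```

Under these assumptions the cache shrinks from about 40 GiB to about 10 GiB, but the 4-bit weights alone are still roughly 233 GiB, so shrinking the cache helps with long contexts and does not make a 500B model fit in 48GB.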