Hacker News
Why do output tokens cost 5x more than input tokens?
(anirudhsathiya.com)
3 points by ani17 55 days ago | 2 comments
ani17 55 days ago
Author here. I wanted to understand what vLLM and llama.cpp are actually doing under the hood, but the codebases are massive, so I wrote a stripped-down version from scratch to see the core ideas without the production complexity.
Code: https://github.com/Anirudh171202/WhiteLotus
lazyMonkey69 55 days ago
I think the paged attention part is a bit oversimplified. Nice read otherwise!