by AaronFriel 759 days ago
The PagedAttention paper is a good starting point: it introduces vLLM, the first major open-source inference engine with "pretty good" batch performance for transformers.
https://arxiv.org/pdf/2309.06180