Hacker News
by boroboro4 455 days ago
They do, in fact, mention inference KV cache as a use case in the README. The most advanced KV caching uses a hierarchy of GPU RAM, regular RAM, and SSD. It seems they were able to use their storage abstraction for the last tier.
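
To make the hierarchy concrete, here is a rough toy sketch of a three-tier cache with LRU demotion. Everything in it is made up for illustration: the `TieredKVCache` class, its method names, and the slot counts are hypothetical, two in-memory dicts stand in for GPU RAM and regular RAM, and pickled files in a temp directory stand in for whatever storage backs the last tier. It is not how any particular inference engine or storage system does it.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Toy three-tier cache (hypothetical, for illustration only).

    'gpu' and 'ram' are in-memory LRU dicts standing in for GPU memory and
    host memory; 'ssd' is a directory of pickled files. Keys are assumed to
    be strings that are safe to use as file names.
    """

    def __init__(self, gpu_slots, ram_slots, ssd_dir=None):
        self.gpu = OrderedDict()
        self.ram = OrderedDict()
        self.gpu_slots = gpu_slots
        self.ram_slots = ram_slots
        self.ssd_dir = ssd_dir or tempfile.mkdtemp(prefix="kvcache_")

    def _path(self, key):
        return os.path.join(self.ssd_dir, f"{key}.pkl")

    def put(self, key, value):
        # Drop any stale copy in the lower tiers, then insert at the top.
        self.ram.pop(key, None)
        if os.path.exists(self._path(key)):
            os.remove(self._path(key))
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        self._spill()

    def _spill(self):
        # Demote least recently used entries: gpu -> ram -> ssd.
        while len(self.gpu) > self.gpu_slots:
            k, v = self.gpu.popitem(last=False)
            self.ram[k] = v
        while len(self.ram) > self.ram_slots:
            k, v = self.ram.popitem(last=False)
            with open(self._path(k), "wb") as f:
                pickle.dump(v, f)

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.ram:
            value = self.ram.pop(key)
        elif os.path.exists(self._path(key)):
            with open(self._path(key), "rb") as f:
                value = pickle.load(f)
        else:
            return None
        # A hit in a lower tier promotes the entry back to the top tier.
        self.put(key, value)
        return value

    def tier_of(self, key):
        if key in self.gpu:
            return "gpu"
        if key in self.ram:
            return "ram"
        if os.path.exists(self._path(key)):
            return "ssd"
        return None
```

With one slot per in-memory tier, inserting three entries pushes the oldest one down to the file tier, and reading it again pulls it back to the top while the others shift down. A real system would move tensors asynchronously and in large blocks, which this sketch ignores.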