by ubermenchh 164 days ago
Yes, it does continuous batching along with paged attention and prefix caching. I'm also going to be adding some more inference techniques.