by storus 104 days ago
Does it support paged attention like vLLM, though? Without that, it will run into KV-cache memory fragmentation quickly.

Yes, great question!

The system started without paged attention, then automatically built its own paged attention implementation once it identified the lack of one as a bottleneck.
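For anyone unfamiliar with the idea, here is a rough sketch of the bookkeeping behind paged attention. It is not vLLM's code and not what the system generated; the class name `PagedKVCache` and its methods are made up for illustration. The KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical positions to whatever physical blocks happen to be free, so memory never has to be contiguous per sequence.

```python
class PagedKVCache:
    """Toy block allocator (hypothetical names, bookkeeping only, no tensors)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # Pool of free physical block ids shared by all sequences.
        self.free_blocks = list(range(num_blocks))
        # seq_id -> list of physical block ids, in logical order.
        self.block_tables: dict[str, list[int]] = {}
        # seq_id -> number of tokens stored so far.
        self.seq_lens: dict[str, int] = {}

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one more token; return (physical_block, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the last one is full (or none exists).
        if n % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[-1], n % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because every block has the same size, a freed block can be reused by any other sequence, and the only waste is the unused tail of each sequence's last block. That is why this layout avoids the fragmentation you get from reserving one contiguous max-length buffer per request.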

Pretty cool!