HN Mirror



	by storus 104 days ago
	Does it support paged attention like vLLM though? Without that they will run into memory fragmentation quickly.

1 comment

Yes, great question!

The system started without paged attention, then built its own paged attention implementation automatically once it identified the lack of one as a bottleneck.

Pretty cool!
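
For anyone unfamiliar with the idea, here is a rough sketch of the bookkeeping behind paged attention. It is not the system's actual implementation and not vLLM's API; the class and method names (`PagedKVCache`, `append_token`, `free`) are made up for illustration. The point is only that the KV cache is split into fixed-size blocks handed out from a shared pool, so a sequence never needs one contiguous region and freed blocks can be reused by any other sequence.

```python
class PagedKVCache:
    """Toy block allocator for a paged KV cache (hypothetical names)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve a slot for one new token; returns (physical_block, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # no block yet, or the last block is full
            if not self.free_blocks:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

A real implementation also needs an attention kernel that follows the block table to gather keys and values, which this sketch leaves out. Even so, it shows why fragmentation stops being an issue: the only wasted space is the unused tail of each sequence's last block.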