Hacker News
by lukebechtel 106 days ago
Yes, great question!

The system started without paged attention and automatically built its own implementation once it realized that the lack of paging was a bottleneck.

Pretty cool!