by refibrillator
371 days ago
The code has few comments, but gotta love it when you can tell someone was having fun! https://github.com/ScalingIntelligence/tokasaurus/blob/65efb...

I'm honestly impressed that a pure Python implementation can beat out vLLM and SGLang. Granted, they lean on FlashInfer, and of course torch.compile has gotten incredibly powerful in the last few years. Dynamic shapes are still a huge thorn in my side, though, so I'll need to look closer at how they pulled it off…