by zackangelo
633 days ago
Yeah, I’ve had to rewrite continuous batching and other scheduling logic. That and multi-GPU inference have been the hardest things to build. I’ll need to get paged attention working as well, but I think I can launch without it.