by lumost
31 days ago
Not my area either! But my understanding is that there are more efficient ways to represent static numbers when you can skip the VRAM lookup. https://taalas.com/ is an example of a startup in this area, claiming 16k tok/s on an ASIC for Llama 8B. Qwen has a 27B model at Opus 4.5 quality.