by jazzyjackson
31 days ago
I haven't been following anyone baking models into ASICs. Isn't it still necessary to pack just as many transistors onto a chip? Whether it's an NPU or a GPU, ASIC or not, you still need to hold hundreds of gigabytes in memory, so how is baking the model onto custom silicon cheaper than running it on commodity VRAM? (Asking because I don't know!)

https://taalas.com/
This is an example of a startup in this area; it claims 16k tok/s on an ASIC for Llama 8B. Qwen has a 27B model at Opus 4.5 quality.
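
For a rough sense of scale against the "hundreds of gigabytes" figure, here is a back-of-the-envelope sketch of the weight footprint for the two model sizes mentioned. The bit widths (16, 8, 4) are my assumption, not something either vendor states here, and this counts the weights only, ignoring KV cache and activations.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts are taken from the model names in the thread;
    # the precisions are illustrative assumptions.
    for name, params in [("Llama 8B", 8.0), ("Qwen 27B", 27.0)]:
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{name} @ {bits}-bit: {size:.1f} GB")
```

Under those assumptions, both models land in the single or low double digits of gigabytes rather than hundreds, which is presumably part of why small models are the first targets for this approach.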