by duchenne 1039 days ago
How many tokens per second do you think we can get out of this 6 TFLOPS NPU?

For prompt ingestion... I dunno.

Unbatched token generation is basically limited by RAM bandwidth, since all of the model's weights have to be read from memory for each generated token. I bet the theoretical performance is similar to the GPU's, albeit with much lower power consumption.
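A rough back-of-envelope sketch of why bandwidth, not FLOPS, sets the ceiling for unbatched generation. The model size, quantization, and RAM bandwidth below are hypothetical placeholders, not measurements of this NPU. The "~2 FLOPs per parameter per token" figure is the usual rule of thumb for a dense transformer forward pass. Only the 6 TFLOPS number comes from the question.

```python
def bandwidth_bound_tokens_per_s(model_bytes: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Ceiling for unbatched decoding: every weight is read from RAM once per token."""
    return mem_bandwidth_bytes_per_s / model_bytes


def compute_bound_tokens_per_s(n_params: float, flops_per_s: float) -> float:
    """Ceiling if compute were the only limit, assuming ~2 FLOPs per parameter per token."""
    return flops_per_s / (2.0 * n_params)


# Hypothetical inputs (assumptions for illustration only):
N_PARAMS = 7e9           # assumed 7B-parameter model
BYTES_PER_PARAM = 0.5    # assumed 4-bit quantized weights
MEM_BANDWIDTH = 100e9    # assumed 100 GB/s of RAM bandwidth
NPU_FLOPS = 6e12         # the 6 TFLOPS figure from the question

model_bytes = N_PARAMS * BYTES_PER_PARAM  # 3.5e9 bytes of weights
bw = bandwidth_bound_tokens_per_s(model_bytes, MEM_BANDWIDTH)
cp = compute_bound_tokens_per_s(N_PARAMS, NPU_FLOPS)

# The effective ceiling is whichever limit is hit first.
effective = min(bw, cp)

print(f"bandwidth-bound ceiling: {bw:.1f} tokens/s")
print(f"compute-bound ceiling:   {cp:.1f} tokens/s")
print(f"effective ceiling:       {effective:.1f} tokens/s")
```

With these assumed numbers the bandwidth ceiling (about 28.6 tokens/s) is roughly 15x below the compute ceiling (about 428.6 tokens/s), so extra NPU FLOPS would not speed up single-stream generation. Prompt ingestion processes many tokens per pass over the weights, which is why it can lean on compute instead.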