by yomismoaqui 121 days ago
This is a specific model (Llama 3.1 8B) baked into hardware. You can only run this one model, but you get "low" power consumption and crazy speed.

If you want to run a different model you need new hardware for that new model.

2 comments

Do we understand how to scale up the hardware to the point where it can run a frontier model? Because this is insane. It will be a game changer for agent systems making 10-100+ calls.
The speed really is crazy: 15k tokens/second.
I have tried it again. This is the future of chat UI, imho.

Generated in 0,074s • 15 754 tok/s
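
For a rough sense of what those numbers mean for the agent use case mentioned above, here is a back-of-the-envelope sketch. It uses only the two figures quoted in the thread (0.074 s and 15,754 tok/s). The function names are made up for illustration, and it assumes every agent call generates about as much output as this one response, that calls run one after another, and that network and prompt-processing time are ignored, so treat the results as a lower bound rather than a measurement.

```python
# Figures quoted in the thread: "Generated in 0,074s • 15 754 tok/s"
GENERATION_SECONDS = 0.074
TOKENS_PER_SECOND = 15_754


def tokens_generated(seconds: float = GENERATION_SECONDS,
                     rate: float = TOKENS_PER_SECOND) -> float:
    """Approximate number of tokens produced in one response."""
    return seconds * rate


def sequential_latency(calls: int,
                       seconds_per_call: float = GENERATION_SECONDS) -> float:
    """Total generation time for `calls` back-to-back calls.

    Assumption: every call takes as long as the quoted response;
    network and prompt-processing overhead are not included.
    """
    return calls * seconds_per_call


if __name__ == "__main__":
    print(f"tokens in the quoted response: ~{round(tokens_generated())}")
    for calls in (10, 100):
        print(f"{calls} sequential calls: ~{sequential_latency(calls):.2f} s")
```

Under those assumptions the quoted response is about 1,166 tokens, and an agent chain of 10 to 100 such calls would spend roughly 0.74 s to 7.4 s on generation in total.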