| HN Mirror



	by yomismoaqui 121 days ago
	This is a specific model (Llama 3.1 8B) baked into hardware. You can only use this one model, but you get "low" power consumption and crazy speed. If you want to run a different model, you need new hardware built for that model.

2 comments

sbrother 118 days ago

Do we understand how to scale up the hardware to the point it can run a frontier model? Because this is insane. It will be a game changer for agent systems making 10-100+ calls.


sixtyj 121 days ago

The speed really is crazy: 15k tokens/second.


sixtyj 120 days ago

I have tried it again. This is the future of chat UI, imho.

Generated in 0,074s • 15 754 tok/s

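To put the numbers in this thread in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (0.074 s, 15,754 tok/s, and the "10-100+ calls" agent scenario). The 500 tokens per call and the 50 tok/s comparison rate are assumptions picked purely for illustration, not measurements, and the function names are hypothetical.

```python
def tokens_generated(seconds: float, tok_per_s: float) -> float:
    """Tokens produced in a given time at a constant throughput."""
    return seconds * tok_per_s


def sequential_generation_time(calls: int, tokens_per_call: int, tok_per_s: float) -> float:
    """Seconds spent generating when an agent makes `calls` calls back to back.

    Ignores network latency, prompt processing, and tool execution time.
    """
    return calls * tokens_per_call / tok_per_s


# Figures quoted in the thread.
reported_seconds = 0.074
reported_rate = 15_754  # tok/s

# Assumptions for illustration only.
assumed_tokens_per_call = 500
assumed_slow_rate = 50  # tok/s, a hypothetical comparison point

# Size of the response implied by the quoted time and rate.
print(round(tokens_generated(reported_seconds, reported_rate)))  # 1166

# Generation time for 10 and 100 sequential agent calls at each rate.
for calls in (10, 100):
    fast = sequential_generation_time(calls, assumed_tokens_per_call, reported_rate)
    slow = sequential_generation_time(calls, assumed_tokens_per_call, assumed_slow_rate)
    print(f"{calls} calls: {fast:.2f} s vs {slow:.0f} s")
```

Under these assumptions, 100 sequential calls would spend about 3 seconds generating at the quoted rate versus about 1,000 seconds at the assumed slower rate, which is why the agent use case looks so different at this speed.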