| HN Mirror



	by Kuyawa 121 days ago
	If this is possible, why don't all online AI engines work like this?

1 comments

yomismoaqui 121 days ago

This is a specific model (Llama 3.1 8B) baked into hardware. You can only use this one model, but you get "low" power consumption and crazy speed.

If you want to run a different model you need new hardware for that new model.

sbrother 118 days ago

Do we understand how to scale up the hardware to the point it can run a frontier model? Because this is insane. It will be a game changer for agent systems making 10-100+ calls.

sixtyj 121 days ago

It is really a crazy speed. 15k tokens/second.

sixtyj 120 days ago

I have tried it again. This is the future of chat UI, imho.

Generated in 0,074s • 15 754 tok/s
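
As a rough sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes "0,074s" means 0.074 seconds and "15 754" means 15,754 tokens per second (European number formatting), and that an agent runs its calls one after another with each call taking about as long as this one. The function names are hypothetical and only for illustration.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the thread.
# Assumption: 0.074 s generation time and 15,754 tok/s, read from the
# European-formatted "0,074s" and "15 754 tok/s".

GEN_SECONDS = 0.074
TOKENS_PER_SECOND = 15_754


def tokens_generated(seconds: float, tokens_per_second: float) -> float:
    """Tokens produced in one response: duration times throughput."""
    return seconds * tokens_per_second


def sequential_latency(calls: int, seconds_per_call: float) -> float:
    """Total generation time for an agent making `calls` calls in sequence.

    Assumes every call takes the same time and ignores network and
    tool-execution overhead, so this is a lower bound at best.
    """
    return calls * seconds_per_call


if __name__ == "__main__":
    tokens = tokens_generated(GEN_SECONDS, TOKENS_PER_SECOND)
    print(f"tokens in one response: {tokens:.0f}")
    for calls in (10, 100):
        total = sequential_latency(calls, GEN_SECONDS)
        print(f"{calls} sequential calls: {total:.2f} s of generation")
```

Under those assumptions, one response is roughly 1,166 tokens, and a chain of 100 similar calls would spend about 7.4 seconds on generation in total.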
