Hacker News
by KoolKat23 1 hour ago
If anyone wishes to see the future: a fast LLM is quite eye-opening. I think ChatJimmy uses Taalas' chips, where models are hardcoded into the silicon.

https://chatjimmy.ai/

1 comments

Thanks, I didn't know that one! Very impressive speed, although quality seems very bad.
> although quality seems very bad

The weights they “etched” into the FPGA card that’s used for the ChatJimmy demo are those of a Llama 3-something 8B model.

The actually impressive and novel thing is that Taalas has managed to automate that process (clearly, nobody transforms 8 billion numbers into a physical representation by hand).

So now they can work on scaling this process up. With low enough lead times (I’ll be convinced they have inside connections to TSMC if they can actually deliver on the promised turnaround of a mere 3-4 months), they will be able to offer 30-100B+ parameter models less than half a year after they’re released, at thousands of tokens per second, while probably drawing less power (per token; not sure about overall).
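
To make the per-token vs. overall distinction concrete, here is a rough sketch. The numbers are made up purely for illustration and are not measurements of Taalas hardware or any GPU; the function name `joules_per_token` is likewise just a hypothetical helper. The only real fact used is that a watt is a joule per second, so energy per token is power divided by throughput.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: watts are joules/second, so divide by tokens/second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


# Illustrative, invented numbers only (NOT real benchmarks of any device).
gpu = joules_per_token(power_watts=500.0, tokens_per_second=100.0)
hardwired = joules_per_token(power_watts=1000.0, tokens_per_second=10_000.0)

print(f"hypothetical GPU:            {gpu:.2f} J/token")
print(f"hypothetical hardwired chip: {hardwired:.2f} J/token")
```

In this invented scenario the hardwired chip draws twice the power overall but still spends far less energy per token, which is why the two figures have to be considered separately.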

Exciting times ahead, folks.