by moonu 64 days ago
Idk if you've seen this already, but Taalas does this interesting thing where they embed the model directly onto the chip. This leads to super-fast speeds (https://chatjimmy.ai), but the model they're using is an old, small Llama model, so the quality is pretty bad. They say it can scale, though, so if that's really true it'd be pretty insane and would unlock the inference you're talking about.
2 comments

Robotics/control systems is exactly what came to mind when I saw this release! What struck me is the possibility of lookahead search in real time, a bit like AlphaZero's MCTS.
It's a fascinating proposition and no doubt they'll get bigger models in there, and likely be able to cluster multiple models for a mega MoE. One thing that would really be great is if they could take the power requirements down -- the chip requires 2.5 kW, which is modest in terms of what the big boys use but would be an issue on a battery-powered robot.
"The chip"? No, a whole rack/deployment they offer takes 2.5 kW, not just one chip. Squeezing 2.5 kW thru 1 chip would be mental.
My bad, thanks for the correction. Skimming FTL.