HN Mirror

by switchbak 6 days ago

Now the next bottleneck is the compiler - which we can model in an LLM! It's only wrong 15% of the time :)

But truly, using Cerebras at ~2k tokens/s with very low latency is like a vision into the future. You start to rework your workflow around things that can happen without onerous manual review - stating the conditions for success, etc. It's rare that I have a problem that maps well to that, but I expect this is where things are headed.
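
To make "stating the conditions for success" concrete, here's a minimal sketch of that kind of loop: keep asking for candidates and accept the first one that passes an automated check, so nobody has to review each attempt by hand. Everything here is made up for illustration: `fake_model` stands in for a fast model endpoint, and `first_passing` and `add_works` are hypothetical helpers, not any real API.

```python
from typing import Callable, Iterable, Optional


def first_passing(
    candidates: Iterable[str],
    is_success: Callable[[str], bool],
    max_attempts: int = 10,
) -> Optional[str]:
    """Return the first candidate that meets the success condition, else None."""
    for attempt, candidate in enumerate(candidates):
        if attempt >= max_attempts:
            break
        if is_success(candidate):
            return candidate
    return None


def fake_model() -> Iterable[str]:
    # Hypothetical stand-in for a fast model: yields one attempt at a time.
    yield "def add(a, b): return a - b"  # wrong on purpose
    yield "def add(a, b): return a + b"


def add_works(source: str) -> bool:
    # The "condition for success": the generated code must define add()
    # and add(2, 3) must equal 5. Real use would need a sandbox; exec()
    # on untrusted model output is unsafe.
    namespace: dict = {}
    try:
        exec(source, namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


result = first_passing(fake_model(), add_works)
print(result)
```

The faster the model, the cheaper each failed attempt gets, which is why the check rather than the generation becomes the thing worth designing carefully.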

Of course the fast models tend not to be the SOTA ones, but if they were - high quality and near-instant thinking - that would be a game changer that I don't think we're really prepared for. The things that get unlocked with higher-than-reasonable speed become very interesting.

lhoff 5 days ago

Have you tried https://chatjimmy.ai/? It's only a demo, but it blew my mind. I had the sudden feeling that this is the future.

colordrops 5 days ago

What do you mean "demo"? Seems to work... Who is behind this?

Silphendio 4 days ago

It's a 3-bit quant of Llama3-8B. I'm sure there are use cases for that, but it's useless when it comes to tool calls or coding, and I wouldn't trust its factual accuracy either.

alfiopuglisi 5 days ago

These guys: https://taalas.com/products/