| HN Mirror

	by barnas2 14 days ago
	A company called Taalas is working on something like that. It's not Opus 4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.

3 comments

Brisk4t 14 days ago

I'm rooting for them HARD, but they've been quiet since their last (and only) blog post. X and LinkedIn are empty too. I really hope it wasn't a pipe dream.

mirekrusin 14 days ago

It starts to get interesting when the latency is better than the average website's.

vineyardmike 14 days ago

I’m not sure if this is what you meant, but at 17k t/s, you start to compete with the speed of network calls. You could approach the point of generating an HTML/JS/CSS page faster than some websites can be returned over the network.
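
To put rough numbers on that comparison, here is a minimal back-of-the-envelope sketch. The 17k t/s rate is the figure quoted in this thread. The page size and the latency budget are made-up illustrative values, and the helper names are hypothetical. It assumes a steady decode rate and ignores time-to-first-token and the network hop to the model itself.

```python
def generation_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Milliseconds needed to emit num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second * 1000.0


def tokens_within_budget(budget_ms: float, tokens_per_second: float) -> float:
    """How many tokens can be emitted inside a latency budget (in ms)."""
    return tokens_per_second * budget_ms / 1000.0


TOKENS_PER_SECOND = 17_000  # rate quoted in the thread
PAGE_TOKENS = 2_000         # assumption: a small generated HTML/JS/CSS page
BUDGET_MS = 200             # assumption: an illustrative page-load budget

page_ms = generation_time_ms(PAGE_TOKENS, TOKENS_PER_SECOND)
budget_tokens = tokens_within_budget(BUDGET_MS, TOKENS_PER_SECOND)

print(f"{PAGE_TOKENS} tokens take about {page_ms:.0f} ms to generate")
print(f"a {BUDGET_MS} ms budget fits about {budget_tokens:.0f} tokens")
```

Under these assumptions a small page is generated in roughly the time of a single slow network round trip, which is the regime the comment describes.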

normalaccess 13 days ago

When that happens, they will be able to rewrite reality in real time.

all2 14 days ago

The immediate load (less than 200ms on my machine through a slow connection) is quite pleasant, tbh.

tomaytotomato 14 days ago

That's cool. I just tested it out, and it is fast, but unfortunately its accuracy is not great.

selcuka 14 days ago

It's an 8B model. Consider it a proof-of-concept.
