Hacker News
by barnas2 14 days ago
A company called Taalas is working on something like that. Not Opus 4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.
3 comments

I'm rooting for them HARD but they've been quiet since their last (and only) blog post. X and LinkedIn are empty too. I really hope it wasn't a pipe dream.
It starts to get interesting when the latency is better than the average website's.
I’m not sure if this is what you meant, but at 17k t/s, you start to compete with the speed of network calls. You could approach the point of generating an HTML/JS/CSS page faster than some websites can be returned over the network.
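Rough back-of-the-envelope for that, as a sketch only: the 17k t/s figure is the one quoted upthread, while the page sizes in tokens are made-up illustrative values, and the helper name `generation_time_ms` is hypothetical. It ignores prompt processing and the network hop to the inference server.

```python
def generation_time_ms(page_tokens: int, tokens_per_second: float = 17_000) -> float:
    """Time to emit page_tokens at a steady decode rate, in milliseconds.

    Assumes a constant rate and ignores prompt processing and network time.
    """
    return page_tokens / tokens_per_second * 1000


# Page sizes below are assumptions for illustration, not measurements.
for tokens in (500, 2_000, 5_000):
    print(f"{tokens} tokens -> {generation_time_ms(tokens):.0f} ms")
```

So under those assumptions a page of a couple thousand tokens would take a bit over 100 ms to generate, which is in the same range as a slow network round trip.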
When that happens, they will be able to rewrite reality in real time.
The immediate load (less than 200ms on my machine through a slow connection) is quite pleasant, tbh.
That's cool, I just tested it out and it is fast but unfortunately its accuracy is not great.
It's an 8B model. Consider it a proof-of-concept.