Hacker News new | ask | show | jobs
by pmxi 339 days ago
There are plenty of LLM use cases where the output isn’t meant to be read by a human at all. e.g:

parsing unstructured text into structured formats like JSON

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful

2 comments

You're missing another big advantage is cost. If you can do 1000tok/s on a $2/hr H100 vs 60tok/s on the same hardware, you can price it at 1/40th of the price for the same margin.
You can also slow down the hardware (say, dropping the clock and then voltages) to save huge amounts of power, which should be interesting for embedded applications.
out of curiosity, is anyone here using AI in embedded with experiences to share? I see NPUs and the like popping up more on credit card and buildroot SBCs I get, but with zero documentation or sample scripts for them.
Sure, but I was talking about the chat interface, sorry if that was not clear.