| HN Mirror


by pmxi 339 days ago

There are plenty of LLM use cases where the output isn’t meant to be read by a human at all, e.g.:

parsing unstructured text into structured formats like JSON

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful.

2 comments

martinald 339 days ago

You're missing another big advantage: cost. If you can do 1000 tok/s on a $2/hr H100 vs 60 tok/s on the same hardware, you can price it at roughly 1/17th of the price for the same margin.
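To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the GPU is fully utilized, serves a single stream at the quoted rate, and has no costs beyond the hourly rental; the function name `cost_per_million_tokens` is made up for illustration.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Hardware cost of generating one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

# Figures from the comment: a $2/hr H100 at 1000 tok/s vs 60 tok/s.
fast = cost_per_million_tokens(2.0, 1000)
slow = cost_per_million_tokens(2.0, 60)

# The hourly price cancels out, so the cost ratio is just the throughput ratio.
ratio = slow / fast

print(f"fast: ${fast:.2f} per 1M tokens")
print(f"slow: ${slow:.2f} per 1M tokens")
print(f"ratio: {ratio:.1f}x")
```

With these inputs the fast case comes to about $0.56 per million tokens and the slow case to about $9.26, a ratio of 1000/60, or about 16.7x. Real serving costs also depend on batching and utilization, which this ignores.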


sweetjuly 339 days ago

You can also slow down the hardware (say, dropping the clock and then voltages) to save huge amounts of power, which should be interesting for embedded applications.
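A rough sketch of why that works, using the standard first-order model for CMOS dynamic power (P proportional to C·V²·f). It ignores static/leakage power, and the scale factors below are hypothetical, not measurements of any real chip.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to baseline, using P ~ C * V^2 * f."""
    return voltage_scale ** 2 * frequency_scale

def relative_energy_per_token(voltage_scale: float, frequency_scale: float) -> float:
    """Energy per token relative to baseline, assuming throughput scales with clock."""
    power = relative_dynamic_power(voltage_scale, frequency_scale)
    throughput = frequency_scale
    return power / throughput

# Hypothetical operating point: half the clock, 80% of the voltage.
power = relative_dynamic_power(0.8, 0.5)
energy = relative_energy_per_token(0.8, 0.5)

print(f"power: {power:.2f}x baseline")
print(f"energy per token: {energy:.2f}x baseline")
```

Under this model, halving the clock alone only halves power and leaves energy per token unchanged; the savings per token come from the voltage drop that the lower clock allows.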


kldg 338 days ago

Out of curiosity, is anyone here using AI in embedded who has experiences to share? I see NPUs and the like popping up more on the credit-card-sized and Buildroot SBCs I get, but with zero documentation or sample scripts for them.


amelius 339 days ago

Sure, but I was talking about the chat interface, sorry if that was not clear.
