HN Mirror


by harrouet 13 days ago

Spot on!

I am also under the impression that LLM tech is plateauing before delivering the promised productivity. Great as a coding assistant, great at summarizing text, translating, great at helping plan a trip...

But for the rest, e.g. acting as a life assistant, it is still far off, with no hope of reaching the desired performance level.

I would not be surprised to see OpenAI and the like start reverting to Siri v1 strategies, i.e. "if this then that" kind of agent routing.
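To make the "if this then that" idea concrete, here is a minimal sketch of rule-based routing in Python. The patterns, the handler names (`set_timer`, `get_weather`) and the fallback string are all hypothetical; this is not how Siri or any OpenAI product is known to work, just the general shape of the technique.

```python
import re

# Hypothetical handlers, for illustration only.
def set_timer(match):
    return f"timer set for {match.group(1)} minutes"

def get_weather(match):
    return f"weather lookup for {match.group(1)}"

# Ordered (pattern, handler) rules: the first match wins.
RULES = [
    (re.compile(r"set a timer for (\d+) minutes?", re.IGNORECASE), set_timer),
    (re.compile(r"weather in ([\w ]+)", re.IGNORECASE), get_weather),
]

def route(utterance):
    """Return the result of the first matching rule, else a fallback."""
    for pattern, handler in RULES:
        match = pattern.search(utterance)
        if match:
            return handler(match)
    # No rule matched: this is where a general model would take over.
    return "fallback: hand off to the general model"
```

Anything outside the hand-written rules drops to the fallback, which is both the appeal (predictable) and the limit (brittle) of this style of routing.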


piokoch 13 days ago

Why is this surprising? LLMs are good at text generation based on the material they were trained on. Software is text generation, translation is text generation, and LLMs can answer questions because billions were spent on tuning foundation models; that is, people collected questions with answers in a (semi)automatic way, to the point where we might think that LLMs are "thinking".

Now people want to handle car rental. What relevant data were models trained on for this kind of application? For Python code there are a kajillion examples on GitHub, for mathematical proofs there is an endless stream of papers, books, etc. But for car rental? Mostly ads on the internet that want to trick you into a bad deal. So yes, the LLM will be a disappointment, as it tries, well, to trick you into a bad deal. In addition, data are rather scarce, so there will be a lot of hallucination, as it gets mixed up with yacht rental, bike rental, ski equipment rental, etc.

jorisw 13 days ago

Who said it was surprising?

Performance on specific tasks will depend on those tasks either having been included in training (which Apple could work on) or being added by way of fine-tuning, plus context sourced from userland.

For any category of tasks, there's a ton to be gained still in terms of how context is populated more effectively (relevance) and efficiently (token use). See software engineering harnesses and the skills architecture of OpenClaw for example. SWE harnesses make all the difference in how well Claude Code and OpenAI Codex perform. OpenClaw can't do shit without loading skills from the filesystem into context JIT.
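As a rough illustration of loading skills into context just in time, here is a minimal Python sketch. Everything in it is an assumption made for the example: the layout (one `.md` file per skill, with a one-line description on the first line), the word-overlap ranking, and the character budget standing in for a token budget. It is not the actual mechanism of OpenClaw, Claude Code or Codex.

```python
from pathlib import Path

def index_skills(skill_dir):
    """Map skill name -> one-line description (first line of each file)."""
    index = {}
    for path in sorted(skill_dir.glob("*.md")):
        with path.open(encoding="utf-8") as f:
            index[path.stem] = f.readline().strip()
    return index

def select_skills(task, index, limit=2):
    """Rank skills by word overlap between the task and each description."""
    task_words = set(task.lower().split())
    scored = []
    for name, description in index.items():
        overlap = len(task_words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]

def build_context(task, skill_dir, budget_chars=4000):
    """Load full skill bodies only for relevant skills, within a budget."""
    parts = [f"Task: {task}"]
    used = len(parts[0])
    for name in select_skills(task, index_skills(skill_dir)):
        body = (skill_dir / f"{name}.md").read_text(encoding="utf-8")
        if used + len(body) > budget_chars:
            continue  # skip a skill that would exceed the budget
        parts.append(body)
        used += len(body)
    return "\n\n".join(parts)
```

Only the cheap one-line descriptions are read for every skill; full bodies are loaded just for the few that look relevant, which covers both the relevance and the token-use side of the argument.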

I'll be very curious to find out how Apple is feeding context in their new AI approach. Part of it appears to be an 'index' that my iPhone started building (visible in the main Settings screen) after installing the iOS 27 Developer Beta.