HN Mirror


by jampekka 68 days ago

The HN title is quite a strong claim, but it's nowhere to be seen in the repo.

It seems to be fully prompt-based, so the AI can still say anything it pleases.

How well do these complicated prompt systems usually work? My strategy is to stick mostly to simple prompts, possibly with some deterministic tools and vendor harnesses, on the rationale that these are what the models are trained and evaluated with, and that LLMs still often get tripped up when their context is spammed with too much stuff.


sigmoid10 68 days ago

The crazy thing is, you could do this. And it can be done 100% with code using zero prompting - just by limiting the output token set to a structured format and then further constraining parts of that to sources that were retrieved before. I know because I wrote such a system already. It could still match sources and answers incorrectly (just like this approach) but there is no need to rely on crazy prompts and agents to prevent hallucinations or missing outputs (which btw still lack any hard guarantees in the end). Prompting is a good strategy as models become smarter, but when you need reliability, you need to make use of the fact that they are still simple autoregressive completion engines. I don't get why everyone ignores this aspect, since I find it extremely useful all the time.
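To make the idea concrete, here is a minimal sketch of that kind of constrained decoding. It is not the system described above. `toy_logits` is a hypothetical stand-in for a real model's next-token scores, the output template is invented for illustration, and each source ID is assumed to be a single token (with a real tokenizer an ID spans several tokens, so the mask would have to follow a prefix tree of the allowed IDs).

```python
from typing import Callable, Dict, List, Sequence

Logits = Dict[str, float]


def constrained_pick(logits: Logits, allowed: Sequence[str]) -> str:
    """Return the highest-scoring token among `allowed`; everything else is masked out."""
    candidates = [tok for tok in allowed if tok in logits]
    if not candidates:
        raise ValueError("no allowed token is in the vocabulary")
    return max(candidates, key=lambda tok: logits[tok])


def decode_cited_answer(
    next_logits: Callable[[List[str]], Logits],
    answers: Sequence[str],
    retrieved_ids: Sequence[str],
) -> List[str]:
    """Greedy decoding against a fixed template: ANSWER "[" SOURCE "]".

    The SOURCE slot only admits IDs that were actually retrieved.
    """
    template = [list(answers), ["["], list(retrieved_ids), ["]"]]
    out: List[str] = []
    for allowed in template:
        out.append(constrained_pick(next_logits(out), allowed))
    return out


def toy_logits(prefix: List[str]) -> Logits:
    # Hypothetical model: it strongly prefers a source that was never
    # retrieved ("doc-999") and an answer outside the format ("maybe").
    return {
        "yes": 1.0, "no": 0.2, "maybe": 3.0,
        "[": 0.1, "]": 0.1,
        "doc-1": 0.5, "doc-2": 0.9, "doc-999": 5.0,
    }


result = decode_cited_answer(toy_logits, ["yes", "no"], ["doc-1", "doc-2"])
print(result)  # ['yes', '[', 'doc-2', ']']
```

The mask guarantees the output shape and that the cited ID exists in the retrieved set. It does not guarantee that `doc-2` actually supports the answer, which is the mismatch risk mentioned above.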


jampekka 68 days ago

> I don't get why everyone ignores this aspect, since I find it extremely useful all the time.

My hunch is that it's because structured/constrained decoding and deterministic subsystems are technically somewhat more involved, requiring e.g. raw API interactions and sometimes manual decoding strategies. Prompt systems can be written in plain text and mostly with "common sense". Not to say that writing a good prompt (system) is a trivial task, but it's a different skillset.


sigmoid10 66 days ago

Not really. Most big model providers offer structured output decoding in their APIs. But you still have to do some actual programming and design at the end of the day instead of pure vibe-prompting.
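As a rough illustration of the programming involved, the sketch below builds a JSON Schema whose `source` field is an `enum` of the retrieved IDs, which is the kind of schema a structured-output mode would be given. The field names and both helper functions are hypothetical, and how the schema is passed to a particular provider (and which schema features it supports) varies and is not shown here.

```python
import json
from typing import Any, Dict, Sequence


def citation_schema(retrieved_ids: Sequence[str]) -> Dict[str, Any]:
    """Build a JSON Schema that only admits sources retrieved earlier."""
    if not retrieved_ids:
        raise ValueError("need at least one retrieved source")
    return {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "source": {"type": "string", "enum": list(retrieved_ids)},
        },
        "required": ["answer", "source"],
        "additionalProperties": False,
    }


def conforms(raw: str, schema: Dict[str, Any]) -> bool:
    """Hand-rolled check for this one schema shape (not a general validator)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(schema["required"]):
        return False
    allowed = schema["properties"]["source"]["enum"]
    return isinstance(obj["answer"], str) and obj["source"] in allowed


schema = citation_schema(["doc-1", "doc-2"])
print(conforms('{"answer": "yes", "source": "doc-2"}', schema))    # True
print(conforms('{"answer": "yes", "source": "doc-999"}', schema))  # False
```

Rebuilding the schema per request from whatever was retrieved is the design step here. The post-hoc `conforms` check is only a safety net for setups where the schema is not enforced during decoding.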
