

by djhn 87 days ago

LLMs will often helpfully predict made-up tokens for the contents of the data fields.

For 100% of my jq use cases, the data wouldn't fit into context. But even with smaller inputs, I have never, not even once, had an LLM pass the data I fed it through without mangling it.

Take a feed of blog posts (and select just the first 50 or so to give the model a fighting chance). I'll give you an 80% likelihood that the output is invalid JSON. And if you do manage to get valid JSON out of it, the actual dates, times, and text content will have changed.
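One way to make that "did it mangle the data?" check concrete: a minimal Python sketch, assuming the feed is a JSON array of flat objects (the `title` and `date` fields are made up for illustration). `check_llm_output` is a hypothetical helper, not part of jq or any LLM API; it parses the model's reply and reports invalid JSON, missing items, or any field whose value differs from the original.

```python
import json

def check_llm_output(original_items, llm_text):
    """Compare an LLM's JSON reply against the original feed items.

    Returns a list of problems; an empty list means the reply parsed
    and every original field came back with exactly the same value.
    """
    try:
        produced = json.loads(llm_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(produced, list):
        return ["expected a JSON array"]

    problems = []
    if len(produced) != len(original_items):
        problems.append(
            f"item count changed: {len(original_items)} -> {len(produced)}"
        )
    # zip stops at the shorter list; the count check above covers the rest.
    for i, (orig, got) in enumerate(zip(original_items, produced)):
        if not isinstance(got, dict):
            problems.append(f"item {i}: not an object")
            continue
        for key, value in orig.items():
            if key not in got:
                problems.append(f"item {i}: missing field {key!r}")
            elif got[key] != value:
                problems.append(f"item {i}: field {key!r} changed")
    return problems


# Hypothetical feed; in practice this would come from the real blog feed.
feed = [
    {"title": "Hello", "date": "2024-01-02T10:00:00Z"},
    {"title": "World", "date": "2024-01-03T09:30:00Z"},
]
first_50 = feed[:50]  # same idea as `.[:50]` in jq
reply = '[{"title": "Hello", "date": "2024-01-02T11:00:00Z"}, {"title": "World", "date": "2024-01-03T09:30:00Z"}]'
print(check_llm_output(first_50, reply))
```

Slicing deterministically and diffing against the reply is what turns a quietly shifted timestamp into a reported error instead of something that slips through.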


alwillis 84 days ago

I’ll have to give this a shot.

One possibility: Claude Code subagents each get their own 1-million-token context window, which should handle large JSON files better than having everything in the same context window.
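If the work were split across separate context windows, the JSON would first have to be cut into pieces that each fit. A rough sketch, assuming serialized character count is an acceptable stand-in for a token budget; `chunk_json_array` and `max_chars` are hypothetical names, and nothing here calls Claude Code itself.

```python
import json

def chunk_json_array(items, max_chars):
    """Split a list of JSON-serializable items into consecutive chunks whose
    json.dumps() length stays within max_chars (a crude proxy for tokens).

    An item that is larger than max_chars on its own still gets its own
    (oversized) chunk rather than being dropped.
    """
    chunks, current = [], []
    size = 2  # the enclosing "[" and "]"
    for item in items:
        encoded = json.dumps(item)
        # json.dumps separates list elements with ", " (2 chars) by default.
        extra = len(encoded) + (2 if current else 0)
        if current and size + extra > max_chars:
            chunks.append(current)
            current, size = [], 2
            extra = len(encoded)
        current.append(item)
        size += extra
    if current:
        chunks.append(current)
    return chunks


posts = [{"id": i, "title": f"post {i}"} for i in range(50)]
for chunk in chunk_json_array(posts, max_chars=200):
    print(len(chunk), len(json.dumps(chunk)))
```

Each chunk can then be handed to a separate worker and the results concatenated, which keeps any single context well under its limit. It does nothing, of course, about the model rewriting values inside a chunk.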
