Hacker News new | ask | show | jobs
by boh 411 days ago
I think all the retail LLM's are working to broaden the available context, but in most practical use-cases it's having the ability to minimize and filter the context that would produce the most value. Even a single PDF with too many similar datapoints leads to confusion in output. They need to switch gears from the high growth, "every thing is possible and available" narrative, to one that narrows the scope. The "hallucination" gap is widening with more context, not shrinking.
7 comments

Definitely my experience. I manage context like a hawk, be it with Claude-as-Google-replacement or LLM integrations into systems. Too little and the results are off. Too much and the results are off.

Not sure what Anthropic and co can do about that, but integrations feel like a step in the wrong direction. Whenever I've tried tool use, it was orders of magnitude more expensive and generally inferior to a simple model call with curated context from SerpApi and such.

Couldn't agree more. I wish all major model makers would build tools into their proprietary UIs to "summarize contents and start a new conversation with that base". My biggest slowdown with working with LLMs while coding is moving my conversation to a new thread because context limit is hit (Claude) or the coherent-thought threshold is exceeded (Gemini).
I never use any web interfaces, just hooked up gptel (an Emacs package) to Claude's API and a few others I regularly use, and I just have a buffer with the entire conversation. I can modify it as needed, spawn a fresh one quickly etc. There's also features to add files and individual snippets, but I usually manage it all in a single buffer. It's a powerful text editor, so efficient text editing is a given.

I bet there are better / less arcane tools, but I think powerful and fast mechanisms for managing context are key and for me, that's really just powerful text editing features.

This is my concern as well. How successful is it in selecting the correct tool out of hundreds or thousands?

Different to what this integration is pushing, the LLMs usage in production based products where high accuracy is a requirement (99%), you have to give a very limited tool set to get any degree of success.

This has been my experience as well. The moment you turn internet access on, Kagi Assistant starts outputting garbage. Turn it off and you're all good.
There's a niche for the kitchen sink approach. It's a type of search engine.

Throw in all context --> ask it what is important for problem XYZ --> curate what it tells you, and feed that to another model to actually solve XYZ

you hit the nail on the head. my experience with prompting LLMs is that providing extra context that isn’t explicitly needed leads to “distracted” outputs
I mean, to be honest, they gotta do both to achieve what they’re aiming for.

A truly useful AI assistant has context on my last 100,000 emails - and also recalls the details of each individual one perfectly, without confusion or hallucination.

Obviously I’m setting a high bar here; I guess what I’m saying is “yes, and”

That's a tough pill to swallow when your company valuation is a $62B based on the premise that you're building a bot capable of transcendent thought, ready to disrupt every vertical in existence.

Tackling individual use-cases is supposed to be something for third party "ecosystem" companies to go after, not the mothership itself.