by Okawari 435 days ago
I still prefer traditional search engines over LLMs, but I admit their results feel worse than they used to.

I don't like LLMs for two reasons:

* I can't really get a feel for the veracity of the information without double checking it. A lot of context I get from just reading results from a traditional search engine is lost when I get an answer from an LLM. I find it somewhat uncomfortable to just accept the answer, and if I have to double check it anyway, the LLM's answer is kind of meaningless and I might as well use a traditional search engine.

* I'm missing out on learning opportunities that I would usually get otherwise by reading or skimming through a larger document trying to find the answer. I appreciate that I skim through a lot of documentation on a regular basis and can recall things that I just happened to read when looking for a solution for another problem. I would hate it if an LLM would drop random tidbits of information when I was looking for concrete answers, but since it's a side effect of my information gathering process, I like it.

If I were to use an AI assistant, I would want one that helps me search and curate the results instead of trying to answer my question directly, hopefully in a sleeker way than Perplexity does with its sources feature.

5 comments

One thing I don't like about LLMs is that they vomit out a page of prose as filler around the key point which could just be a short sentence.

At least that has been my experience. I admit I don't use LLMs very much.

It's time to bind "Please be concise in your answer and only mention important details. Use a single paragraph and avoid lists. Keep me in the discussion, I'll ask for details later." to F1.
You've just made me realize that I actually do need that as a macro. Probably type that ten times per day lately. Others might include "in one sentence" or "only answer yes or no, and link sources proving your assertion".
If you’re using ChatGPT, add it to your memory so it always remembers that you prefer that.
No matter how many times I get ChatGPT to write my rules to long-term memory (I checked, and multiple rules exist in LTM multiple times), it inevitably forgets some or all of the rules because after a while, it can only see what's right in front of it, and not (what should be) the defining schema that you might provide.
I haven't used ChatGPT in a while. I used to run into a problem that sounds similar. If you're talking about:

1. Rules that get prefixed in front of your prompt as part of the real prompt ChatGPT gets. Like what they do with the system prompt.

And

2. Some content makes your prompt too big for the context window, so the rules get cut off.

Then, it might help to measure the tokens in the overall prompt, have a max number, and warn if it goes over it. I had a custom chat app that used their APIs with this feature built in.

Another possibility is that, when this is detected, it asks you if you want to use a model with a larger context window. Those cost more, so it would be presented as an option. My app let me select any of their models to do that manually.
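
A rough sketch of that kind of check, for anyone who wants to try it. Everything here is an assumption for illustration: the 4-characters-per-token estimate is only a rule of thumb (a real app would use the provider's own tokenizer), and the model names and window sizes are hypothetical, not taken from any actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: roughly 4 characters per token for English text.
# This is a crude heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class BudgetReport:
    total_tokens: int
    max_tokens: int

    @property
    def over_budget(self) -> bool:
        return self.total_tokens > self.max_tokens


def estimate_tokens(text: str) -> int:
    """Crude estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def check_budget(rules: str, prompt: str, max_tokens: int) -> BudgetReport:
    """Estimate the size of rules + prompt and compare it to the limit."""
    total = estimate_tokens(rules) + estimate_tokens(prompt)
    return BudgetReport(total_tokens=total, max_tokens=max_tokens)


def pick_model(total_tokens: int, models: Dict[str, int]) -> Optional[str]:
    """Return the smallest-window model that still fits, or None if none do."""
    fitting = [(window, name) for name, window in models.items()
               if window >= total_tokens]
    return min(fitting)[1] if fitting else None


if __name__ == "__main__":
    # Hypothetical model names and context sizes, for illustration only.
    models = {"small-model": 100, "large-model": 1000}
    report = check_budget("x" * 40, "y" * 401, max_tokens=models["small-model"])
    if report.over_budget:
        print(f"Warning: ~{report.total_tokens} tokens exceeds "
              f"{report.max_tokens}; rules may get cut off.")
        print("Larger option:", pick_model(report.total_tokens, models))
```

The warning is the important part. Switching to the larger model is left as a suggestion rather than done automatically, since those cost more.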

Yep. Somehow I made mine noticeably grumpier, but I don't know which setting or memory piece did the job.

I really like not being complimented on literally everything with a wall of text anymore.

Yeah but it kind of kneecaps the model. They need tokens to "think". It's better to have them create a long response then distill it down later.
You need tokens to create more revenue for the company that is running the LLM. Nothing more, nothing less
Is there a well-known benchmark for this? I don't feel that short vs long answers make any difference, but ofc feelings aren't what we can measure.

Also, if that works, why doesn't copilot/cursor write lots of excessive code mixed with lots of prose only to distill it later?

> don't feel that short vs long answers make any difference

The “thinking” models are really verbose output models that summarise the thinking at the end. These tend to outperform non-thinking models, but at a higher cost.

Anthropic lets you see some/all of the thinking so you can see how the model arrived at the answer.

So if I replace "answer" with "summarize" that should work then?
One problem with LLMs is that the amount of "thinking" they do when answering a question depends on how many tokens they use generating the answer. A big part of the power of models like DeepSeek R1 is that they figured out how to get a model to use a lot of tokens in a logical way to work towards solving a problem. The models don't know the answer; they come to it by generating it, and generating more helps them. In the future we'll probably see the trend continue where the model generates a "thinking" response first, then summarizes the answer concisely.
> I can't really get a feel for the veracity of the information without double checking it.

This is my main reason for not using LLMs as a replacement for search. I want an accurate answer. I quite often search for legal or regulatory issues, health, scientific issues, specific facts about lots of things. I want authoritative sources.

LLMs remind me of the children's game "Telephone."
Am I the only one who double checks all of the information presented to me, from any source?
No you don't. If you were doing that you wouldn't have time to eat, let alone sleep.

You check the information you decide should be verified.

Unless someone's life is on the line, usually eyeballing the source URL is enough for me. If I'm looking for API documentation, there are a few well-known URLs I trust as authoritative. If I'm looking for product information, same thing. If the search engine points me to totallyawesomeproductleadgen19995.biz, I'm probably not getting reliable information.

An LLM response without explicit mention of its provenance... There's no way to even guess whether it is authoritative.

If what you say is literally true: yes, I think you probably are the only one!
Yeah, I need more coffee to decide for myself if double checking all sources is linear or exponential as it progresses to check the checks.
It might even be factorial since you also need to check the checks of the checks!

Actually, it might be fully unbounded even for an n of 1.

Everything reminds me of her… and she’s called Factorio. We’re on a break. She’s not good for me, but oh my do I love her.
Information cannot be destroyed, so for an n of 1 the bounds are that of the universe.
The sources will start to be redundant eventually. It's actually O(1) once you have looked at all the sources... that there are... in the world. Trivial!
I'm not sure. In this context, sources are utterances rather than speakers. So they're only finite if we limit ourselves to a snapshot of past utterances while doing our checking.
Wait, so if you go to python.org and the doc page says, "Added in version 3.11", you double-check this?

What do you even use to double-check? Some random low-quality content farm? A glitchy LLM? A dodgy mirror of official docs full of ads? Or do you actually dig into the source code for this?

And do you keep double-checking with all other information on the page... "A TOMLDecodeError will be raised on an invalid TOML document." - are you going to start an interactive session and check which error will be raised?

How deep do you go? Where do you stop?

Just because you can find multiple independent sources saying the same thing doesn't mean it's correct.

You evaluate the credentials and authenticity of the sources you're reading and judge accordingly.
It's done on a case by case basis.

In all honesty doing this for news and such brings me comfort. Because the truth is usually pretty vanilla.

Nothing means anything then.
Are you sure? If you only say it once...

"What I tell you three times is true"

No.

Part of why I prefer to use a search engine is that I can see who is saying it, and in what context. It might be Wikipedia, but also the CIA World Factbook. Or some blog, but also python.org.

Or (lately) it might be AI SEO slop, reworded across 10 sites but nothing definitive. Which means I need to change my search strategy.

I find it easier (and quicker) to get to a believable result via a search engine than going via ChatGPT and then having to check what it claims.

> A lot of context I get from just reading results from a traditional search engine is lost when I get an answer from an LLM. I find it somewhat uncomfortable to just accept the answer, and if I have to double check it anyway, the LLM's answer is kind of meaningless and I might as well use a traditional search engine.

And this is how LLMs perform when LLM-rot hasn't even become widely pervasive yet. As time goes on and LLMs regurgitate into themselves, they will become even less trustworthy. I really can't trust what an LLM says, especially when it matters, and the more it lies, the less I can trust it.

I find LLMs useful for the case where I'm not sure what the right terms are. I can describe something and the LLM gives me a term which I then type into a search engine to get more information. I'm only starting to use LLMs though, so maybe I'll use them more in the future? - only time will tell.