Hacker News
by SomewhatLikely 1175 days ago
The author does a good job of pointing out what may be the strongest skills of LLMs, but the claim that they aren't useful as a search engine didn't ring particularly true. For many of my questions, ChatGPT is the best tool to use because I know the topics I'm asking about are mentioned hundreds of times on the web, and the LLM can distill that knowledge down to the specifics I'm asking about. If you treat it as a friend who has a ton of esoteric knowledge in many areas but is prone to making stuff up to sound like they know what they're talking about, you can still get lots of use pulling facts and some basic reasoning out of the models.
8 comments

He can make the same criticism of Internet searches as he does of GPT: you shouldn't trust them until you validate them.

I find that GPT's answers are for the most part more reliable than searches, specifically today's searches. In the last 12 months, search results have become so spammy with AI-generated pages (oh the irony) that it's hard to find reliable answers.

So like search, I look at GPT's answers with a grain of salt and validate them, but these days I use GPT all day every day and search rarely. To be fair, I use it a lot because I have a GPT CLI that works just the way I want it to, since I wrote it :-). https://github.com/drorm/gish

Gish looks really nice. I'm going to give it a try.

It seems like you've been using similar workflows to what I've been trying for coding with gpt?

https://github.com/paul-gauthier/easy-chat#created-by-chatgp...

Pretty much, except that I'm automating everything as much as I can, so that I just give the instructions and GPT does the rest. Here's an example:

-----

#import ~/work/gish/tasks/coding.txt

Change the following so that it looks for the open AI key in the following fashion:

1. env variable

2. os.home()/.openai

3. Throws an exception telling the user to put it in one of the above, and then exits

#diff ~/work/gish/src/LLM.ts

-----

Puts me in vimdiff comparing the old code with the generated code letting me review and cherry pick the changes.
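
For illustration, the lookup order in the prompt above (environment variable, then a file in the home directory, then an error) could look roughly like the sketch below. It is written in Python rather than gish's TypeScript, and it is neither the code GPT generated nor what gish ships. The variable name `OPENAI_API_KEY`, the function name `find_openai_key`, and the `MissingKeyError` class are assumptions made for the example.

```python
import os
from pathlib import Path


class MissingKeyError(RuntimeError):
    """Raised when no API key can be found (hypothetical name)."""


def find_openai_key(env=None, home=None, env_var="OPENAI_API_KEY"):
    """Return the key from the environment, else from <home>/.openai.

    env_var and the function name are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. The environment variable wins if it is set and non-empty.
    key = env.get(env_var, "").strip()
    if key:
        return key

    # 2. Fall back to a file named .openai in the home directory.
    key_file = home / ".openai"
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key

    # 3. Nothing found: tell the user where the key can go.
    raise MissingKeyError(
        f"No OpenAI key found. Set ${env_var} or put the key in {key_file}."
    )
```

A caller would catch `MissingKeyError`, print the message, and exit with a non-zero status, which covers the "and then exits" part of step 3.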

Ya, that's pretty much my workflow as well. Though for my little web app, I could give it the whole ball of html/css/js each time.

I haven't seen anyone else describing this workflow. Feed it the existing code, ask it to modify/improve/fix the code and output a new version of all the input code, review diffs.

It has downsides, because you can easily run out of gpt-3.5-turbo's context window. But I am getting much better code out of it versus other approaches I've tried. And it's a very efficient and natural workflow -- we're used to getting and reviewing diffs/PRs from human collaborators.
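
The review step in this workflow can be approximated without vimdiff. A minimal sketch, assuming the old file and the model's rewritten version are already available as strings, using Python's standard `difflib`. The helper name `review_diff` and the two sample snippets are made up for illustration.

```python
import difflib


def review_diff(old_code, new_code, name="LLM.ts"):
    """Return a unified diff between the original and the generated code."""
    diff = difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


# Hypothetical before/after contents standing in for a real file.
old = "const key = process.env.KEY;\nrun(key);\n"
new = "const key = loadKey();\nrun(key);\n"
print(review_diff(old, new))
```

An empty result means the model returned the input unchanged, which is worth checking for before opening a diff viewer.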

fabulous
> The cost is based on the assumption that you're using GPT3.5 at $0.02 per 1000 tokens.

It's actually $0.002/1k, FYI
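
To see how much the tenfold difference matters, here is the arithmetic as a small sketch. The two per-1000-token prices are the ones quoted above; the token count and the function name `cost_usd` are illustrative assumptions.

```python
def cost_usd(tokens, price_per_1k):
    """Cost in dollars for a number of tokens at a per-1000-token price."""
    return tokens / 1000 * price_per_1k


tokens = 4000  # hypothetical request size, prompt plus completion
print(f"at $0.02/1k:  ${cost_usd(tokens, 0.02):.3f}")   # $0.080
print(f"at $0.002/1k: ${cost_usd(tokens, 0.002):.3f}")  # $0.008
```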

Speaking of AI-generated pages, I wonder how OpenAI filters these low-quality web pages out of its training set as it continues training.

Also, I wonder how they decide what code is worth training on. Because a lot of code is written in poor style or carries technical debt, it might be the case that these LLMs in the long run lead to an increase in the technical debt in our society. Plus, eventually, and this might already be happening, the LLMs are going to end up training on their own outputs, so that could lead to self-immolation by the model. I am not certain RLHF completely resolves this issue.

> I wonder how OpenAI filters these low-quality web pages out of its training set as it continues training.

This. The value proposition is very clearly tied to the quality of the training data, and if there's secret sauce for automatically determining information quality that's obviously huge. Google was built in part on such insights. I suspect they do have something. I'd be utterly astonished if quality sorting were an emergent property of LLMs (especially given it's iffy in humans).

The problem, of course, is that if they do have a way of privileging data for training, that information is going to be the center of the usual arms race for attention and thinking. It can't be truly public or it's dead.

Yeah, I'm kind of shocked none of these models implement any kind of fingerprinting, something encoded in zero-width spaces or other invisible Unicode. It would be trivial to delete, but for the vast majority of cases it would allow content to be flagged as "model output, do not ingest".
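
A minimal sketch of the idea, assuming a scheme where a short tag is written as bits using two zero-width characters and appended to the text. The tag value, the choice of characters, and the function names are invented for the example, and this is not a description of what any real model does.

```python
ZERO = "\u200b"  # zero-width space encodes bit 0
ONE = "\u200c"   # zero-width non-joiner encodes bit 1


def add_fingerprint(text, tag="AI"):
    """Append the tag's bits as invisible characters (hypothetical scheme)."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    return text + "".join(ONE if b == "1" else ZERO for b in bits)


def read_fingerprint(text):
    """Return the hidden tag, or None if no complete tag is present."""
    bits = "".join("1" if c == ONE else "0" for c in text if c in (ZERO, ONE))
    if not bits or len(bits) % 8:
        return None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="replace")


def strip_fingerprint(text):
    """Remove the marker characters, which is all it takes to defeat it."""
    return text.replace(ZERO, "").replace(ONE, "")


marked = add_fingerprint("Some generated paragraph.")
print(read_fingerprint(marked))                     # AI
print(read_fingerprint(strip_fingerprint(marked)))  # None
```

The last line shows the weakness already mentioned: a single `replace` pass removes the marker, so this only catches content that nobody bothered to clean.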
If they aren't using Bing as a quality filter, they are crazy or stupid.
Google and others would be wise to add a date filter of "before summer 2023". Maybe a bit longer, but not much time left till AI spam really takes over.
Spammers will set article date to 2021
You can't fake a domain registration date
I should invest in the used domain market
A lot of people owned domains before 2021.
I imagine ChatGPT will go from the "future changing" to "glorified spam generator" quite quickly.
“I think the right way to think of the models we create is as a reasoning engine, not a fact database. They can also act as a fact database, but that’s not really what is special about them.” —Sam Altman

Rebecca Jarvis interviews Sam Altman for ABC News, https://www.youtube.com/watch?v=540vzMlf-54

(I don't think this contradicts what you said.)

From his interview with Lex Fridman:

Quoting what he says [0][1]:

> You know, a funny thing about the way we're training these models is I suspect too much of the like processing power for lack of a better word is going into using the models as a database instead of using the model as a reasoning engine. The thing that's really amazing about the system is that it, for some definition of reasoning, and we could of course quibble about it and there's plenty for which definitions this wouldn't be accurate. But for some definition it can do some kind of reasoning. And, you know, maybe like the scholars and the experts and like the armchair quarterbacks on Twitter would say, no, it can't. You're misusing the word, you know, whatever, whatever. But I think most people who have used the system would say, okay, it's doing something in this direction. And I think that's remarkable. And the thing that's most exciting and somehow out of ingesting human knowledge, it's coming up with this reasoning capability. However, we're gonna talk about that. Now, in some senses, I think that will be additive to human wisdom.

[0] https://steno.ai/lex-fridman-podcast-10/367-sam-altman-opena...

[1] https://youtu.be/L_Guz73e6fw?t=828

I'm convinced LLMs are amazing for search, because they're the only thing that managed to give me an answer when I was looking for some software I didn't know the general category name for. I described what I wanted the software to do, corrected its understanding with a few followup messages, and it gave me a list of alternatives that are exactly what I wanted, along with a category name.

Google, in comparison, returned absolutely irrelevant SEO spam.

There’s two different things and we refer to both of them with the word “search”.

Sometimes search means “I can sort of describe what I’m looking for, can you tell me what it’s called?”. LLMs excel here. I told GPT4 I’m doing computer animation and want to do smooth blending, it told me that’s called “interpolation”, I asked for some common terms in the literature about this to help me look and it told me about LERP, SLERP, quaternions, splines, Beziers, keyframes, inverse kinematics, and motion capture. All useful jumping-off points. (A subset of this type of search is “I know what this is called, can you tell me more about it?”. This is probably the place where LLMs sell snake oil the most; they always provide a convincing explanation of the thing, but there’s no guarantee on veracity.)

Other times, search means “I have a specific phrase and I want to find occurrences of it”. LLMs aren’t just bad at this, they are constitutionally incapable of it. The way you build an LLM necessarily involves taking all specific phrases and occurrences thereof, and blending them up into a word slurry that is then condensed and abstracted into floating point weights. It no longer has the specifics to give you. It’s a shame that search engines have let this task (“ctrl-f the web”) fall by the wayside. It’s probably a large part of why people think Google search sucks now; it certainly is for me. (There’s this one essay about the Harappan civilization that I used to be able to find by searching for “strange builders mist of time”. I definitely remember that exact phrase working for me many years ago, and now it does not work and I cannot find that essay anymore.)
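
The "ctrl-f the web" kind of search is mechanically simple as long as the original text is kept around, which is the contrast being drawn with model weights. A toy sketch, assuming a tiny in-memory corpus; the document names, their texts, and the function name `find_phrase` are invented for the example.

```python
def find_phrase(corpus, phrase):
    """Return names of documents containing the exact phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [name for name, text in corpus.items() if needle in text.casefold()]


# Hypothetical corpus standing in for stored page text.
corpus = {
    "doc-a": "The quick brown fox jumps over the lazy dog.",
    "doc-b": "Notes on interpolation and keyframes.",
}
print(find_phrase(corpus, "Brown Fox"))    # ['doc-a']
print(find_phrase(corpus, "word slurry"))  # []
```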

It would be nice if it wasn't making up nearly all the links and sources I ask of it.
I tried to hint at that with "Using them as an alternative to a search engine such as Google is one of the most obvious applications—and for a lot of queries this works just fine."

I agree: I do use it as a search engine myself for a bunch of things, but those tend to be things where I've developed a strong intuition that it's likely to give me a reasonable result.

People who haven't developed that intuition yet tend to run into problems - and will often then loudly proclaim that LLMs are evidently useless and shouldn't be trusted for anything.

How useful is it if you have to know what you expect the answer to be? How do you know you are right in your calls of when to trust it? This smells like a confirmation bias machine.
It turns out to be absurdly useful, to a very un-intuitive degree.

One trick I use is to assume it has a "Wikipedia-level" knowledge of pretty much any topic. Often that's what I need! I want to ask some quick questions of someone eloquent who's read the Wikipedia article about something, to save me from having to read the whole thing myself.

If I need more expertise than you can get from reading Wikipedia I know that ChatGPT alone is very unlikely to cut it.

I think it's best used in conjunction with a search engine. For example, you can ask it to recommend a paper to read and then search for the paper.
The internet allows you to corroborate alleged facts by looking at further sources, following references, etc. Or when you end up not finding anything useful that substantiates what you’re looking for, that can serve as a conclusion as well. You get a feeling for it after a while. ChatGPT by itself however can usually only serve as a starting point. It doesn’t allow you to reach the same level of confidence that a googling session can. I would say they complement each other. But if given the choice of only being allowed to use one of the two, I would opt for the search engine.
This is true about the search engine, and besides things you usually google aren’t guaranteed to be 100% accurate anyways. So how is using ChatGPT any different?

Sure, things in Wikipedia or official documents could be accurate, but the internet is still full of misinformation.