by COAGULOPATH 687 days ago

Yes, this only helps multi-step reasoning. The model still has problems with general knowledge and deep facts.

There's no way you can "reason" a correct answer to "list the tracklisting of some obscure 1991 demo by a band not on Wikipedia." You either know or you don't.

I usually test new models with questions like "what are the levels in [semi-famous PC game from the 90s]?" The release version of GPT-4 could get about 75% correct. o1-preview gets about half correct. o1-mini gets 0% correct.

Fair enough. The GPT-4 line aren't meant to be search engines or encyclopedias. This is still a useful update though.

barrkel 687 days ago

o1-mini is a small model (knows a lot less about the world) and is tuned for reasoning through symbolic problems (maths, programming, chemistry etc.).

You're using a calculator as a search engine.

mattmanser 687 days ago

It's actually much worse than that, and you're inadvertently downplaying how bad it is.

It doesn't even know mildly obscure facts that are on the internet.

For example, last night I was trying to do something with C# generics and it confidently told me I could use pattern matching on the type in a switch statement, and threw out some convincing-looking code.

You can't, it's impossible. It was completely wrong. When I told it this, it told me I was right, and proceeded to give me code that was even more wrong.

This is an obscure, but well documented, part of the spec.

So it's not about facts that aren't on the internet, it's just bad at facts, full stop.

What it's good at is facts the internet agrees on. Unless the internet is wrong. Which is not always a good thing, given how confident the language it uses is.

If you want to fuck with AI models, ask a bunch of code questions on Reddit, GitHub and SO with example code saying 'can I do X'. The answer is no, but chatgpt/codepilot/etc. will start spewing out that nonsense as if it's fact.

As for non-programming, we're about to see the birth of a new SEO movement of tricking AI models into believing your 'facts'.

b112 687 days ago

I wonder though, is the documentation only referenced in a few places on the Internet, and are there also many forums with people pasting "Why isn't this working?" problems?

If there are a lot of people pasting broken code, now the LLM has all these examples of broken code, which it doesn't know are broken, and only a couple of references to documentation. Worse, a well-trained LLM may realise that specs change, and that even documentation may not be considered 100% accurate (for it is older, out of date).

After all, how many times have you had something updated, an API, a language, a piece of software, but the docs weren't updated? Happens all the time, sadly.

So it may believe newer examples of code, such as the aforementioned pasted code, might be more correct than the docs.

Also, if people keep trying to solve the same issue again, and keep pasting those examples again, well...

I guess my point here is, hallucinations come from multi-faceted issues, one being "wrong examples are more plentiful than correct". Or even "there's just a lot of wrong examples".

koe123 687 days ago

It's not always the right tool, depending on the task. IMO using LLMs is also a skill, much like learning how to Google stuff.

E.g. apparently C# generics isn't something it's good at. Interesting, so don't use it for that; apparently it's the wrong tool. In contrast, it's amazing at C++ generics, and thus boosts my productivity. So do use it for that!

neonsunset 687 days ago

> For example, last night I was trying to do something with C# generics and it confidently told me I could use pattern matching on the type in a switch statement, and threw out some convincing-looking code.

Just use it on an instance instead:

  var res = thing switch {
    OtherThing ot => …,
    int num => …,
    string s => …,
    _ => …
  };
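
A slightly fuller sketch of the same idea, for anyone who wants something that compiles. It assumes a hypothetical generic helper called `Describe` (the name and the returned strings are made up for illustration, they are not from the thread). The point is only that the switch expression matches on the value `thing`, not on the type parameter `T` itself:

```csharp
using System;

public static class TypeSwitchSketch
{
    // Hypothetical helper, not from the thread.
    // A switch expression needs a value to match against; the type
    // parameter T is not an expression, so we switch on the instance.
    public static string Describe<T>(T thing) => thing switch
    {
        int num  => $"int {num}",        // matches when the runtime value is an int
        string s => $"string \"{s}\"",   // matches when the runtime value is a string
        _        => $"other: {typeof(T).Name}" // fallback for everything else
    };

    public static void Main()
    {
        Console.WriteLine(Describe(42));
        Console.WriteLine(Describe("hi"));
        Console.WriteLine(Describe(3.5));
    }
}
```

Whether this covers the original use case depends on what was actually being attempted with the generics, which the thread doesn't say.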

pcdoodle 686 days ago

>>>As for non-programming, we're about to see the birth of a new SEO movement of tricking AI models into believing your 'facts'.

This is kinda crazy to think about.

simonw 686 days ago

If you ask Google Gemini right now for the name of the whale in Half Moon Bay harbor, it will tell you it's called Teresa T.

That was thanks to my experiment in influencing AI search: https://simonwillison.net/2024/Sep/8/teresa-t-whale-pillar-p...
