by mtrovo 94 days ago
I think the main issue is treating the LLM as an unrestrained black box; there's a reason nobody outside tech trusts LLMs so blindly.

The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what goal you're optimizing for.
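To make "evals with a clear goal" concrete, here is a minimal sketch, not anyone's actual harness. It assumes the goal is "the answer must contain one known fact", and `EvalCase`, `run_evals`, and `stub_model` are all hypothetical names; `stub_model` just stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str  # the single fact the answer must contain


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected fact."""
    passed = sum(
        case.expected.lower() in model(case.prompt).lower() for case in cases
    )
    return passed / len(cases)


# Hypothetical stand-in for a real model call (assumption, not a real API).
def stub_model(prompt: str) -> str:
    answers = {"What is 2 + 2?": "2 + 2 = 4"}
    return answers.get(prompt, "I don't know.")


cases = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("What is the capital of France?", "Paris"),
]

score = run_evals(stub_model, cases)
print(f"pass rate: {score:.0%}")
```

Substring matching is a deliberately crude metric. The point is only that the pass condition is written down explicitly, so you can tell whether a change to the prompt or model moved the number you care about.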

See Karpathy's work on the autoresearch agent and how it carries out experiments; it might be useful for what you're doing.


> there's a reason nobody outside tech trusts LLMs so blindly.

Man, I wish this was true. I know a bunch of non-tech people who just trust random shit that ChatGPT made up.

I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)

We've had politicians sharing LLM crap and researchers writing papers with hallucinated citations.

It's not just tech people.

We were working on translations for Arabic and in the spec it said to use "Arabic numerals" for numbers. Our PM said that "according to ChatGPT that means we need to use Arabic script numbers, not Arabic numerals".

It took a lot of back-and-forth with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.

We're cooked.

Kind of a tangent but that did make me curious about how numbers are written in Arabic: https://en.wikipedia.org/wiki/Eastern_Arabic_numerals
I guess "Western Arabic" would have been more precise.
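For anyone who wants to see the two digit sets side by side, here is a small sketch using only the Python standard library. The helper names `to_eastern` and `to_western` are made up for illustration; the code points U+0660 to U+0669 are the Unicode "Arabic-Indic" digits described on that Wikipedia page.

```python
import unicodedata

# Western Arabic digits (the ones most of the world uses every day).
WESTERN = "0123456789"
# Eastern Arabic digits, built from their code points U+0660..U+0669.
EASTERN = "".join(chr(0x0660 + i) for i in range(10))

TO_EASTERN = str.maketrans(WESTERN, EASTERN)
TO_WESTERN = str.maketrans(EASTERN, WESTERN)


def to_eastern(text: str) -> str:
    """Replace Western Arabic digits with Eastern Arabic ones."""
    return text.translate(TO_EASTERN)


def to_western(text: str) -> str:
    """Replace Eastern Arabic digits with Western Arabic ones."""
    return text.translate(TO_WESTERN)


print(to_eastern("2024"))
print(unicodedata.name(EASTERN[0]))  # Unicode's own name for the digit
print(int(to_eastern("123")))        # int() accepts any Unicode decimal digit
```

Note that Unicode itself calls these "ARABIC-INDIC" digits, which is one more name for the pile.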
The architect should have required Hindu numbers. Same result, but even more confusion.
Man this is maddening.
Honestly I think we're just becoming more aware of this way of thinking. It's certainly been exacerbated now that everyone has "an expert" in their pocket.

It's no different than conspiracy theorists. We saw a lot more of them with the rise in access to the internet. Not because they didn't put in work to find answers to their questions, but because they don't know how to properly evaluate things and because they think that if they're wrong then it's a (very) bad thing.

But the same thing happens with tons of topics, and it's way more socially acceptable. Look how everyone has strong opinions on topics like climate, rockets, nuclear, immigration, and all that. The problem isn't having opinions or thoughts, but the strength of them compared to the level of expertise. How many people think they're experts after a few YouTube videos or just reading the intro to the wiki page?

Your PM is no different. The only difference is the things they believed in, not the way they formed beliefs. But they still had strong feelings about something they didn't know much about. It became "their expert" vs "your expert" rather than "oh, thanks for letting me know". And that's the underlying problem. It's terrifying to see how common it is. But I think it also leads to a (partial) solution. At least a first step. But then again, domain experts typically have strong self-doubt. It's a feature, not a bug, but I'm not sure how many people are willing to be comfortable with being uncomfortable.

There’s a possibility the same people would believe anything they read on social media or via Google, and that’s something worthy of attention.
And the worst part is, these people don't even use the flagship thinking models, they use the default fast ones.
In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.