HN Mirror

	by pasabagi 1224 days ago
	Regarding 1., it appears to have about a 30% accuracy rate, and the other 60% is complete nonsense, often complete with fabricated citations. I dearly hope that nobody is ever encouraged to have this machine as their tutor.

2 comments

brokencode 1224 days ago

30% accuracy rate in what exactly? Take a look at the GPT-4 announcement page for graphs showing the accuracy on different standardized tests. It’s not perfect, but it is improving with each release.

One big area where it does poorly right now is math. But they just announced a ChatGPT plugin for Wolfram, which I expect will make it very good at math. Wolfram also has a large database of curated information to draw on.

Technology improves over time. GPT is still new and improving quickly. What it does now isn’t perfect, but it is still incredible.

pasabagi 1224 days ago

There's a post on /r/askhistorians where somebody asked ChatGPT for book recommendations on various historical topics. Some of them didn't exist. It actually took an expert reader to identify which books were made up, misattributed, and so on. That's much worse than nothing: it's a horrific waste of time.

My guess is that stuff like math, where you can fairly easily verify the factuality of ChatGPT's answers, is an area where you could certainly see progress. With more general stuff like history, where it's important to have a really firm grasp of facts, intuition, and nuance, ChatGPT will likely be hard to improve and, worse, much harder to verify. These things can also be insidious: if you've learned something straightforwardly wrong, it corrupts future conclusions drawn from that erroneous premise.

brokencode 1224 days ago

I think the plugin system will ultimately help for most areas where LLMs are weak today.

Need to do math? Use the Wolfram plugin.

Need hard facts from reliable and citable sources? Use a plugin that queries databases like arXiv. The LLM could give you links to sources and provide quotes from those sources to support its reasoning.

motoxpro 1224 days ago

You're right. They won't make any progress on this metric at all.

pasabagi 1224 days ago

They might do, but what error rate do you think is acceptable? How do you actually measure and test the error rate? It's a use I can imagine in the future, but I think it's really premature to be using 'personal tutor' as a benefit (as OpenAI do in their advertising materials) when the program, as it stands, is essentially a fluent and convincing bullshitter, which is the single worst possible trait for a teacher.

sebzim4500 1224 days ago

I don't know, I guess I'd have to measure the accuracy rate of the average personal tutor and then wait for it to cross that threshold.

motoxpro 1224 days ago

Error is acceptable in all things that don't need to be deterministic. Which is most things in life. How do I discover a good career path? Why do you structure a repository of a program into folders? What is the best place to vacation? What is the best way to learn math? What is a good way to articulate socialism? How do I increase my vocabulary? What is corporate strategy?

Ask 10 different "smart" people (define smart however you want) and you'll get 10 different answers to these questions. These are all questions an LLM could answer amazingly. Probably a lot better than most humans.

If you don't ask it what 2+2 is or who came to America in 1875, then you get useful things.

Asking an LLM deterministic questions right now is like asking a calculator what the meaning of life is. If you use the tool for something it's not good at, you get unusable answers.

pasabagi 1224 days ago

If you ask an idiot what he thinks about something, and he gives you a totally wrong answer, you have still learned at least one fact: a person believes a thing. As a human, living in a democracy, that has some worth. ChatGPT's wrong answer has absolutely no value at all.

Further, 10 different smart people will give 10 different answers because they have coherent worldviews and biases and proclivities, so by accounting for those, you can work out what the right answer is. Even if ChatGPT was anywhere close to a human expert when it comes to accuracy (what's the error rate in a peer reviewed journal article?) it would still have no coherent worldview or bias to contextualize its statements.

Sai_ 1224 days ago

I see what you’re saying. The world has lost its collective mind.

HN seems to want to hand over the keys to the kingdom to basically a string generator. The string generator believes nothing, understands nothing, knows nothing but here we are.

Any intelligence that GPT-4 shows is an emergent property. Humans are the ones reading GPT’s output and imputing meaning to it.

Reminds me of astrology and mass hysteria - people convincing each other to give this new oracle a chance because they personally have seen value in its ramblings.

motoxpro 1224 days ago

I understand what you're saying. I just think the world isn't so black and white. When you ask the idiot, or the smart person for that matter, a question, you have no idea if they are right or not. You only know after the fact, when you get enough data to prove them wrong or someone you trust more than that person tells you otherwise.

What is your error rate? What is my error rate? All of this stuff is unknown because we don't have counterfactuals and we don't think of the world in this way (Did you order the correct food at dinner? Did you wash your clothes at the optimal time?).

To me, you're thinking of it as a classical deterministic (binary) computer rather than a probabilistic thing. It's not an oracle, or a miracle, or anything other than something that gives useful information some percentage of the time. If something has to be right 100% of the time for it to be useful, or even 60% of the time, then the world is missing out on a lot of value.

Investors are right ~51% of the time, startup founders in the aggregate are right ~10% of the time, a great batting average is ~30%, etc.
