Hacker News
by nicklecompte 796 days ago
What this work actually shows is that a bunch of scientists are ignorant about how LLMs work, but are rushing to publish papers about them anyway. It is ridiculous for Thirunavukarasu to draw this conclusion from GPT-4's performance on a written ophthalmology exam.

From the good folks at AI Snake Oil[1]

> Memorization is a spectrum. Even if a language model hasn’t seen an exact problem on a training set, it has inevitably seen examples that are pretty close, simply because of the size of the training corpus. That means it can get away with a much shallower level of reasoning....In some real-world tasks, shallow reasoning may be sufficient, but not always. The world is constantly changing, so if a bot is asked to analyze the legal consequences of a new technology or a new judicial decision, it doesn’t have much to draw upon. In short, as Emily Bender points out, tests designed for humans lack construct validity when applied to bots.

> On top of this, professional exams, especially the bar exam, notoriously overemphasize subject-matter knowledge and underemphasize real-world skills, which are far harder to measure in a standardized, computer-administered way. In other words, not only do these exams emphasize the wrong thing, they overemphasize precisely the thing that language models are good at.

Also[2]:

> Undoubtedly, AI and LLMs will transform every facet of what we do, from research and writing to graphic design and medical diagnosis. However, its current success in passing standardized test after standardized test is an indictment of what and how we train our doctors, our lawyers, and our students in general. ChatGPT passed an examination that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it was created, how it is maintained. Its success demonstrates some of the shortcomings in how we train and evaluate medical students. Critical thinking requires appreciation that ground truths in medicine continually shift, and more importantly, an understanding how and why they shift. Perhaps the most important lesson from the success of LLMs in passing examinations such as the USMLE is that now is the time to rethink how we train and evaluate our students.

[1] https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmar...

[2] https://journals.plos.org/digitalhealth/article?id=10.1371/j...

That might be the thing that makes me most optimistic about AI.

Not because they’re super useful. They are, if and only if you use them right, which is a skill few people seem to have.

But because they’re illuminating flaws in how we train our students, and acting as a forcing function to _make_ the universities and schools fix that. There’s no longer any choice!

Software could "learn" multiplication by simply building a lookup table for every pair of operands up to some reasonable size. A test isn't going to have somebody multiplying 100-digit numbers, or even 10-digit numbers. Every product of two numbers of up to 5 digits could be covered with just 10 billion entries. This doesn't mean that a multiplication test is just testing memorization, or that math is just memory. It simply means that how machines learn and apply what they learn has relatively little in common with how humans do, in spite of the constant and generally awkward attempts to anthropomorphize machine processes.
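
To make the lookup-table point concrete, here is a minimal Python sketch. The names `build_table`, `table_size`, and `lookup_multiply` are made up for illustration, and the table is built for 2-digit operands only, since the 5-digit version would need 10 billion entries and would not fit in memory on an ordinary machine.

```python
def table_size(max_digits: int) -> int:
    """Number of (a, b) pairs when both operands have at most max_digits digits."""
    return (10 ** max_digits) ** 2


def build_table(max_digits: int) -> dict[tuple[int, int], int]:
    """Precompute a * b for every pair of operands with at most max_digits digits.

    Hypothetical helper: the products are computed here only to fill the table.
    After that, answering a question is pure recall.
    """
    limit = 10 ** max_digits
    return {(a, b): a * b for a in range(limit) for b in range(limit)}


# Small enough to build: 100 * 100 = 10,000 entries.
table = build_table(2)


def lookup_multiply(a: int, b: int) -> int:
    """'Multiply' by recall alone; raises KeyError for anything outside the table."""
    return table[(a, b)]
```

`table_size(5)` gives 10,000,000,000, the 10 billion figure above. `lookup_multiply(12, 34)` returns 408 without doing any arithmetic at answer time, while `lookup_multiply(100, 2)` raises `KeyError`: the table scores perfectly on every question inside its range and has nothing at all to offer one step outside it.
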
What you're describing isn't AI at all; there's no intelligence there.
And the cloud is water droplets, not a datacenter. Apple is a fruit, not a product company. OpenAI isn't incredibly open, and so on. There was a time when it was worth dying on the hill that AI is really just a product of ML, or whatever, but that ship has long sailed. It's not really intelligent (most likely), but the term is stuck now. Time to move on, the horse is long dead.
What term are we supposed to use, then, for concerns over actual AI? It sure feels like the term artificial intelligence was repurposed, and bastardized, to mean nothing more than ML and to make any concerns over an AI sound ridiculous.

I'm not concerned over LLMs personally, though I do have serious concerns about how we'll handle it if/when we develop an actual artificial intelligence. I can't really share those concerns clearly at all if the term AI has been used to make these discussions effectively meaningless.

Neither clouds nor Apple are topics of debate. Concerns over AI have been raised for decades and largely went unanswered, leaving us with tech getting closer and closer to it and no one willing or able to have any meaningful discussions about it at scale. OpenAI has an explicit goal, for example, of creating an AGI. Maybe AGI is the new term for AI, though I disagree with their definitional metric of economic value, which again leaves us with someone purposely trying to build an artificial intelligence without us first deciding basics like whether an AI would have rights or whether turning it off would be tantamount to murder.