| HN Mirror


	by volkercraig 17 days ago
	More than that, the entire structure of the study is pointless. They set it up as a question/response and then had humans rate the response. That's literally what LLMs are trained to do, which ultimately is convincing a human to click the "I like this one better" button on its response.

enoch_r 17 days ago

LLMs are trained to convince a typical human to click the "I like this one better" button on their response.

Convincing a human law professor to click the "I would prefer to deliver this response to a student" button, and to not click the "this response is pedagogically harmful" button is a different task!

I could imagine an LLM convincing a typical human to click the "I like this one better" button with flattery, or with nice-sounding platitudes, or with hand-wavy explanations that sound plausible. And in fact that's exactly what LLMs do when they go wrong - they bluff and output superficially plausible nonsense!

But these weren't typical humans; these were law professors specifically tasked with deciding which response was a better option to give to students as a canonical answer to a contract law question. So I think this is a genuinely impressive result.

vonneumannstan 17 days ago

This is kind of like saying you can't compare Computer Vision models to Human performance because those models were literally trained to identify objects in images...

volkercraig 17 days ago

I'm not saying you can't compare them, I'm saying it's pointless. LLMs are extremely large-scale multivariate regression machines; evaluating their output within their own training domain is as pointless as seeing if a ball rolls downhill.

dcre 17 days ago

They're only good at it because that's what they're good at? Come on.

FromTheFirstIn 17 days ago

They’re not good at it because they understand the law.

IAmBroom 17 days ago

IRDC if the LLMs "understand" anything. They are being used here to produce outputs that are desirable. (Neglecting the real possibility that this "survey" is complete BS, as noted elsewhere.)

FromTheFirstIn 17 days ago

Exactly
