by BoiledCabbage 353 days ago
> AI is built essentially on averages.

It is, but that also means if you prompt it correctly it will give you the answer of the average graduate student working on theoretical physics, or the average expert on the historical inter-cultural conflict of the country you are researching. Averages can be very powerful as well.


Research has no average; there's opinion, experience, and nuance. This whole "graduate level" thing (no idea if that's what the parent comment refers to) is so stupid, and it is marketing aimed at people who have never done research or advanced studies.

Getting an average response by necessity gives you something dumbed down and smoothed over that nobody in the field would actually write (except maybe to train an LLM or contribute an encyclopedia entry).

Not that having general knowledge is a bad thing, but LLM output is not representative of what a researcher would do or write.

One thing the "graduate level" concept reminds me of is Terence Tao's semi-endorsement almost a year ago: https://mathstodon.xyz/@tao/113132502735585408

People quote the "The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student." part but ignore the rest of the nuance in the thread, like "It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "(static simulation of a) competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks." or "I inadvertently gave the incorrect (and potentially harmful) impression that human graduate students could be reductively classified according to a static, one dimensional level of “competence”."

I see this argument all the time: that the user must not be prompting correctly.

In my experience the way you prompt is less important than the “averageness” of the answer you’re looking for.

Talking about averages is really misleading. Talk about capabilities instead, framed in tool language if you must.

Quoting https://buttondown.com/hillelwayne/archive/ai-is-a-gamechang... about https://zfhuang99.github.io/github%20copilot/formal%20verifi... "In the post, Cheng Huang claims that Azure successfully used LLMs to examine an existing codebase, derive a TLA+ spec, and find a production bug in that spec." This is not the behavior of the "average" anything.

Take it from someone in the business of exploiting race conditions for money: that’s about as average as you can get. Additionally, whatever Azure considers “traditional” methods may be bare-bones, poorly optimized automated code reviews, given the egregious issues they’ve had in the past.

As a side note: LLMs by definition do not demonstrate “understanding” of anything.