Hacker News
by rolha-capoeira 263 days ago
> ... I always say about [a language model], the [linguistic] appearance makes a promise about what it can do. [Clippy] was this little [cartoon paper clip]. It didn’t promise much—you saw it and thought, that’s not going to [write the next great novel]. But you can imagine it [offering limited help]. But [human language] sort of promises it can [write] anything a human can. And that’s why it’s so attractive to people—it’s selling a promise that is amazing.

The gap between the promise and reality of LLMs and the gap between the promise and reality of humanoid robots differ by an order of magnitude.
In which direction?
When a language model fumbles, its mistakes are still wrapped in convincing writing, so the error is only apparent if the user already knows what the answer should be.

When a humanoid robot fumbles, its mistakes are obvious because the physical world offers immediate feedback.

It's the difference between lying on your résumé that you're a world-class gymnast, and having to actually perform.

How much of this is due to nearly all humans already having advanced knowledge of what they would expect out of a humanoid robot in the home?

With the gymnast example, as a non-gymnast, I don’t know the difference between a high and low scoring routine on the floor or beam. If a humanoid robot did a routine and didn’t fall, I would assume all is well. I don’t know the technical details of what is required for a gymnastics competition.

This seems like the same idea as an LLM writing a paper that looks correct to someone who doesn’t already know the answer.

In a home context, this could look like the robot not practicing proper food safety or storage around someone who doesn’t know the details about that kind of thing, which is a good number of people. What it’s doing might look correct enough, and it produces food you can eat… all is well, until you get sick and don’t know why.

Which gymnastics competition? The well-known ones are more like beauty contests held on gymnastics equipment. However, there are also competitions where they measure objective things. I know what I like to see in a beauty contest, but that is also subjective. I don't know what a technical competition is measuring either, but I know they have objective things they look for.
I don’t know what you’re referring to when talking about gymnastics as a beauty contest.

I’m not an expert, but I know there are specific moves with various degrees of difficulty. I believe there is a max score based on that difficulty level, and any imperfection will lower that score, such as a foot pointed or flexed the wrong way at the wrong time, taking an extra step on a landing, etc.

I know all these rules exist, but I’m not an expert where I can say someone had their foot flexed when it should have been pointed. These details would go over my head, where a humanoid robot might get a pass from me, while an actual gymnast or judge would be able to see faults.

Makes you wonder about the outcome, given that the current direction is to build humanoid robots that communicate via LLMs.

So the robot might be just as convincing when it claims it can repair your car's brakes as when it claims it can clean your windows.

You saw it clean your windows and are satisfied, and both its form and words are promising that it can repair your brakes equally well...

This is an interesting premise.

I’m kinda torn between “genAI-powered robots will have a ground-truth reality as a reference, so they will ultimately be more grounded and effective than LLMs” and “LLMs are like drunk uncle Steve with his PhD swimming in vodka, and using genAI in robots will end up as well as having drunk uncle Steve drive home”.

Guardrails on the tasks it will attempt are inevitable, but I can also see that becoming a paywalled enshittification farm.

Yeah, imagine "that guy from the pub" who is unemployed for years because he claims to be "overqualified for everything", and then add that he knows exactly how to convince you that he is capable of EVERYTHING you throw at him...
Agree. Not sure what is worse though. Leaning towards the LLM...
LLMs are much closer in practice, they're already useful for a pretty wide range of tasks. Humanoid robots are still comically clumsy and limited, barely able to complete scripted tech demos.
The difference is very easy to define and notoriously difficult to solve: it is physics. And man, is physics a hard problem to "solve".

Welcome to the world of hard tech, not easy machine learning models. Capital is in short supply, it doesn't go nearly as far, and you don't get wild multiples in return, if you get any return at all.

I cannot quite tell what point this is trying to make. LLMs are just the next Clippy? As far as I can remember no one actually liked Clippy, so my read is you are not a fan of LLMs, but I could see it going either way.
I took it to mean that the way LLMs use natural language causes the typical observer to feel as if they can do far more than they actually can, akin to the analogy of humanoid robots.

It plays off the "if it looks like a duck, quacks like a duck, and walks like a duck" idiom, which of course isn't foolproof and opens the door to the kind of spectacular advertising that is fueling this hype.

I agree. A personal anecdote.

My mom was lamenting her car insurance quotes, so I told her to ask AI. She did, then had it do a Monte Carlo simulation for all the insurers the AI felt she qualified for.

It happily replied that it had run 1 million Monte Carlo simulations, and here was the result.

To this day I don't think she fundamentally groks that LLMs cannot calculate.

For me, it was a friend who was wildly impressed that ChatGPT (before it could search the web) had "analyzed recent market news and stock price graphs" to give him stock recommendations.
>To this day I don't think she fundamentally groks that LLMs cannot calculate

Can't most LLMs trivially use Python (and other languages and libraries) to calculate?

I used Gemini to take "0.3 grams of KNO3 will raise the nitrate level of 10 gallons of water 4.84 ppm" and generate tables of how many grams of dry fertilizer for 1ppm, 5ppm, 10ppm for my planted aquariums of 144 and 3000 liters. It calculated them perfectly.

https://rotalabutterfly.com/rex-grigg/dosing.htm
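The scaling in that comment is plain proportionality, exactly the kind of arithmetic a tool call handles reliably. A minimal sketch, assuming the "10 gallons" are US gallons and that nitrate ppm scales linearly with dose and inversely with water volume; the function name `grams_kno3` is hypothetical:

```python
LITERS_PER_US_GALLON = 3.785411784  # exact by definition

# Reference dose quoted above: 0.3 g KNO3 raises 10 gallons by 4.84 ppm nitrate.
# Assumption: these are US gallons.
REF_GRAMS = 0.3
REF_PPM = 4.84
REF_LITERS = 10 * LITERS_PER_US_GALLON


def grams_kno3(target_ppm: float, tank_liters: float) -> float:
    """Grams of dry KNO3 needed to raise nitrate by target_ppm in tank_liters.

    Assumes linear scaling: the dose is proportional to both ppm and volume.
    """
    grams_per_ppm_per_liter = REF_GRAMS / (REF_PPM * REF_LITERS)
    return grams_per_ppm_per_liter * target_ppm * tank_liters


# Print a small dosing table for the two tank sizes mentioned above.
for liters in (144, 3000):
    for ppm in (1, 5, 10):
        print(f"{liters:>5} L  {ppm:>2} ppm  {grams_kno3(ppm, liters):6.2f} g")
```

Under these assumptions the 144 L tank needs about 0.24 g of KNO3 per 1 ppm and the 3000 L tank about 4.91 g per 1 ppm.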

LLMs cannot themselves calculate, but they are given tools which can.

They're getting quite good at that now.

ChatGPT can easily do a Monte Carlo simulation in its "thinking" step, and has done so many times for me. For example, I asked it to compare savings interest from regular banks with median returns from premium bonds. It's not difficult at all for it to do, and you can see the code it generated and the output, so it's easy to inspect.
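This is roughly the kind of script such a tool call produces. A minimal sketch of a Monte Carlo comparison between a fixed savings rate and a prize-draw product like the premium bonds mentioned above; every number below (balance, rate, odds, prize tiers) is a hypothetical placeholder, not real product data, and each tier's yearly win count is approximated as a Poisson draw:

```python
import math
import random
import statistics

# Hypothetical inputs, not real product figures.
BALANCE = 10_000      # pounds, treated as 10,000 one-pound "bonds"
SAVINGS_RATE = 0.03   # fixed annual interest for the comparison
MONTHS = 12
# (odds per one-pound bond per monthly draw, prize in pounds)
PRIZE_TIERS = [(30_000, 25), (3_000_000, 5_000)]


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small rates used here."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def one_year_of_prizes(rng: random.Random) -> float:
    # Wins per tier over a year ~ Binomial(bonds * months, 1 / odds),
    # approximated by a Poisson with the same mean.
    total = 0.0
    for odds, prize in PRIZE_TIERS:
        total += poisson(BALANCE * MONTHS / odds, rng) * prize
    return total


def simulate(trials: int = 20_000, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    return [one_year_of_prizes(rng) for _ in range(trials)]


results = simulate()
print(f"savings interest:  £{BALANCE * SAVINGS_RATE:.0f}")
print(f"prize draw mean:   £{statistics.mean(results):.0f}")
print(f"prize draw median: £{statistics.median(results):.0f}")
```

With these placeholder odds the expected prize total equals the £300 of savings interest, but the median comes out far lower because most of the mean is carried by a rare large prize. That skew is why comparing against median returns, as the comment does, is the more honest test.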
I understood it as mocking the iRobot founder's quote: what he calls a false promise could just as well be applied to LLMs, where it has been a true promise (but he says the opposite mockingly).
I would say the same delusion even applies to the field of machine learning in general.

The "API" of trainable algorithms is essentially "arbitrary bunch of data in -> thing I want out" and the magic algorithm in the middle will figure out the rest.

Because "thing I want" is given as a list of examples, you're not even required to come up with a clear definition of what it exactly is that you want. In fact, it's a major "selling point" of the field that you don't have to.

But all of that creates the illusion that machine learning / "AI" would be able to generate a robust algorithm for any correspondence, as long as you can package up a trainset with enough examples and shore up enough "compute" to do the number crunching. Predict intelligence from passport photos? Or chances of campaign success from political speeches? No problem! Economic outlook from tea leaves? Sure thing!

The setup will not even tell you if your idea just needs more tweaks or fundamentally can't work. In both cases, all you get is a less-than-ideal number in your chosen evaluation metric.
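To make that concrete, here is a toy supervised setup: examples in, labels out, one accuracy number back. A minimal sketch using a nearest-centroid classifier (chosen only for brevity, not what any particular system uses) on synthetic data, with hypothetical helper names. A learnable rule and pure-noise labels go through exactly the same pipeline, and only the final score hints that the second task is hopeless:

```python
import random


def make_data(n: int, learnable: bool, rng: random.Random):
    """Synthetic 2-D points; labels follow a rule (x0 > 0) or are coin flips."""
    xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    if learnable:
        ys = [int(x0 > 0) for x0, _ in xs]
    else:
        ys = [rng.randint(0, 1) for _ in xs]
    return xs, ys


def fit_centroids(xs, ys):
    """'Training': average the points that carry each label."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return centroids


def predict(centroids, point):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=sq_dist)


def accuracy(learnable: bool, seed: int = 0) -> float:
    rng = random.Random(seed)
    xs, ys = make_data(4000, learnable, rng)
    # First 3000 examples train, last 1000 test.
    cents = fit_centroids(xs[:3000], ys[:3000])
    hits = sum(predict(cents, x) == y for x, y in zip(xs[3000:], ys[3000:]))
    return hits / 1000


print(f"learnable rule: {accuracy(True):.3f}")
print(f"noise labels:   {accuracy(False):.3f}")
```

Both runs just print a number; nothing in the setup says that no amount of tweaking will lift the second one meaningfully above 0.5.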

The process is definitely vulnerable to magical thinking.

I think it is possible to avoid, though, by asking whether humans can generally be good at the task in question when working through the implied interface restrictions, and then evaluating whether the required skills can be reflected in an available training data set.

If either of those cannot be definitively answered, it’s probably not going to work.

An interesting example here is the failure of self-driving vehicles based on image sensors.

My take is that most of the problems are because a significant fraction of the actual required training data is poorly represented in data that can be collected from driving experiences alone.

As in: if you want a car to be able to drive safely around humans, you need to understand a lot about what humans do and think about. Then apply that same requirement to everything else that occasionally appears in the operational environment.

To understand some traffic management strategies expressed in infrastructure, you’ll need to understand, to some degree, the goals of the traffic management strategy, aka “what were they thinking when they made this intersection?”.

It’s not all stuff you can magically gather from dashcams.

Yeah, my understanding was also that the (remaining) hard part of self-driving cars is guessing the intentions of other traffic participants. There are a lot of assumptions human drivers can make about pedestrians, e.g. whether a pedestrian has seen the car or not, and whether they will wait for it, have no intention of crossing at all, or will just run across the street.

A model might potentially be able to understand those situations, but it would need a lot of highly task-specific training data, and it would never be clear whether the training really covered all possible situations.

The other problem I see is that a lot of situations in traffic are really two-way communication, even if it is nonverbal and sometimes so implicit we don't realize it. Pedestrians will usually also try to infer what the driver is thinking: whether the driver has seen them, and so on. In those situations, a self-driving car is simply a fundamentally different kind of traffic participant, and pedestrians will interact with it differently than they would with a normal car. That problem is independent of machine learning and seems much harder to solve to me.

That's not the "API" that's powered the AI boom though. What you're talking about is supervised learning. Generative AI is mostly unsupervised. It's "bunch of data -> similar data conditioned on some input". This goalless nature is one of its strengths.

The sort of questions you're talking about are primarily popular in academia. Run some MLRs against some random dataset you found, publish a paper, maybe do a press release and sell a story to some gullible journalist. It doesn't have huge value. But generative AI isn't like that.
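By contrast, the generative "bunch of data -> similar data conditioned on some input" shape can be shown with a word-level bigram model, a deliberately tiny stand-in that shares nothing with real LLM architectures beyond that interface. A minimal sketch with hypothetical names: no labels are given, the model only records which word followed which, and it generates new text conditioned on a starting word:

```python
import random
from collections import defaultdict


def train_bigrams(text: str) -> dict[str, list[str]]:
    """Record every observed successor of each word; no labels involved."""
    words = text.split()
    table: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start: str, max_words: int, seed: int = 0) -> str:
    """Sample a word sequence conditioned on `start`, one observed successor at a time."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words:
        followers = table.get(out[-1])  # .get does not insert into a defaultdict
        if not followers:  # dead end: this word never had a successor
            break
        # Duplicates in the list make frequent successors more likely.
        out.append(rng.choice(followers))
    return " ".join(out)


corpus = "if it looks like a duck and quacks like a duck and walks like a duck it is a duck"
model = train_bigrams(corpus)
print(generate(model, "if", 12))
```

Every adjacent word pair in the output was seen in the corpus, yet the sequence as a whole can be new, and at no point does training name a goal.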

Hasn't written a great novel, won't ever write a great novel, will definitely write regurgitated slop that midwit tech slaves steeped in the works of Malcolm Gladwell and Co. will read four words of and proclaim "Dostoevsky!"
I think that the main reason that LLM writing fails so badly in a field one might assume it would excel in is the lack of being able to model a theory of mind for the reader.

While I have seen LLMs produce some ham-fisted attempts at manipulating the state of mind of the reader, I think that the human process is so obfuscated that it only shows up in occasional echoes and shadows in the training set.

It might be possible to develop a training set that reflected perception and internal mental state vs input using (magic brain scan technology) that could change this, but right now the emotional state of the reader is just missing from the training data.

Indeed. GP isn't making the point they think they're making.

"It writes like us, it must think like us, and will be able to think anything we can think!"

"It's embodied like us, it must be like us, and will be able to do anything we can do!"

Flawed thinking layered upon flawed perceptions, but get enough decision makers to buy into it and heaven and earth are moved to further it.

This take is so tiring, here's one of the most surprising things we've ever invented, and people are going "IT CAN'T WRITE DOSTOEVSKY". It's fine if y'all are so jaded, but can you at least keep it to yourselves?
LOL, it's an awesome, amazing tool; I never saw it coming and I'm glad to have it. I'm responding to GP: what it does is nothing like good writing at all, and the only people who think otherwise are, without exception, people with little to no exposure to or training in any human arts.
That's not what the GP said, the comment is about how the format (language output) is promising something the technology does not actually deliver.
Why are you allowing yourself to not keep it to yourself, while demanding so of others?
Mine was a request, the GP's was general complaining.
When humans do writing, the quality improves by refining multiple drafts, making sketches and notes of the characters and situations and so on, before synthesizing the final text. A lot of preparation and thought goes into it.

If you just ask an LLM to write something off the cuff, it'll be bad. But doing a lot of prep with a human author guiding it? Not Dostoyevsky level, but not pure slop.

Creative writing sites without anti-AI rules are getting swamped. It's disheartening to look for people to read and give feedback when hundreds of users churn out 40k-word stories on a daily basis.
We might have to start meeting in person again.