Hacker News new | ask | show | jobs
by naasking 783 days ago
LLMs already write English better than most native speakers. I wouldn't bet everything on this.
4 comments

I'm surprised some people think this is a matter of checking whether formal sentences match some English sentences lol It is a matter of checking whether formal sentences match mathematical statements, which are written in natural language.

Imagine someone saying "just write good English lol eventually you can do good math". I'm aware you're not quite saying this but you seem really distracted by the human language representation of math, connecting doing math proofs to generating English sentences from some probability distribution is ridiculous. Of course it is possible LLMs can do math, if it's matter of having nonzero chance, there's also a nonzero chance we are all brains in vats. More rationally if someone says something is possible they should produce some evidence that it is possible. And then we decide whether that evidence is good enough.

Writing well in any human language is not good enough, since it is entirely different from being able to tell whether a set of formal axioms capture certain ideas about a mathematical structure. This is a model theoretic issue. Neural network theorem provers deal with proof theoretic issue.

The best LLM-Lean provers right now are tackling the very challenging problem of how to generate the right sequence of tactics for infinite search space, all relying on, excuse me, undergrad students to formalize proofs and statements for them.

Do you trust LLM so much that you don't check what it writes before sending the email?

LLMs can write better English, but the curating step is still critical, because it also produces a lot of garbage.

Would you trust a brand new assistant to write up an email for you without proof reading it? How much training would they require before you didn't need that step? How much training / fine-tuning would an LLM need? What about the next gen LLM?

Remember, we're not talking about a static target here, and the post I replied to set no qualifications on the claim that a human will always be needed to check that a mathematical definitions in the proof match the English equivalents. That's a long timeline on a rapidly moving target that is, as I said, already seems to be better than most humans at understanding and writing English.

> Would you trust a brand new assistant to write up an email for you without proof reading it?

Depends on the complexity, but for the simpler things I think I could get confident in a day or so. For more complex things, it might take longer to assess their ability.

But I'm not going to trust LLM blindly for anything.

> I replied to set no qualifications on the claim that a human will always be needed to check that a mathematical definitions in the proof match the English equivalents.

I don't defend this strong claim and limit my answer to LLMs (and mostly just state of the art). OTOH I believe that trust will continue to be a big topic for any future AI tech.

> But I'm not going to trust LLM blindly for anything.

Again, what does "blindly" mean? Suppose you went a month without finding a single issue. Or two months. Or a year. The probability of failure must literally be zero before you rely on it without checking? Are you applying the same low probability failure on the human equivalent? Does a zero probability of failure for a human really seem plausible?

There's a reason “if you want it done right, do it yourself” is a saying.
I feel like this conversation is incorrectly conflating "probability of error" with "recourse when things go wrong".

Choosing to handle it yourself does not reduce probability of error to zero, but it does move the locus of consequence when errors occur. "you have nobody to blame but yourself".

One reason people might trust humans over AI regardless of failure rate is answering the questions "what recourse do I have when there is an error" compounded by "is the error model self-correcting": EG when an error occurs, does some of the negative consequence serve to correct the cause of the error or doesn't it.

With another human in the loop, their participation in the project or their personal honor or some property can be jeopardized by any error they are responsible for. On the one hand this shields the delegator from some of the consequence because if the project hemorrhages with errors they can naturally demote or replace the assistant with another who might not have as many errors. But on the other hand, the human is incentivized to learn from their mistakes and avoid future errors so the system includes some self-correction.

Using a static inference LLM, the user has little recourse when there is an error. Nowhere to shift the blame, probably can't sue OpenAI over losses or anything like that. Hard to replace an LLM doing a bad job aside from perhaps looking at ways to re-fine-tune it, or choose a different model which there aren't a lot of materially competing examples.

But the biggest challenge is that "zero self-correction" avenue. A static-inference LLM isn't going to "learn from its mistakes", and the same input + random seed will always produce the same output. The same input with a randomized seed will always produce the same statistical likelihood of any given erroneous output.

You'd have to keep the LLM on a constant RLHF fine tuning treadmill in order for it to actually learn from errors it might make, and then that re-opens the can of worms of catastrophic forgetting and the like.

But most importantly, that's not the product that is presently being packaged one way or the other and no company can offer any "learning" option to a single client at an affordable price that doesn't also commoditize all data used for that learning.

It will eventually become as chess is now: AI will check and evaluate human translation to and from English.
And if it says the human got it wrong, then tough luck for the human if they didn't. :(
Constructing a sentence is only the last step in writing, akin to pressing the shutter release on a camera. LLMs can turn a phrase but they have nothing to say because they do not experience the world directly. They can only regurgitate and remix what others have said.
Apart from the whole "generating bullshit" thing, sure.
Humans still generate more bullshit than LLMs.
Citation needed.
Twitter.
Do humans use twitter? I thought it was mostly bots by now.
Some do. Not all.
>LLMs already write English better than most native speakers...

till they incorporate more of what some of your writing and loose their advantages