Hacker News
by cortesoft 830 days ago
Human created content is also filled with gibberish and false information and random noise… how is AI generated content worse?
4 comments

Arsenic naturally occurs... how are automatic factories that dump millions of tons of it in the nearby river worse?
> how is AI generated content worse?

This is a crucial question.

In human society, a feedback loop of nonsense is usually defeated by practical effects in physical reality and experience. The objective of education, for example, is to transmit knowledge and apply reason to important questions.

In manipulated social media, there is no check on the nonsense loop. The technology that we currently call A.I. could be used for educational good.

How it will be used, however, is likely to further distort discourse and generate nonsense.

In addition to different decisions made by individuals, they also can't power a feedback loop 24/7 a kerjillion times per minute.
It is worse because it is faster. How many incorrect blog articles can a single typical writer publish on the internet? Maybe 1-2 a day if you are a prolific writer.

How many can an AI agent do? Probably hundreds of thousands a day. To me, that is going to be a huge problem, but I don't have a solution in mind either.

And then those 100K bad articles posted per day by one person are used as training data for the next 100K bad/incorrect articles, and so on, and the problem explodes geometrically.

Imagine you have a calculator that outputs a result that is off by one percent. That's AI right now.

If you use the results of each calculation in additional calculations, the result will skew further and further from reality with each error. That's AI training on itself.
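
As a rough sketch of that compounding, assume every chained calculation multiplies the running value by the same 1% relative error. That is a simplification: real model errors are neither constant nor purely multiplicative, and the name `drift_after` is made up for illustration.

```python
def drift_after(steps: int, relative_error: float = 0.01) -> float:
    """Return the relative deviation from the true value after `steps`
    chained calculations, each of which is off by `relative_error`."""
    value = 1.0  # the true result, normalized to 1.0
    for _ in range(steps):
        value *= 1.0 + relative_error  # each step reuses the previous, already-skewed output
    return value - 1.0


if __name__ == "__main__":
    for steps in (1, 10, 70, 100):
        print(f"{steps:>3} steps: off by {drift_after(steps):.1%}")
```

Under this assumption the result is off by roughly 100% (the value has about doubled) after 70 chained steps, even though no single step was off by more than 1%.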

In many areas of communication and information, this exact problem is dealt with through error correction codes. Do AI models have built in ECC?
No, LLMs with soft attention use compression, and actually have no mechanism for ground truth.

They are simply pattern finding and matching.

More correctly, they are uniform constant-depth threshold circuits.

Basically parallel operations on a polynomial number of AND, OR, NOT, and majority gates.

The majority gates can do the Parity function, but cannot self correct like ECC does.

The thing with majority gates is that they can show some input is in the language:

Thus 1,1,1,0,0 evaluates to true, while 1,1,0,0,0 fails. But that is negation as failure: it doesn't prove the negation, it isn't a truthy false.

With soft attention and majority gates they can do parity detection but not correction.
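
A minimal sketch of the detection-versus-correction distinction, in plain Python rather than any circuit model. The function names are hypothetical, and this only illustrates majority and parity as Boolean functions; it does not demonstrate the circuit-complexity claim itself.

```python
from typing import Sequence


def majority(bits: Sequence[int]) -> int:
    """Threshold gate: 1 if more than half of the inputs are 1, else 0."""
    return int(sum(bits) * 2 > len(bits))


def parity(bits: Sequence[int]) -> int:
    """1 if an odd number of inputs are 1, else 0."""
    return sum(bits) % 2


def with_parity_bit(data: Sequence[int]) -> list[int]:
    """Append an even-parity bit so the whole word has parity 0."""
    return list(data) + [parity(data)]


def looks_corrupted(word: Sequence[int]) -> bool:
    """Detect an odd number of flipped bits; says nothing about which ones."""
    return parity(word) == 1


if __name__ == "__main__":
    print(majority([1, 1, 1, 0, 0]), majority([1, 1, 0, 0, 0]))
    word = with_parity_bit([1, 0, 1, 1])
    word[2] ^= 1  # flip one bit in transit
    print(looks_corrupted(word))
```

Flipping any single bit of the word produces the same "corrupted" signal, so a lone parity bit can report that something changed but not where, which is why it cannot repair the word.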

Hopefully someone can correct this if I am wrong.

Specifically I think that the upper bound of deciding whether X = x is a cause of m) in structures is NP-complete in binary models (where all variables can take on only two values) and Σ_2^P-complete in general models.

As TC^0 is smaller than NP, and probably smaller than P, any methods would be opportunistic at best.

Preserving the long tail of a distribution is a far more pragmatic direction as an ECC type ability is unreasonable.

Thinking of error-correcting codes as serial Turing machines and transformers as primarily parallel circuits should help with understanding why they are very different.

The trouble is "truth" and math are different.

You can verify a mathematical result. You can run the calculations a second time on a separate calculator (in fact some computers do this) to verify the result, or use a built-in check like ECC.

There's no such mathematical test for truth for an AI to run.
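
The "second calculator" idea can be sketched as running two independent implementations and accepting the answer only when they agree. The helper `checked` and the two sum functions are hypothetical names chosen for this illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def checked(primary: Callable[[], T], secondary: Callable[[], T]) -> T:
    """Run two independent computations and return the result only if they agree."""
    a = primary()
    b = secondary()
    if a != b:
        raise ArithmeticError(f"results disagree: {a!r} != {b!r}")
    return a


def sum_loop(n: int) -> int:
    """Sum 1..n by explicit iteration."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    """Sum 1..n using the closed form n(n+1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(checked(lambda: sum_loop(1000), lambda: sum_formula(1000)))
```

Agreement only shows that the two computations match each other. It works here because the question has a single well-defined answer that both methods can reach independently.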

Error correction doesn’t ensure truth. At least in communication, it ensures that the final version matches the original version.
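
To make "matches the original version" concrete, here is a small sketch using a Hamming(7,4) code. The choice of code is an assumption for illustration, since nobody in the thread names one. It repairs any single flipped bit in a 7-bit word.

```python
from typing import Sequence


def encode(data: Sequence[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word: Sequence[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    original = [1, 0, 1, 1]
    sent = encode(original)
    sent[4] ^= 1  # one bit flips in transit
    print(decode(sent) == original)
```

The decoder restores whatever was sent, whether or not it was true. The code only protects fidelity to the original message.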

For AI, you wouldn’t be doing EC to make sure the AI was saying truth, you would be doing EC to ensure that the AI hasn’t drifted due to the 1% error rate.

Of course I have no idea how to actually do it - if it isn’t being done now, it is probably hard or impossible.

There's no fully general test for truth for an AI to run.

In some specific domains such tests exist — and the result is, generally, computers wildly outperforming humans. But I get the impression from using them that current LLMs didn't take full advantage of this during training.