by levzettelin 659 days ago
Probably just companies trying to impede the progress of other companies. That's not to say the statement is necessarily wrong. But given that this is coming from a group of people that could very easily solve the problem, I'll take it with a grain of salt.

It’s not an easily solvable problem. They can’t make an AI which won’t lie or make stuff up, which is sort of the root of the problem. Imagine an AI which is granted access to control systems. We can’t trust such an AI to run control systems any more than we can trust it not to lie or make stuff up. There isn’t the sort of rigor behind AI development to permit creating a provably correct AI. There needs to be more study in order to understand the limits of AI fallibility and failure modes.
They could just collectively stop working on the problem until they feel that the issue is resolved (moratorium). That's what I meant by "they could very easily solve the problem".
Or they could keep working on it, but not use it on customers until it’s good enough.

> Zhou shared a striking example of how AI-generated content could lead to real-world consequences. “Some of the initial stock images of various ingredients looked like a hot dog, but it wasn’t quite a hot dog—it looked like, kind of like an alien hot dog,” he said. Such errors, he argued, could erode consumer trust or, in more extreme cases, pose actual harm. “If the recipe potentially was a hallucinated recipe, you don’t want to have someone make something that may actually harm them.”

There’s absolutely no reason Instacart has to show customers AI-hallucinated recipes from stock images. They choose to do it, then beat the drum about AI security as if they actually give a shit. It’s like Boeing self-certification.

There are some techniques to alleviate hallucinations and contradictory or confusing answers, but I have difficulty imagining a provably correct LLM because the attack surface is so large. The current methods to train for AI safety might be augmented with insights from chaos engineering, cognitive psychology, marketing and persuasion, making the models agogic truth machines that score very low on hallucination benchmarks [1].

I think we should program and train LLMs with universally recognized agogic principles instead of being neutral in this regard, to encourage critical thinking and prevent 'reality tunnels' in the mindset of users, and perhaps also incorporate this into future training and curation techniques [2][3][4]. That is how to raise GenAI and future AGI well.

There are LLM training techniques to alleviate hallucinated, contradictory and confusing answers. These might be augmented with insights from chaos engineering, cognitive psychology and persuasion, making the models agogic truth machines that score very low on hallucination benchmarks [1].

I think we should program and train LLMs with universally recognized agogic principles instead of being neutral, in order to encourage critical thinking and prevent 'reality tunnels'. Perhaps this could be incorporated into future training and curation techniques [2][3][4].

* Data curation: Ensuring that the data used to train AI models is balanced and diverse helps prevent biases that could lead to hallucinations or harmful outputs. This means curating data from a wide range of sources, cultures and viewpoints, and implementing quality control during data collection and preprocessing to filter out unreliable, outdated, or biased information.

* Targeted post-training (fine-tuning): After initial training, models can be fine-tuned using datasets specifically designed to emphasize helpfulness, harmlessness and alignment with ethical principles. Ethical guidelines can be embedded in these datasets, for example by including scenarios that show how to handle sensitive topics, avoid hate speech and promote fairness.

* Red-teaming: Stress-testing the model by simulating adversarial attacks or intentionally providing challenging prompts to see how it responds. This helps identify weaknesses, such as susceptibility to generating harmful content or hallucinations, and the findings can be used to improve the model's robustness and safety.

* Post-training datasets focused on responsible AI principles: Incorporating datasets that help the model understand the context and nuance of various topics, so that it can respond appropriately to the situation.

* Refusal-aware instruction tuning: While data curation, targeted post-training, and red-teaming help prevent the introduction and propagation of false or harmful content, R-tuning directly enhances the model's ability to recognize its limitations, enabling it to refuse to answer questions beyond its knowledge.

* Iterative refinement based on user feedback: Continuously collecting and analyzing feedback from users and independent review teams helps identify issues that may not have been apparent during development.
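
To make the refusal-aware item a bit more concrete, here is a rough sketch of how training examples for that kind of tuning might be assembled. This is my own simplification, not a reference implementation: the function name `build_refusal_aware_examples`, the exact-match comparison, and the refusal string are all assumptions for illustration, and `model_answer` stands in for whatever call queries the model being tuned.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical refusal text; a real setup would pick its own wording.
REFUSAL = "I don't know."


def build_refusal_aware_examples(
    qa_pairs: Iterable[Tuple[str, str]],
    model_answer: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Label each question as known/unknown to the model and pick a target.

    If the model's current answer matches the reference answer (a crude,
    case-insensitive exact match), the fine-tuning target is the reference
    answer. Otherwise the target is a refusal, so the model is trained to
    decline rather than guess.
    """
    examples: List[Dict[str, object]] = []
    for question, gold in qa_pairs:
        predicted = model_answer(question)
        known = predicted.strip().lower() == gold.strip().lower()
        examples.append(
            {
                "prompt": question,
                "target": gold if known else REFUSAL,
                "known": known,
            }
        )
    return examples
```

Exact string matching is obviously too strict for free-form answers; a real pipeline would need a more forgiving correctness check, and the resulting examples would still have to go through an actual fine-tuning run.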

[1] Vectara hallucination leaderboard https://github.com/vectara/hallucination-leaderboard

[2] Boudry, M., & Hofhuis, S. (2024, August). On epistemic black holes: How self-sealing belief systems develop and evolve. Theoria. https://onlinelibrary.wiley.com/doi/epdf/10.1111/theo.12554

[3] Costello, T. H., Pennycook, G., & Rand, D. G. (2024, April 3). Durably reducing conspiracy beliefs through dialogues with AI. https://doi.org/10.31234/osf.io/xcwdn https://osf.io/preprints/psyarxiv/xcwdn

[4] BriX: Reducing polarization through Bridging and eXposure https://research.qut.edu.au/genailab/projects/brix-reducing-...

As commented before, by saying that they "could very easily solve the problem" I meant that they could just collectively stop using the problematic AIs in prod until they feel that the issue is resolved (moratorium). Not that it's easy to resolve the technical difficulties.