
by fsiefken 659 days ago

There are some techniques to alleviate hallucinated, contradictory or confusing answers, but I have difficulty imagining a provably correct LLM because the attack surface is so large. The current methods of training for AI safety might be augmented with insights from chaos engineering, cognitive psychology, marketing and persuasion, making the resulting models agogic truth machines that show very low hallucination rates on benchmarks [1].

I think we should program and train LLMs with universally recognized agogic principles instead of keeping them neutral in this regard, to encourage critical thinking and prevent 'reality tunnels' in the mindset of users, and perhaps incorporate this into future training and curation techniques [2][3][4]. In short: how to raise GenAI and future AGI well.

There are LLM training techniques to alleviate hallucinated, contradictory and confusing answers. These might be augmented with insights from chaos engineering, cognitive psychology and persuasion, making the models agogic truth machines that show very low hallucination rates on benchmarks [1].

I think we should program and train LLMs with universally recognized agogic principles instead of keeping them neutral, in order to encourage critical thinking and prevent 'reality tunnels'. Perhaps this could be incorporated into future training and curation techniques [2][3][4].

* Data curation: Ensuring that the data used to train AI models is balanced and diverse helps prevent biases that could lead to hallucinations or harmful outputs. This means curating data from a wide range of sources, cultures and viewpoints, and implementing quality control during data collection and preprocessing to filter out unreliable, outdated or biased information.

* Targeted post-training (fine-tuning): After initial training, models can be fine-tuned on datasets specifically designed to emphasize helpfulness, harmlessness and alignment with ethical principles. Ethical guidelines can be embedded in these datasets, for example by including scenarios that show how to handle sensitive topics, avoid hate speech and promote fairness.

* Red-teaming: Stress-testing the model by simulating adversarial attacks or intentionally providing challenging prompts to see how it responds. This helps identify weaknesses, such as susceptibility to generating harmful content or hallucinations, and the findings can be used to improve the model's robustness and safety.

* Post-training datasets focused on responsible AI principles: Incorporating datasets that help the model understand the context and nuance of various topics, so that it can give responses appropriate to the situation.

* Refusal-aware instruction tuning (R-tuning): While data curation, targeted post-training and red-teaming help prevent the introduction and propagation of false or harmful content, R-tuning directly enhances the model's ability to recognize its limitations, enabling it to refuse to answer questions beyond its knowledge.

* Iterative refinement based on user feedback: Continuously collecting and analyzing feedback from users and independent review teams helps identify issues that were not apparent during development.
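
As a rough illustration of the refusal-aware tuning item in the list above, here is a minimal Python sketch of how such a fine-tuning set might be assembled: ask the current model each training question, keep the reference answer where the model already gets it right, and substitute a refusal where it does not. Everything named here (`Example`, `build_refusal_aware_set`, `toy_model`, the refusal string, the exact-match comparison) is a hypothetical stand-in chosen for illustration, and published R-tuning recipes may differ in how they label and phrase uncertainty.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    question: str
    answer: str  # reference (ground-truth) answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def build_refusal_aware_set(
    examples: Iterable[Example],
    predict: Callable[[str], str],
    refusal: str = "I don't know.",
) -> List[dict]:
    """Build fine-tuning records whose target is a refusal when the model is wrong.

    `predict` is any callable mapping a question to the model's answer
    (an assumption of this sketch; plug in a real model call here).
    """
    records = []
    for ex in examples:
        # Exact match after normalization is a deliberately crude correctness check.
        known = normalize(predict(ex.question)) == normalize(ex.answer)
        target = ex.answer if known else refusal
        records.append({"prompt": ex.question, "completion": target, "known": known})
    return records


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM: knows one fact, guesses otherwise."""
    memory = {"What is the capital of France?": "paris"}
    return memory.get(question, "a confident guess")
```

Calling `build_refusal_aware_set(examples, toy_model)` yields records that could then feed an ordinary supervised fine-tuning run. The design choice worth noting is that the split depends on the model being tuned, so the same dataset produces different refusal targets for different models.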

[1] Vectara hallucination leaderboard https://github.com/vectara/hallucination-leaderboard

[2] Boudry, M., & Hofhuis, S. (2024, August). On epistemic black holes: How self-sealing belief systems develop and evolve. Theoria. https://onlinelibrary.wiley.com/doi/epdf/10.1111/theo.12554

[3] Costello, T. H., Pennycook, G., & Rand, D. G. (2024, April 3). Durably reducing conspiracy beliefs through dialogues with AI. https://doi.org/10.31234/osf.io/xcwdn https://osf.io/preprints/psyarxiv/xcwdn

[4] BriX: Reducing polarization through Bridging and eXposure https://research.qut.edu.au/genailab/projects/brix-reducing-...

1 comment

levzettelin 659 days ago

As commented before, by saying that they "could very easily solve the problem" I meant that they could just collectively stop using the problematic AIs in prod until they feel that the issue is resolved (a moratorium), not that it's easy to resolve the technical difficulties.
