Hacker News
by jeffreyw128 993 days ago
It’s especially terrifying that misinformation compounds multiplicatively with AI, because it happens in two layers: once at the retrieval layer (where AI-generated content is worsening the problem of bad SEO content) and again at the retrieval-augmented generation (RAG) LLM layer.
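
To make the "multiplicative" point concrete, here is a rough back-of-the-envelope sketch. It assumes the two layers fail independently, and the function name and the 90% figures are made up for illustration, not measurements of any real system.

```python
def end_to_end_accuracy(p_retrieval_ok: float, p_generation_ok: float) -> float:
    """Chance that an answer is correct when both layers must be correct.

    Assumes (hypothetically) that retrieval and generation errors are
    independent, so the per-layer success probabilities multiply.
    """
    return p_retrieval_ok * p_generation_ok


# Illustrative numbers only: each layer is right 90% of the time.
accuracy = end_to_end_accuracy(0.9, 0.9)
print(f"end-to-end accuracy: {accuracy:.2f}")
print(f"end-to-end error rate: {1 - accuracy:.2f}")
```

Under those assumptions, two layers that are each wrong about 10% of the time give a pipeline that is wrong roughly 19% of the time, and the gap widens as either layer gets worse.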

(shameless plug) At Metaphor (https://platform.metaphor.systems/), we’re building a search engine that avoids SEO content by relying on human curation + neural embeddings for our index + retrieval algorithm. Our mission is to ensure that the information we receive is as high quality and truthful as possible as AI adoption marches onwards. You (or your LLM) can feel free to give it a try :)

1 comment

+1. Increasing training data quality should hopefully also help with hallucination, but that becomes increasingly hard in a world where more and more online content is crap.
I'm not an expert in training LLMs, but I've heard that some people use reinforcement learning to train LLMs and align their behavior with human preferences. When it comes to designing a loss function for training, I wonder if it's possible to assign an extremely high loss to hallucinated content. That might encourage the model to refrain from generating inaccurate content.
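
A minimal sketch of what such a loss could look like, in plain Python. It assumes we already have per-token labels saying which tokens are hallucinated, which is the hard part in practice. The function name, the `penalty` parameter, and the labels are all hypothetical. Hallucinated tokens get an unlikelihood-style term (pushing their probability toward zero) scaled by a large weight, and the other tokens get ordinary negative log-likelihood.

```python
import math


def penalized_loss(token_probs, is_hallucinated, penalty=10.0):
    """Average per-token loss with a heavy penalty on hallucinated tokens.

    token_probs: probability the model assigned to each token in a sample.
    is_hallucinated: hypothetical per-token labels (True = hallucinated).
    penalty: hypothetical weight applied to hallucinated tokens.
    """
    eps = 1e-12  # avoid log(0)
    total = 0.0
    for p, bad in zip(token_probs, is_hallucinated):
        if bad:
            # Unlikelihood-style term: large when the model is confident
            # in a token we have labeled as hallucinated.
            total += penalty * -math.log(max(1.0 - p, eps))
        else:
            # Standard negative log-likelihood for acceptable tokens.
            total += -math.log(max(p, eps))
    return total / len(token_probs)


probs = [0.9, 0.9]
clean = penalized_loss(probs, [False, False])
flagged = penalized_loss(probs, [False, True])
print(f"clean: {clean:.3f}, with one hallucinated token: {flagged:.3f}")
```

The catch is that the penalty only helps as far as the labels are reliable. In a reinforcement learning setup the analogous move would be a large negative reward for responses judged to be hallucinated, which has the same dependency on a trustworthy judge.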