

	by divan 1165 days ago
	This will still hallucinate, right? Projects like this, for use with your own document datasets, are invaluable, but everything I've tried so far hallucinates, so it isn't practical. What's the state of the art for LLMs without hallucination at the moment?


Art9681 1165 days ago

Like many others, I’m also building my own platform to accomplish this. What I’ve learned is that document preparation is key to getting the LLM to answer correctly, and text splitting is the crucial step: picking the right splitter and parameters for your use case matters. At first I was getting incorrect or made-up answers. Setting up a proper prompt template and text-splitting parameters fixed the issue for the most part, and now I have a 99% success rate.
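
The comment doesn't say which splitter or template was used. As a hypothetical illustration only, here is a minimal Python sketch of the two knobs it mentions: a fixed-size character splitter with overlap (`chunk_text`, `chunk_size` and `overlap` are assumed names, not any particular library's API) and a prompt template that tells the model to answer only from the supplied context. Production splitters usually prefer paragraph and sentence boundaries over raw character offsets.

```python
# Hypothetical sketch: overlapping character chunks plus a "grounded" prompt.
# PROMPT_TEMPLATE, chunk_text and build_prompt are illustrative names only.

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say \"I don't know\".\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so any span of up to
    `overlap` characters lies entirely inside at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reaches the end
            break
    return chunks


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks into the context slot of the template."""
    context = "\n---\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
    print(build_prompt(["alpha", "beta"], "What?"))
```

Smaller chunks make retrieval more precise but can strip away context the answer depends on; the overlap is a cheap hedge against cutting a key sentence in half.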

Also, the local model you use makes a big difference. Right now wizard-mega and manticore are the best ones to use. I run the 16b ggml versions on an M2 Pro, and it takes about 30 seconds to “warm up” and produce quality responses.


anu7df 1165 days ago

Not exactly sure if this would qualify as an LLM in the GPT-4 sense, but for no hallucination this seems good: https://www.thirdai.com/pocketllm/ Full disclosure: I know the founder, but I'm not otherwise associated with the company.


XCSme 1165 days ago

How do you define hallucination?


divan 1165 days ago

factually incorrect / nonsensical output


TeMPOraL 1165 days ago

It will still talk like a human blurting out their train of thought, yes.


XCSme 1165 days ago

I assume this is only possible if the training data contains a single "right answer". If the training data contains two contradicting answers, A and B, then from the AI's perspective there is no correct answer.

I assume that for questions like "What year was Bill Gates born in?", it should never return a wrong answer if the answer was in the training data. If it wasn't, it should respond that it doesn't know.
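
The three cases described here (one consistent answer, contradicting answers A and B, no answer at all) can be made concrete with a toy lookup table. This is not how a language model stores or retrieves facts; it is only an illustration of the reasoning, and `build_index`, `answer` and the fact keys are hypothetical names.

```python
from collections import defaultdict

# Toy stand-in for "what the training data says": (question key, answer) pairs,
# possibly repeated and possibly contradictory. Not a model of how an LLM works.


def build_index(facts):
    """Collect every distinct answer seen for each question key."""
    index = defaultdict(set)
    for key, value in facts:
        index[key].add(value)
    return index


def answer(index, key):
    """Return the answer only when the data is unambiguous."""
    candidates = index.get(key, set())  # .get() does not insert missing keys
    if not candidates:
        return "I don't know."
    if len(candidates) > 1:
        return "Conflicting answers: " + ", ".join(sorted(candidates))
    return next(iter(candidates))


if __name__ == "__main__":
    facts = [
        ("bill_gates_birth_year", "1955"),
        ("bill_gates_birth_year", "1955"),  # repeated, but consistent
        ("example_question", "A"),
        ("example_question", "B"),          # contradicts the line above
    ]
    idx = build_index(facts)
    for q in ("bill_gates_birth_year", "example_question", "unseen_question"):
        print(q, "->", answer(idx, q))
```

An LLM has no explicit table like this, which is why it can produce a fluent answer even when its training data held zero or several conflicting versions.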
