hey -- wrote a blog post about this exact phenomena [1] (also posted on HN couple weeks back [2]). tldr: maintaining confidence from LLM nondeterministic outputs over millions of pages is a problem. especially in production environments like healthcare, finance, etc. we've noticed decently high hallucination rates, even in finetuned LLMs.