| HN Mirror

	by mrtksn 946 days ago
	Maybe every response can be reviewed by a much simpler, specialised baby-sitter LLM? Some kind of LLM that is very good at detecting sensitive information and nothing else. When it suspects something fishy, it just goes back to the smart LLM and asks for a review. LLMs seem to be surprisingly good at picking up their own mistakes when you ask them to elaborate.
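
A rough sketch of the loop described above, not a real implementation. Every name here (`babysitter_flags`, `guarded_reply`, `toy_llm`, `max_reviews`) is made up for illustration, and the "baby-sitter" is just a regex standing in for a small specialised model:

```python
import re
from typing import Callable

# Hypothetical stand-in for the baby-sitter model. A real one would be a
# small classifier LLM; here a regex flags anything shaped like a US SSN.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def babysitter_flags(text: str) -> bool:
    """Return True if the text looks like it contains sensitive data."""
    return SENSITIVE.search(text) is not None


def guarded_reply(
    prompt: str,
    smart_llm: Callable[[str], str],
    flags: Callable[[str], bool] = babysitter_flags,
    max_reviews: int = 2,
) -> str:
    """Generate a reply, sending it back for review while it is flagged."""
    draft = smart_llm(prompt)
    for _ in range(max_reviews):
        if not flags(draft):
            return draft
        # Go back to the smart model and ask it to reconsider its answer.
        draft = smart_llm(
            "Review your previous answer and remove any sensitive "
            f"information:\n{draft}"
        )
    # Out of review rounds: release the draft only if it is now clean.
    return draft if not flags(draft) else "[response withheld]"


def toy_llm(prompt: str) -> str:
    """Toy model: leaks on the first pass, complies when asked to review."""
    if prompt.startswith("Review"):
        return "The customer's SSN is on file."
    return "The customer's SSN is 123-45-6789."


print(guarded_reply("What is the customer's SSN?", toy_llm))
```

Note that the checker here only ever sees the output, never the original prompt, so it is only as good as whatever the detector can recognise in the final text.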

1 comments

> Maybe every response can be reviewed by a much simpler and specialised baby-sitter LLM?

This doesn't really work in practice because you can just craft a prompt that fools both.

Then make a third LLM that checks whether both of those LLMs have been fooled.

It's turtles all the way down.