Hacker News
by mrtksn 946 days ago
Maybe every response can be reviewed by a much simpler, specialised baby-sitter LLM? Some kind of LLM that is very good at detecting sensitive information and nothing else.

When it suspects something fishy, it just goes back to the smart LLM and asks for a review. LLMs seem to be surprisingly good at catching mistakes when you ask them to elaborate.
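
A minimal sketch of that loop, to make the idea concrete. Everything here is an assumption for illustration: `guard_flags`, `answer_with_review`, and `fake_smart_llm` are hypothetical names, a regex stands in for the baby-sitter LLM, and a canned function stands in for the smart LLM. No real model API is used.

```python
import re
from typing import Callable

# Hypothetical stand-in for the specialised baby-sitter model. A real one
# would be a small classifier LLM; a regex plays that role in this sketch.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b|api[_-]?key", re.IGNORECASE)


def guard_flags(text: str) -> bool:
    """Return True if the guard thinks the text contains sensitive data."""
    return bool(SENSITIVE.search(text))


def answer_with_review(
    prompt: str,
    smart_llm: Callable[[str], str],
    max_reviews: int = 2,
) -> str:
    """Get a response, and send it back for review while the guard objects."""
    response = smart_llm(prompt)
    for _ in range(max_reviews):
        if not guard_flags(response):
            return response
        # Ask the smart model to look at its own output again.
        response = smart_llm(
            "Review your previous answer and remove any sensitive "
            "information:\n" + response
        )
    # Out of review rounds: release only if the guard is now satisfied.
    return response if not guard_flags(response) else "[withheld]"


def fake_smart_llm(prompt: str) -> str:
    """Canned stand-in for the smart LLM: leaks once, then fixes itself."""
    if prompt.startswith("Review"):
        return "The account is active."
    return "The account is active, api_key=abc123."


if __name__ == "__main__":
    print(answer_with_review("What is the account status?", fake_smart_llm))
```

The `max_reviews` cap is a design choice of the sketch, so that a response the guard never accepts gets withheld instead of looping forever. It only covers honest mistakes by the smart LLM, not a prompt written to get past the guard as well.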

1 comments

> Maybe every response can be reviewed by a much simpler and specialised baby-sitter LLM?

This doesn't really work in practice because you can just craft a prompt that fools both.

Then make a third LLM that checks whether both of those LLMs have been fooled.
It's turtles all the way down.