Hacker News
by objclxt 951 days ago
> Maybe every response can be reviewed by a much simpler and specialised baby-sitter LLM?

This doesn't really work in practice because you can just craft a prompt that fools both.


Then make a third LLM that checks whether both of those LLMs have been fooled.
It's turtles all the way down.