by zhisbug 1173 days ago
but it is indeed difficult to evaluate chatbots and LLMs, especially considering that most of them have seen Internet data at least once.

and I do think this is a good effort.