by mungoman2 1178 days ago
> 90%*

> * According to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.

Come on... I love any effort related to LLMs, but it's really disingenuous to claim high quality in the title and then immediately discredit it after the click-through.

1 comments

But it is indeed difficult to evaluate chatbots and LLMs, especially considering most of them have actually seen the Internet data at least once.
And I do think this is a good effort.