| HN Mirror


by mungoman2 1178 days ago

> 90%*

> * According to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.

Come on... I love any effort related to LLMs, but it's really disingenuous to claim high quality in the title and then immediately discredit it after the click-through.


zhisbug 1178 days ago

But it is indeed difficult to evaluate chatbots and LLMs, especially considering that most of them have seen the Internet data at least once.


zhisbug 1178 days ago

And I do think this is a good effort.
