by rnosov 1207 days ago
You can read the original LLaMA paper, which is pretty accessible [1]. For example, they claim to outperform GPT-3 on the HellaSwag benchmark (finishing sentences). You can find examples of unfinished sentences in the HellaSwag paper [2] on page 13. Unfortunately for LLaMA, most people would probably just be asking questions about Captain Picard and so on, and on this benchmark LLaMA significantly underperforms compared to OpenAI models (that's from their paper).

[1] https://research.facebook.com/file/1574548786327032/LLaMA--O...

[2] https://arxiv.org/pdf/1905.07830.pdf

1 comment

HellaSwag is also a deeply flawed benchmark, so I wouldn't read too much into it: https://www.surgehq.ai/blog/hellaswag-or-hellabad-36-of-this...