by rnosov 1063 days ago
The report cites both GPT-3.5 and GPT-4 scores on page 7 [1]. I've checked the numbers, and they compare FreeWilly2 to GPT-3.5. For example, the HellaSwag score of 85.5% corresponds to GPT-3.5.

[1] https://arxiv.org/pdf/2303.08774v3.pdf