by rnosov 1181 days ago
There are benchmarks in the original LLaMA paper[1]. Specifically, on page 4 LLaMA 13B seems to beat GPT-3 on the BoolQ, HellaSwag, WinoGrande, ARC-e and ARC-c benchmarks (not by much though). The examples you've seen are likely based on some form of quantisation or a poor prompt, either of which degrades the output. My understanding is that the only quantisation that doesn't seem to hurt the output is LLM.int8() by Tim Dettmers. As of now, you should be able to run LLaMA 13B (8-bit quantised) on a consumer-grade GPU such as the 3090 or 4090. Also, you'd need a prompt such as LLaMA precise[2] in order to get ChatGPT-like output.

[1] https://arxiv.org/pdf/2302.13971v1.pdf

[2] https://www.reddit.com/r/LocalLLaMA/comments/11tkp8j/comment...
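
To sanity-check the 3090/4090 claim, here is a rough back-of-envelope sketch. It assumes a nominal 13e9 parameters and counts the weights only (activations, KV cache and framework overhead are ignored), so treat it as a lower bound rather than a measurement. The function name and constants are mine, purely for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory taken by the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 13e9      # assumption: nominal parameter count for LLaMA 13B
GPU_VRAM_GIB = 24    # RTX 3090 and RTX 4090 both ship with 24 GB of VRAM

for bits in (16, 8):
    gib = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if gib < GPU_VRAM_GIB else "does not fit"
    print(f"{bits}-bit weights: ~{gib:.1f} GiB, {verdict} in {GPU_VRAM_GIB} GiB")
```

Under these assumptions the 16-bit weights alone come out slightly above 24 GiB, while the 8-bit weights need roughly half that, which leaves headroom for activations on a single card.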