Hacker News
by soulofmischief 1022 days ago
If you have an Apple Silicon machine, combine [0] with [1] for state-of-the-art local code completion and general Q&A.

[0] https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-...

[1] https://github.com/ggerganov/llama.cpp
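Combining the two mostly comes down to pointing llama.cpp at the downloaded model file and prompting in the format the model was fine-tuned on. Below is a minimal sketch of building that prompt. It assumes the Alpaca-style "### Instruction / ### Response" template listed on WizardCoder model cards, so check the card for the exact file you download. `build_prompt` and `WIZARDCODER_TEMPLATE` are hypothetical names used only for illustration.

```python
# Hypothetical helper (not part of llama.cpp or the model release).
# Assumption: WizardCoder expects the Alpaca-style template below; verify it
# against the model card of the file you actually download.
WIZARDCODER_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a user request in the assumed template, trimming stray whitespace."""
    return WIZARDCODER_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    # The resulting string is what you would hand to llama.cpp as the prompt.
    print(build_prompt("Write a Python function that reverses a string."))
```

The model's answer is whatever it generates after "### Response:", so that marker is left as the final line of the prompt.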


How does it compare to GPT-4 these days? The article yesterday made it sound like it's getting close.
It's not getting close to GPT-4. It's getting closer according to synthetic benchmarks, but spend any non-trivial time with both and you'll quickly realise GPT-4 is still leagues ahead, especially for writing code and more complex reasoning. That makes sense, since GPT-4 is reportedly far larger by parameter count (OpenAI hasn't published the figure).

Don't get me wrong, it's still remarkable that we already have LLMs that run on consumer-grade hardware and come anywhere near GPT-3.5/4 levels. But if you want the absolute highest quality output, GPT-4 is still the way to go.

I have found it pretty decent at explaining math and physics concepts and at generating some basic code. It seems over-tuned for code generation (on purpose), as it sometimes generates code inappropriately when asked non-code questions.

Overall, it performs better than GPT-3.5-turbo in many use cases. The GPT-4 comparison is harder to quantify, as there are multiple versions of GPT-4 that are rumored to produce significantly different outputs.

It's definitely worth a shot. The 34B parameter count makes a big difference, and people have found that you're still better off with a heavily quantized larger model than with an unquantized smaller one.
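To make the quantization tradeoff concrete on the memory side, here is a back-of-the-envelope sketch: weight memory is roughly parameters × bits per weight ÷ 8 bytes. It assumes a nominal 4 bits per weight for the quantized model. Real llama.cpp quantization formats store extra per-block scale data, so actual files run somewhat larger. KV cache and runtime overhead are ignored. The function name is hypothetical, and the sketch says nothing about output quality, which is the commenter's claim.

```python
# Rough weight-only memory estimate: params * bits_per_weight / 8 bytes.
# Assumptions: nominal bits per weight (real quantized formats add some
# overhead for scales), and KV cache / runtime buffers are not counted.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    big_q4 = weight_memory_gb(34e9, 4)      # 34B model, ~4-bit quantized
    small_fp16 = weight_memory_gb(7e9, 16)  # 7B model, unquantized fp16
    print(f"34B @ 4-bit: {big_q4:.1f} GB")      # 17.0 GB
    print(f"7B @ fp16:   {small_fp16:.1f} GB")  # 14.0 GB
```

So a 4-bit 34B model needs only a little more memory for its weights than an unquantized 7B model. That is why the larger model is usually the better use of a fixed RAM budget, provided it fits.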

This post from today says that Llama 2 is about as factually accurate as GPT-4 for summaries, at roughly 30x lower cost.

https://www.anyscale.com/blog/llama-2-is-about-as-factually-...

You don't need Apple Silicon; llama.cpp runs just fine on Windows/x86.