by ignoramous 883 days ago
Nice.

> Some of the popular LLMs that we recommend are: Mistral, CodeLLama

1. Surprised Mistral (Mixtral?) is recommended for code generation / explanation alongside a fine-tuned CodeLlama?

2. Recent human evals put Microsoft's WaveCoder-Ultra-6.7B (SoTA w/ GPT4) at the top of the rankings with WizardCoder-33B, Magiccoder-S-DS-6.7B trailing: https://twitter.com/TeamCodeLLM_AI/status/174755128687745064...


In my experience, when a 6.7B model or similar is claimed to beat a 33B model, it usually doesn't hold up. At the least, I have a very high burden of proof before believing it.
Are you able to explain what the charts mean? Only one of the three has WaveCoder at the top.
Those charts show the pass@k metric (the probability that at least one of k generated samples is correct, estimated from n samples per problem) on the OpenAI and OctoPack problem evals for code.

WaveCoder: https://arxiv.org/abs/2312.14187 (section 3.2)

Octopack: https://github.com/bigcode-project/octopack
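
For anyone unfamiliar with pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is how many of them pass the tests. The function names are made up for illustration, and it is an assumption that the linked charts were computed with exactly this estimator.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper).

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so any draw of k samples must contain a correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark score is then the mean over all problems, so a model that solves a few problems reliably and one that solves many problems occasionally can land on similar pass@1 numbers.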

Mistral worked well in our internal testing, but these models are just starting points. We will add support for WaveCoder-Ultra-6.7B, WizardCoder-33B, Magiccoder-S-DS-6.7B, etc. soon.