by jspisak 1024 days ago
It would be useful to understand whether a ~30B Llama-2 model would be interesting, and for what reasons.
3 comments

Better reasoning and general performance than 13B by far (if Llama 1 was any indication), and, like the other user said, it can fit on a single 24GB VRAM gaming card and can be PEFT fine-tuned with 2x 24GB cards.
Llama-1-33B was trained on 40% more tokens than Llama-1-13B; this explained some of the disparity. This time around they both have the same data scale (2T pretraining + 500B code finetune), but 34B also uses GQA, which is slightly noisier than MHA. Furthermore, there have been some weird indications in the original Llama-2 paper that the 34B base model is something… even more special: it was trained on a separate internal cluster with undervolted/underclocked GPUs (though this in itself can't hurt training results), its scores are below expectations, and it's been less "aligned". Here, Code-Llama-Instruct-13B is superior to 34B on HumanEval@1. So yes, it's desirable, but I wouldn't get my hopes up.
Llama 34B is about the largest size that still fits on a 24GB consumer (or affordable server) GPU.

It's also just the right size for llama.cpp inference on machines with 32GB RAM, or 16GB RAM with an 8GB+ GPU.

Basically it's the most desirable size for AI fine-tuning hobbyists, and the quality jump from Llama v1 13B to Llama v1 33B is huge.

It would fit on the 24GB top-end consumer graphics cards with quantization.
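
For anyone who wants to sanity-check the "fits in 24GB with quantization" claim, here is a rough back-of-the-envelope sketch. It only counts the weights, plus a flat overhead allowance that is purely my assumption (the 2 GiB default is a placeholder for KV cache, activations, and runtime buffers, not a measured number). The function names are made up for illustration, and real quantization formats usually store somewhat more than the nominal bits per weight because of scales and similar metadata, so treat the results as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


def fits(n_params: float, bits_per_weight: float, budget_gib: float,
         overhead_gib: float = 2.0) -> bool:
    """Check weights + an assumed flat overhead against a memory budget.

    overhead_gib is a hypothetical allowance, not a measured value.
    """
    return weight_memory_gib(n_params, bits_per_weight) + overhead_gib <= budget_gib


if __name__ == "__main__":
    n_params = 34e9  # nominal parameter count for a "34B" model
    for bits in (16, 8, 4):
        gib = weight_memory_gib(n_params, bits)
        verdict = "fits" if fits(n_params, bits, budget_gib=24) else "does not fit"
        print(f"{bits:>2}-bit: ~{gib:.1f} GiB of weights, {verdict} in 24 GiB")
```

Under these assumptions, 16-bit and 8-bit weights for 34B parameters are well over 24 GiB, while nominal 4-bit weights come in around 16 GiB, which is consistent with the comments above about needing quantization on a single 24GB card.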