| HN Mirror

Hacker News


	by a_wild_dandan 904 days ago
	Are people still rawdoggin' 16-bit models? I almost exclusively use 5-bit inference quants (or 8-bit natives like Yi-34b) on my MacBook Pro. Tiny accuracy loss, runs fast, and leaves plenty of (V)RAM on the table. Mixtral 8x7B is my new daily driver, and only takes like 40GB to run! I wonder if I could run two of them talking to each other...
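
For a rough sense of where numbers like that come from: weight memory is parameter count times bits per weight. A back-of-the-envelope sketch, assuming Mixtral 8x7B has about 46.7B total parameters (a figure not stated in this thread, so treat it as an assumption). It ignores the KV cache, activations, and the per-block metadata that real quant formats add, which is presumably where the rest of the ~40GB goes.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed total parameter count for Mixtral 8x7B (not given in the thread).
MIXTRAL_PARAMS = 46.7e9

for bits in (16, 8, 5):
    print(f"{bits:>2}-bit: {weight_memory_gb(MIXTRAL_PARAMS, bits):.1f} GB")
```

Under that assumption the weights alone come to roughly 93 GB at 16 bits, 47 GB at 8 bits, and 29 GB at 5 bits.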

2 comments

rubatuga 903 days ago

Pure 16-bit is horrible for training, sorry.

rdedev 903 days ago

Doesn't using bf16 alleviate the problem? At least I've had success training a BERT-like model from scratch.

furiousteabag 903 days ago

Mixed precision is the default method for pretraining and full fine-tuning right now. It works especially well for transformers, because their memory bottleneck is activations (outputs of intermediate layers stored for backprop), and running the forward pass in fp16/bf16 cuts that VRAM use almost in half (and speeds up the forward pass as well).

shikon7 903 days ago

I wonder about that too. With such low precision, parameter updates might be too small to have any effect (is it possible to use some sort of probabilistic update in that case?). Unfortunately, I haven’t found any resources describing the feasibility of full fp16 or bf16 training.
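
The "probabilistic update" idea is usually called stochastic rounding. Here is a minimal sketch of both effects, using only the Python standard library (the `struct` format `e` is IEEE half precision). The helper names, the starting weight of 1.0, and the update size of 1e-4 are made up for illustration, and the helpers only handle positive values well below the fp16 maximum.

```python
import random
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp16_neighbor(v: float, step: int) -> float:
    """Adjacent fp16 value above (step=+1) or below (step=-1) a positive fp16 value."""
    bits = struct.unpack("<H", struct.pack("<e", v))[0]
    return struct.unpack("<e", struct.pack("<H", bits + step))[0]

def stochastic_round_fp16(x: float, rng: random.Random) -> float:
    """Round positive x to one of its two fp16 neighbours, rounding up with
    probability equal to how far x sits between them."""
    lo = to_fp16(x)
    if lo > x:  # nearest rounding went up, so step back down
        lo = fp16_neighbor(lo, -1)
    hi = fp16_neighbor(lo, +1)
    p_up = (x - lo) / (hi - lo)
    return hi if rng.random() < p_up else lo

def accumulate(update: float, steps: int, stochastic: bool, seed: int = 0) -> float:
    """Add the same small update to an fp16 weight `steps` times."""
    rng = random.Random(seed)
    w = 1.0
    for _ in range(steps):
        x = w + update
        w = stochastic_round_fp16(x, rng) if stochastic else to_fp16(x)
    return w

nearest = accumulate(1e-4, 1000, stochastic=False)
stoch = accumulate(1e-4, 1000, stochastic=True)
print(nearest, stoch)
```

Near 1.0 the gap between fp16 values is about 0.001, so with round-to-nearest every 1e-4 update is lost and the weight never leaves 1.0. With stochastic rounding the weight drifts toward the expected 1.1, at the cost of some noise.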

furiousteabag 903 days ago

You are correct, training solely in fp16/bf16 can lead to imprecise weight updates or even gradients turning to zero. Because of that, mixed precision is used. In mixed precision training, we keep a copy of the weights in fp32 (the master model) and the training loop looks like this: compute the output with the fp16 model, then the loss -> back-propagate the gradients in half precision -> copy the gradients to fp32 -> do the update on the master model (in fp32) -> copy the master model into the fp16 model. We also do loss scaling, which means multiplying the output of the loss function by some scalar before backprop (necessary in fp16 but not required in bf16).

Check out the fastai docs for more details: https://docs.fast.ai/callback.fp16.html
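
The loop described above can be sketched on a one-weight toy problem. This is a simulation under stated assumptions, not how any framework implements it: rounding to fp16/fp32 is emulated with the standard-library `struct` module (Python floats are really fp64), and the data point, learning rate, and loss scale are arbitrary illustrative values.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

X, Y = 1.0, 2.0        # one training example; the ideal weight is 2.0
LR = 1e-4              # illustrative learning rate
LOSS_SCALE = 1024.0    # illustrative loss scale

def scaled_grad_fp16(w16: float) -> float:
    """Gradient of LOSS_SCALE * (w*X - Y)**2 w.r.t. w, rounded to fp16 at each step."""
    pred = to_fp16(w16 * X)                       # forward pass in fp16
    err = to_fp16(pred - Y)
    return to_fp16(LOSS_SCALE * 2.0 * err * X)    # backward pass on the scaled loss

def train_pure_fp16(steps: int) -> float:
    w = to_fp16(1.0)
    for _ in range(steps):
        grad = scaled_grad_fp16(w) / LOSS_SCALE
        w = to_fp16(w - LR * grad)                # update lands directly on the fp16 weight
    return w

def train_mixed(steps: int) -> float:
    master = to_fp32(1.0)                         # fp32 master copy of the weight
    for _ in range(steps):
        w16 = to_fp16(master)                     # copy master -> fp16 model
        grad32 = to_fp32(scaled_grad_fp16(w16)) / LOSS_SCALE  # grads -> fp32, then unscale
        master = to_fp32(master - LR * grad32)    # update in fp32
    return master

print(train_pure_fp16(20000), train_mixed(20000))
```

In this toy setup the pure fp16 run never leaves 1.0, because each update of 2e-4 is smaller than half the fp16 spacing near 1.0. The fp32 master copy accumulates those updates and ends up close to 2.0. Loss scaling makes no visible difference here since the gradients are large, but it matters once gradients get tiny: `to_fp16(1e-8)` is 0.0, while `to_fp16(1e-8 * 1024) / 1024` is not.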

rdedev 903 days ago

Ah, my bad. I was using mixed precision training in my previous comment.

You might find this paper interesting: https://arxiv.org/pdf/2010.06192.pdf

bigdict 903 days ago

Hmm, what do you mean? I thought bf16 is used extensively for LLM training.

chrsig 903 days ago

How does one rawdog a 16-bit model?

kkzz99 903 days ago

Usually, for efficiency, you use quantized models. Quantized models reduce the number of bits used for each parameter, which saves disk space and reduces RAM usage.
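
As a concrete picture of what "fewer bits per parameter" means, here is a minimal sketch of symmetric absmax quantization. It is a simplification for illustration: the function names and sample weights are made up, and real inference formats are more elaborate (for example, they typically keep a separate scale per small block of weights).

```python
def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed `bits`-bit integers using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1                   # 15 for 5 bits, 127 for 8 bits
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers and scale."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 0.9]                 # made-up example weights
for bits in (8, 5):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(bits, q, f"max error {max_err:.4f}")
```

Only the small integers and the scale are stored, so each weight costs about `bits` bits instead of 16. The price is a rounding error of at most half a scale step per weight.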
