by liuliu 1313 days ago
This should be doable for parameters, but at that point you don't need compression; LLM.Int8 tricks would be sufficient. For activations, I wrote about it a while back: https://liuliu.me/eyes/reduce-another-70-memory-usage-for-de...

Activation compression is not as useful in this case (inference), because the activations that are held for a long time (the UNet keeps the activations from its downsampling passes and reuses them during upsampling) do not take much memory (in the range of a few megabytes). For training, it is probably more useful.
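
On the parameter side, here is a rough sketch of the kind of int8 trick meant above: per-row absmax quantization of weights. It is a simplification and not the full LLM.int8() method, which additionally keeps outlier feature dimensions in higher precision. The function names and sample weights are made up for illustration.

```python
def quantize_row_absmax(row):
    """Quantize one row of float weights to int8 range using absmax scaling."""
    absmax = max((abs(x) for x in row), default=0.0)
    # Guard against an all-zero row, where any scale would do.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row_absmax(weights)
restored = dequantize_row(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight then costs one byte plus one float scale per row, which is why a separate compression scheme for parameters buys little on top of it.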