Hacker News
by nathan_compton 122 days ago
I'm not sure, but I suspect that LLM weights don't compress all that well. The intuition here is that training an LLM is compression of the training data into the weights, so they are probably very information dense already. Can't squeeze them down much.
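A rough way to poke at that intuition, as a sketch only: the snippet below runs a general-purpose lossless compressor (zlib) over synthetic float32 values and over an obviously redundant buffer of the same size. The Gaussian values with standard deviation 0.02 are an assumption standing in for real weights, and `fake_weights` and `compression_ratio` are hypothetical helper names, so this says nothing definitive about any actual model or about lossy approaches such as quantization.

```python
import random
import struct
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


def fake_weights(n: int, seed: int = 0) -> bytes:
    """Pack n synthetic Gaussian values as little-endian float32.

    Assumption: this is only a stand-in for real LLM weights.
    """
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 0.02) for _ in range(n)]
    return struct.pack(f"<{n}f", *values)


dense = fake_weights(50_000)     # high-entropy mantissa bits
redundant = bytes(len(dense))    # all zero bytes, same length

dense_ratio = compression_ratio(dense)
redundant_ratio = compression_ratio(redundant)

print(f"synthetic float32 'weights': {dense_ratio:.3f}")
print(f"all-zero buffer:             {redundant_ratio:.5f}")
```

The synthetic floats should barely shrink, because their mantissa bits look like noise to zlib, while the zero buffer collapses to almost nothing. To test the claim properly you would swap in bytes read from a real checkpoint.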
1 comment

I've often found this to be untrue when optimizing on the CPU. I wish someone would pay me to dive deep into this problem and the scheduling problem. I'd be amazed if I couldn't squeeze out a 50% speed increase on both.