Hacker News new | ask | show | jobs
by algo_trader 1546 days ago
Is it difficult/desirable to squeeze/compress an open-sourced 200B parameter model to fit into 40GB?

Are these techniques for specific architectures or can they be made generic ?

2 comments

I think it depends what downstream task you're trying to do... DeepMind tried distilling big language models into smaller ones (think 7B -> 1B) but it didn't work too well... it definitely lost a lot of quality (for general language modeling) relative to the original model.

See the paper here, Figure A28: https://kstatic.googleusercontent.com/files/b068c6c0e64d6f93...

But if your downstream task is simple, like sequence classification, then it may be possible to compress the model without losing much quality.