by magnat 657 days ago
Because you can quantize a model, e.g. from the original 16 bits per weight down to 5 bits per weight, so that it fits in the memory you have available.
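For a rough sense of the savings, here is a back-of-the-envelope sketch. The 7-billion-parameter size is a hypothetical chosen for illustration, and the estimate counts weights only: it ignores activations, the KV cache, and the per-block scale factors that real quantization formats store.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate bytes needed to store the weights alone."""
    return n_params * bits_per_weight / 8


# Hypothetical model size, picked only for illustration.
N_PARAMS = 7_000_000_000

for bits in (16, 5):
    gb = weight_bytes(N_PARAMS, bits) / 1e9  # decimal gigabytes
    print(f"{bits:>2} bits/weight: {gb:.3f} GB")
```

Under these assumptions the weights shrink from about 14 GB to about 4.4 GB, which is the difference between not fitting and fitting on a lot of consumer hardware.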