Hacker News
by samsartor 409 days ago
The absolutely dumbest way is to compare the number of parameters with the number of bytes of RAM you have. If you have 2 or more bytes of RAM for every parameter, you can generally run the model easily (e.g. a 3B model with 8GB of RAM). At 1 byte per parameter it is still possible, but it starts to get tricky.
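A minimal sketch of that rule of thumb in Python. The helper names `bytes_per_param` and `rule_of_thumb` are hypothetical, the 7B and 13B inputs are illustrative rather than taken from the comment, and decimal units are assumed (1 GB = 1e9 bytes), so "GB" divided by "billions of parameters" gives bytes per parameter directly. The comment says nothing about going below 1 byte per parameter, so the sketch doesn't guess.

```python
def bytes_per_param(ram_gb: float, params_billion: float) -> float:
    """Bytes of RAM per parameter.

    Assumes decimal units (1 GB = 1e9 bytes), so the 1e9 factors in
    GB and "billions of parameters" cancel out.
    """
    return ram_gb / params_billion


def rule_of_thumb(ram_gb: float, params_billion: float) -> str:
    """Classify a RAM/model pairing by the bytes-per-parameter heuristic."""
    ratio = bytes_per_param(ram_gb, params_billion)
    if ratio >= 2:
        return "easy"
    if ratio >= 1:
        return "possible but tricky"
    # The heuristic as stated says nothing about less than 1 byte/param.
    return "not covered by the heuristic"


# Illustrative pairings: the 3B/8GB case is from the comment, the others are made up.
for ram, params in [(8, 3), (8, 7), (8, 13)]:
    ratio = bytes_per_param(ram, params)
    print(f"{params}B model, {ram} GB RAM: {ratio:.2f} B/param -> {rule_of_thumb(ram, params)}")
```

The 3B model on 8GB works out to about 2.67 bytes per parameter, which is why it lands comfortably in the "easy" bucket.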

Of course, there are lots of factors that can change the RAM usage: quantization, context size, and the KV cache. And this says nothing about whether the model will respond quickly enough to be pleasant to use.
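A rough sketch of how two of those factors move the number, assuming a standard transformer. Quantization sets the weight size (16-bit weights take 2 bytes per parameter, 4-bit about 0.5). The KV cache grows linearly with context length: one key and one value vector per KV head, per layer, per token. The helper names and the 7B config below are hypothetical and chosen only for illustration. The sketch ignores activations, runtime overhead, and quantization metadata (real 4-bit formats usually spend a little more than 4 bits per weight on scales), so treat its output as a lower bound.

```python
def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: parameters x bits per weight / 8."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one key and one value vector (hence the 2) per KV head,
    per layer, per token of context. bytes_per_elem=2 assumes a 16-bit cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical 7B-parameter transformer config, chosen only for illustration.
N_PARAMS = 7e9
CONFIG = dict(n_layers=32, n_kv_heads=32, head_dim=128)

for bits in (16, 8, 4):
    for ctx in (2048, 8192):
        total = weights_bytes(N_PARAMS, bits) + kv_cache_bytes(context_len=ctx, **CONFIG)
        print(f"{bits:>2}-bit weights, {ctx:>5}-token context: ~{total / 1e9:.1f} GB")
```

With this config a 4096-token, 16-bit KV cache alone is exactly 2 GiB. Models that use grouped-query attention have fewer KV heads than attention heads, which shrinks the cache proportionally.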