


	by samsartor 460 days ago
	The absolutely dumbest way is to compare the model's parameter count with your RAM in bytes. If you have 2 or more bytes of RAM for every parameter, you can generally run the model easily (e.g. a 3B model with 8GB of RAM). At 1 byte per parameter it is still possible, but starts to get tricky. Of course, lots of factors change the RAM usage: quantization, context size, KV cache. And this says nothing about whether the model will respond quickly enough to be pleasant to use.
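
	For anyone who wants the rule of thumb as arithmetic, here is a minimal sketch in Python. The function names, the decimal definition of a GB, and the label for ratios under 1 byte per parameter are my own assumptions for illustration; the comment only speaks to the 2-byte and 1-byte cases, and the check ignores quantization, context size, and KV cache entirely.

```python
GB = 10**9  # assumption: decimal gigabytes; using 2**30 shifts the ratio slightly


def bytes_per_param(n_params: float, ram_bytes: float) -> float:
    """Return how many bytes of RAM are available per model parameter."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return ram_bytes / n_params


def rough_fit(n_params: float, ram_bytes: float) -> str:
    """Classify a model/RAM pairing using the 2-byte and 1-byte thresholds."""
    ratio = bytes_per_param(n_params, ram_bytes)
    if ratio >= 2.0:
        return "easy"      # 2+ bytes per parameter: generally runs easily
    if ratio >= 1.0:
        return "tricky"    # about 1 byte per parameter: possible, but tricky
    # Hypothetical label: the rule of thumb does not cover this case.
    return "below rule of thumb"


if __name__ == "__main__":
    # The example from the comment: a 3B model with 8GB of RAM.
    print(rough_fit(3e9, 8 * GB))
```

	By this check the 3B model on 8GB lands in the "easy" bucket (about 2.7 bytes per parameter), while a 7B model on the same machine would be "tricky". It says nothing about speed.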