	by aukejw 407 days ago
	There are plenty of smaller (quantized) models that fit well on your machine! On an M4 with 24GB it’s already possible to comfortably run 8B quantized models. I’m benchmarking runtime and memory usage for a few of them: https://aukejw.github.io/mlx_transformers_benchmark/
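
	For a rough sense of why an 8B model fits in 24GB, here is a back-of-envelope sketch. It estimates memory for the weights only, assuming every parameter is stored at the given bit width. It ignores the KV cache, activations, and per-group quantization overhead, so real usage will be somewhat higher. The function name and the bit widths are illustrative assumptions, not numbers taken from the benchmark.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: all parameters use the same bit width; quantization
    scales/zero-points, KV cache, and activations are not counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 8B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"8B params at {bits}-bit: ~{weight_memory_gb(8e9, bits):.0f} GB")
```

	By this estimate the weights take roughly 16 GB at 16-bit, 8 GB at 8-bit, and 4 GB at 4-bit, which is why the quantized variants leave headroom on a 24GB machine.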