
	by irusensei 113 days ago
	I noticed that even on my M3, MLX tends to do prefill a lot faster than llama.cpp and GGML models. Does anyone know how they do it?