


	by technovangelist 1088 days ago
	I am using ollama today on a MacBook Pro M1 Max with 64GB. With a llama2 70b model, I am getting about 7 tokens/second on the onboard GPU. Before ollama supported the GPU, it was much slower. For comparison, the 7b model gets me closer to 55 tokens/second. There is no way it could hit those numbers without the GPU.
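For anyone wanting to check numbers like these on their own machine, here is a minimal sketch of how tokens/second can be derived. It assumes the final JSON object returned by Ollama's `/api/generate` endpoint, which reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The helper name `tokens_per_second` and the sample values are hypothetical, picked only to line up with the roughly 7 tokens/second figure above; they are not measured output.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the final /api/generate response object.

    Assumes the response carries `eval_count` (tokens generated) and
    `eval_duration` (time spent generating, in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical values, not real measurements: 70 tokens in 10 seconds.
sample = {"eval_count": 70, "eval_duration": 10_000_000_000}
print(round(tokens_per_second(sample), 1))
```

Running the same prompt against the 70b and 7b models and comparing the two results gives the kind of side-by-side comparison described above.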