| HN Mirror



	by sunpazed 409 days ago
	The key benefit is significantly lower power usage. I benchmarked llama3.2-1B on my machines: M1 Max (47 t/s, ~1.8 watts) and M4 Pro (62 t/s, ~2.8 watts). The GPU is twice as fast (even faster on the Max), but draws much more power (~20 watts) than the ANE. The ANE models are also limited to 512 tokens of context, so these are unlikely to see production use yet.
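
	To put the power numbers above on a common scale, here is a rough sketch that converts them to tokens per joule (t/s divided by watts). The ANE figures are the ones quoted in the comment. The GPU figure is only an estimate: it assumes "twice as fast" means 2x the ANE rate on the same machine at a flat ~20 watts, which the comment does not state precisely. The function names are made up for illustration.

```python
# ANE figures as quoted in the comment: (tokens per second, watts).
ANE_RUNS = {
    "M1 Max (ANE)": (47.0, 1.8),
    "M4 Pro (ANE)": (62.0, 2.8),
}


def tokens_per_joule(tokens_per_second, watts):
    """Energy efficiency: (tokens/s) / (joules/s) = tokens per joule."""
    return tokens_per_second / watts


def gpu_estimate(ane_tokens_per_second, gpu_watts=20.0):
    """ASSUMPTION: GPU runs at 2x the ANE rate and draws ~20 W.

    This is a reading of "twice as fast ... (~20 watts)", not a measurement.
    """
    return tokens_per_joule(2.0 * ane_tokens_per_second, gpu_watts)


if __name__ == "__main__":
    for name, (tps, watts) in ANE_RUNS.items():
        ane = tokens_per_joule(tps, watts)
        gpu = gpu_estimate(tps)
        print(f"{name}: {ane:.1f} tok/J, GPU estimate ~{gpu:.1f} tok/J "
              f"({ane / gpu:.1f}x in favor of the ANE)")
```

	Under those assumptions the ANE comes out at roughly 26 tokens per joule on the M1 Max and 22 on the M4 Pro, several times the GPU estimate, even though the GPU wins on raw speed.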

1 comment

We can run 2000 or 4000 tokens of context on the ANE.