by sunpazed
409 days ago
The key benefit is significantly lower power usage. I benchmarked llama3.2-1B on my machines: M1 Max (47 t/s, ~1.8 watts) and M4 Pro (62 t/s, ~2.8 watts). The GPU is twice as fast (even faster on the Max), but draws much more power (~20 watts) than the ANE. Also, the ANE models are limited to 512 tokens of context, so I'm unlikely to use these in production yet.
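To put the trade-off in energy terms, here is a rough back-of-envelope sketch in Python. The ANE numbers are the ones quoted above. The GPU throughput is an assumption: it takes "twice as fast" literally as 2x the ANE rate on the same machine at ~20 watts, which likely understates the Max's GPU. The names `joules_per_token`, `GPU_SPEEDUP` and `GPU_WATTS` are made up for illustration.

```python
# Energy per token (joules) = power draw (watts) / throughput (tokens per second).

def joules_per_token(tokens_per_sec: float, watts: float) -> float:
    return watts / tokens_per_sec

# ANE figures quoted in the comment: (tokens/s, watts).
ane = {"M1 Max": (47.0, 1.8), "M4 Pro": (62.0, 2.8)}

# Assumptions, not measurements: GPU at exactly 2x the ANE rate, drawing ~20 W.
GPU_SPEEDUP = 2.0
GPU_WATTS = 20.0

results = {}
for chip, (tps, watts) in ane.items():
    ane_j = joules_per_token(tps, watts)
    gpu_j = joules_per_token(tps * GPU_SPEEDUP, GPU_WATTS)
    # Store ANE energy, GPU energy, and how many times more energy the GPU uses.
    results[chip] = (ane_j, gpu_j, gpu_j / ane_j)
    print(f"{chip}: ANE {ane_j * 1000:.1f} mJ/token, "
          f"GPU {gpu_j * 1000:.1f} mJ/token, "
          f"GPU uses {gpu_j / ane_j:.1f}x the energy")
```

Under those assumptions the ANE comes out at roughly 38 to 45 mJ per token versus roughly 160 to 210 mJ on the GPU, so the GPU finishes sooner but spends several times more energy per token.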