Hacker News

by jmanhype 108 days ago

I'm not an ML engineer. I used Claude Code (Opus 4.6) to get LoRA fine-tuning gradients running on Apple's Neural Engine (ANE), the dedicated ML accelerator in every Apple Silicon Mac, which has no public training API.
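For readers who, like me, aren't ML engineers, these are the gradients a LoRA layer needs, written in the standard LoRA formulation. The notation is mine, not the repo's, and the post doesn't say exactly how these are split into dispatches. $W_0 \in \mathbb{R}^{d_{out} \times d_{in}}$ is the frozen pretrained weight, $A \in \mathbb{R}^{r \times d_{in}}$ and $B \in \mathbb{R}^{d_{out} \times r}$ are the trainable low-rank factors, $X \in \mathbb{R}^{d_{in} \times N}$ holds $N$ tokens as columns, and $G = \partial L / \partial Y$ is the upstream gradient:

```latex
Y = W_0 X + s\,B A X, \qquad s = \frac{\alpha}{r}

\frac{\partial L}{\partial B} = s\, G\, (A X)^{\top}, \qquad
\frac{\partial L}{\partial A} = s\, B^{\top} G\, X^{\top}, \qquad
\frac{\partial L}{\partial X} = W_0^{\top} G + s\, A^{\top} B^{\top} G
```

$W_0$ gets no gradient because it is frozen; $\partial L / \partial X$ is still needed to backpropagate into earlier layers. Every term is a plain matrix product, which is why the ANE's matmul behavior matters so much here.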

192 gradient dispatches, zero GPU fallbacks, converging loss, all at ~2.8W.

Three discoveries, all found by iterating on real hardware: (1) the ANE's matmul op compiles but never executes, so every matmul must be rewritten as a 1x1 convolution; (2) spatial dimensions must be multiples of 16; (3) the ANE compiler leaks handles and silently fails after ~119 compiles.
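A minimal sketch of discoveries (1) and (2), in plain Python so it runs anywhere: a linear layer computed as a 1x1 convolution over a zero-padded spatial axis. The `[C_in][S]` layout with tokens along the spatial axis, and all the function names, are assumptions for illustration. The repo's MIL generator emits ANE ops, not Python, and may lay tensors out differently.

```python
# Sketch only: layout ([C_in][S], batch 1, height 1) and names are assumptions.
ANE_SPATIAL_MULTIPLE = 16  # from the post: spatial dims must be multiples of 16


def matmul(a, b):
    """Reference matmul: a is [M][K], b is [K][N] -> [M][N]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv1x1(x, w):
    """1x1 convolution, batch 1, height 1: x is [C_in][S], w is [C_out][C_in].

    Each output position s mixes channels at position s only, which is the
    matmul w @ x with the spatial axis playing the role of the token axis.
    """
    c_in, s_len = len(x), len(x[0])
    return [[sum(w[o][c] * x[c][s] for c in range(c_in)) for s in range(s_len)]
            for o in range(len(w))]


def pad_spatial(x, multiple=ANE_SPATIAL_MULTIPLE):
    """Zero-pad the spatial axis of x ([C][S]) up to the next multiple."""
    s_len = len(x[0])
    padded = -(-s_len // multiple) * multiple  # ceil(s_len / multiple) * multiple
    return [row + [0.0] * (padded - s_len) for row in x]


def linear_as_conv(tokens, weight):
    """Compute tokens @ weight^T (a linear layer) via conv1x1.

    tokens: [N][C_in]; weight: [C_out][C_in]. Returns [N][C_out].
    """
    n = len(tokens)
    x = [list(col) for col in zip(*tokens)]    # transpose to [C_in][N]
    x = pad_spatial(x)                          # N -> multiple of 16
    y = conv1x1(x, weight)                      # [C_out][N_padded]
    return [list(col) for col in zip(*y)][:n]  # back to [N][C_out], drop padding


if __name__ == "__main__":
    tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # N=2 tokens, C_in=3
    weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # C_out=2
    via_conv = linear_as_conv(tokens, weight)
    via_matmul = matmul(tokens, [list(r) for r in zip(*weight)])
    print(via_conv)  # [[-2.0, 3.0], [-2.0, 7.5]]
    assert via_conv == via_matmul
```

Zero padding is safe for a 1x1 convolution because each spatial position is computed independently: the padded columns never mix into the real ones and can simply be sliced off afterwards.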

Built on maderix's ANE reverse-engineering work. The repo includes the full MIL kernel generator, subprocess isolation to work around the compile limit, and MLX integration for hybrid GPU+ANE training.
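One way subprocess isolation for the compile limit could look, as a hedged sketch: compile in batches, each in a fresh child process, so the OS reclaims whatever the compiler leaked when the child exits. This assumes the leaked handles are per-process resources. The budget of 100, the JSON stdin/stdout protocol, and `CHILD_SRC` are all hypothetical, and the placeholder child fakes the compile because the private ANE compile call isn't described in the post.

```python
import json
import subprocess
import sys

# From the post: the ANE compiler silently fails after ~119 compiles in one
# process. Staying well under that is an assumed safety margin.
COMPILE_BUDGET = 100

# Hypothetical child program. A real worker would build the MIL program and
# invoke the private ANE compile path here; this placeholder only echoes names.
CHILD_SRC = r"""
import json, sys
specs = json.load(sys.stdin)
results = [{"name": s["name"], "compiled": True} for s in specs]
json.dump(results, sys.stdout)
"""


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def compile_all(specs, budget=COMPILE_BUDGET):
    """Compile every spec, starting a fresh child process every `budget` compiles."""
    results = []
    for batch in chunks(specs, budget):
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SRC],
            input=json.dumps(batch),
            capture_output=True,
            text=True,
            check=True,  # raise if a child crashes instead of continuing quietly
        )
        out = json.loads(proc.stdout)
        # The failure mode is silent, so verify every kernel actually came back.
        if len(out) != len(batch):
            raise RuntimeError(f"expected {len(batch)} results, got {len(out)}")
        results.extend(out)
    return results


if __name__ == "__main__":
    specs = [{"name": f"kernel_{i}"} for i in range(250)]
    print(len(compile_all(specs)))  # 250 (three children: 100 + 100 + 50 compiles)
```

The explicit `check=True` and the result-count check exist because the ANE compiler's failure is silent: a worker that stops producing kernels should halt the run rather than let training fall back without anyone noticing.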