Hacker News new | ask | show | jobs
by mhh__ 2006 days ago
The value of SIMD on a CPU these days is really a middle-ground where you value latency above throughput, so you probably would have the same trade off as getting the data to and from a GPU
1 comments

i thought the M1 was a SoC with a unified memory architecture?
That definitely changes the calculus, but as I've mentioned in a different comment there doesn't seem to be literally any microarchitectural documentation to read, so I (don't own an M1) have nothing to go off unfortunately.

I'll make a wild guess that getting data to the neural engine is still probably not quick because I assume it's some kind of statically scheduled type affair (exposed pipeline?). We literally seem to know almost nothing about it sadly.

https://jobs.apple.com/en-gb/details/200205070/neural-engine...

Even the job listing gives away next to nothing, other than poor English "Knowledge in compiler is a plus" ;)