by ActivePattern 95 days ago
The win is in how many weights you process per instruction and how little data you have to load.

So it's not that individual ops are faster; it's that the packed representation lets each instruction do more useful work, and you're moving far less data from memory to do it.
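
To make that concrete, here is a rough sketch, not anyone's actual kernel. It assumes the simplest possible case: 1-bit weights and activations (+1/-1) packed 64 to a word. The bit width, the 64-bit word size, and the names `pack_signs` and `packed_dot` are all my assumptions for illustration; the comment above doesn't specify any of them.

```python
# Hypothetical illustration: 1-bit (+1/-1) weights and activations,
# packed 64 per word. Bit width and word size are assumptions.
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def pack_signs(values):
    """Pack a list of +1/-1 values into 64-bit words (a set bit means +1)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, v in enumerate(values[start:start + WORD_BITS]):
            if v > 0:
                word |= 1 << offset
        words.append(word)
    return words


def packed_dot(w_words, x_words, n):
    """Dot product of two +1/-1 vectors of length n, given their packed forms."""
    mismatches = 0
    for w, x in zip(w_words, x_words):
        # One XOR plus one popcount covers up to 64 weights at once.
        mismatches += bin((w ^ x) & MASK).count("1")
    # Each match contributes +1 and each mismatch -1:
    # (n - mismatches) - mismatches
    return n - 2 * mismatches


weights = [1 if i % 3 == 0 else -1 for i in range(130)]
inputs = [1 if i % 2 == 0 else -1 for i in range(130)]

packed_w = pack_signs(weights)
packed_x = pack_signs(inputs)
reference = sum(w * x for w, x in zip(weights, inputs))
result = packed_dot(packed_w, packed_x, len(weights))
```

Here 130 weights fit in three 64-bit words, and the loop body runs three times instead of 130. In Python this only demonstrates the arithmetic; any real speedup would come from doing the same XOR-and-popcount with native or SIMD instructions, and wider formats (2-bit, 4-bit) would need a different unpacking step than the one sketched here.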