by d4mr
149 days ago
I got mass nerd-sniped by Anthropic's performance engineering challenge this week and fell down the rabbit hole. This is Part 1 of a series breaking down the problem from first principles: what the simulated machine actually does, why instruction packing is hard, and where the optimization opportunities are.
Built some interactive visualizations to make the concepts click (SIMD vs scalar, dependency graphs, the gather bottleneck).
Currently at 2,375 cycles on the leaderboard. Planning to go deeper into tinygrad's approach in Part 2; their linearizer gets 1,340 with a "fairly generic backend", which is wild.