Hacker News
by pavelstoev 417 days ago
Optimizing AI performance is like peeling an onion: every time you remove one bottleneck, another layer appears underneath. What looks like a compute problem turns out to be a memory bottleneck, which then reveals a scheduling issue, which in turn exposes a parallelism mismatch… and so on.
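One standard way to peel back the first layer (compute problem vs. memory bottleneck) is the roofline model. Attainable throughput is min(peak FLOP/s, arithmetic intensity × peak bandwidth), and a kernel whose FLOPs per byte fall below the ridge point (peak FLOP/s ÷ peak bandwidth) is memory-bound no matter how much compute it seems to need. The sketch below is a back-of-the-envelope check, not a profiler. The `Hardware` peak numbers are hypothetical, and the byte counts assume each operand moves through memory exactly once, with no cache effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    peak_flops: float      # FLOP/s (hypothetical accelerator numbers)
    peak_bandwidth: float  # bytes/s of main-memory bandwidth

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) where the roofline turns flat.
        return self.peak_flops / self.peak_bandwidth


def classify(flops: float, bytes_moved: float, hw: Hardware) -> tuple[str, float]:
    """Return ("memory-bound" | "compute-bound", attainable FLOP/s)."""
    intensity = flops / bytes_moved
    attainable = min(hw.peak_flops, intensity * hw.peak_bandwidth)
    bound = "memory-bound" if intensity < hw.ridge_point else "compute-bound"
    return bound, attainable


if __name__ == "__main__":
    hw = Hardware(peak_flops=100e12, peak_bandwidth=2e12)  # hypothetical: 100 TFLOP/s, 2 TB/s

    # fp32 elementwise add c = a + b: 1 FLOP per element, 12 bytes (2 reads + 1 write).
    n = 1 << 24
    print("elementwise add:", classify(n, 12 * n, hw))

    # fp32 square matmul: 2*m^3 FLOPs, three m x m matrices moved once (ideal reuse).
    m = 4096
    print("matmul:", classify(2 * m**3, 12 * m * m, hw))
```

With these made-up numbers the ridge point is 50 FLOP/byte: the elementwise add (about 0.08 FLOP/byte) sits far below it, while the 4096×4096 matmul (about 683 FLOP/byte under ideal reuse) sits well above it.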

It’s a process of continuous uncovering, and unless you have visibility across the whole stack, from kernel to cluster, you’ll spend all your time slicing through surface layers and shedding a lot of tears.

Fortunately, there are software automation solutions to this.

1 comment

They’re not very good, unfortunately.