

by pavelstoev 417 days ago

Optimizing AI performance is like peeling an onion — every time you remove one bottleneck, another layer appears underneath. What looks like a compute problem turns out to be a memory bottleneck, which then turns out to be a scheduling issue, which reveals a parallelism mismatch… and so on.

It’s a process of continuous uncovering, and unless you have visibility across the whole stack, from kernel to cluster, you’ll spend all your time slicing through surface layers and shedding plenty of tears.

Fortunately, there are software automation solutions to this.

1 comment

saagarjha 416 days ago

They’re not very good, unfortunately.
