by SleepyMyroslav 406 days ago
While novel, it is also very far removed from the hardware, in the sense that aggregating what is actually going on with work submitted from multiple queues is hard. Even gathering timing events for the start and stop of each can be confusing and inadequate when GPUs execute more than one shader at the same time. That's not to say it's not useful; I just don't really trust aggregates, even on a multithreaded CPU, if I can't go check the raw events.
1 comment

It's not using timing-based aggregation. The EU stall samples from hardware include the instruction pointer, which links them to the shaders mapped in the GPU's address space.
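
To make that concrete, here is a rough sketch of attributing samples by instruction pointer. It is not the profiler's actual code: the shader table, the addresses, and the `attribute_samples` helper are all made up for illustration, and real stall samples carry more than just an IP.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical shader mappings in the GPU address space:
# (start address, size in bytes, name). Values are invented.
SHADERS = [
    (0x1000, 0x400, "vertex_main"),
    (0x2000, 0x800, "fragment_main"),
]


def attribute_samples(samples, shaders):
    """Count samples per (shader name, offset) by looking up each sample's IP."""
    shaders = sorted(shaders)
    starts = [start for start, _, _ in shaders]
    counts = Counter()
    for ip in samples:
        # Find the last mapping that starts at or before this IP.
        i = bisect_right(starts, ip) - 1
        if i >= 0:
            start, size, name = shaders[i]
            if ip < start + size:
                counts[(name, ip - start)] += 1
                continue
        # The IP falls outside every known mapping.
        counts[("<unmapped>", ip)] += 1
    return counts


# Illustrative sample IPs only, not real hardware output.
counts = attribute_samples([0x1010, 0x1010, 0x2004, 0x3000], SHADERS)
```

Since each sample is keyed by its own IP, two shaders running at the same time don't get smeared together the way overlapping start/stop timestamps would.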