
Hacker News

by tucnak 148 days ago

I'm not convinced. (A bit of advice: if you want to make a claim about performance, always start by measuring. Then, when somebody asks you for proof or data, you'll already have it.) If what you're saying were true, it would be a big deal; unfortunately, it isn't.
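In the spirit of "measure first," here is one minimal way to get a number for per-launch dispatch overhead. It is a sketch that assumes a CUDA-capable GPU. `tiny_kernel` and `avg_launch_us` are hypothetical names that do not come from the thread. The kernel does almost no work, so the event pair around a burst of back-to-back launches mostly captures submission and launch cost. Results vary a lot by GPU, driver, and workload shape, so treat the output as a starting point and not a verdict.

```cuda
#include <cuda_runtime.h>

// Nearly empty kernel: one increment per launch, so elapsed time is
// dominated by launch/dispatch overhead rather than compute.
__global__ void tiny_kernel(int *counter) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *counter += 1;
    }
}

// Launches tiny_kernel n times back-to-back on the default stream and returns
// the average time per launch (in microseconds) between a start and a stop
// event. Returns a negative value on error. *launches_done receives the
// device counter, which includes one warm-up launch.
float avg_launch_us(int n, int *launches_done) {
    if (n <= 0) return -1.0f;

    int *d_counter = nullptr;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1.0f;
    cudaMemset(d_counter, 0, sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    tiny_kernel<<<1, 1>>>(d_counter);  // warm-up: keeps one-time init out of the timing
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < n; ++i) {
        tiny_kernel<<<1, 1>>>(d_counter);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaMemcpy(launches_done, d_counter, sizeof(int), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_counter);
    return (cudaGetLastError() == cudaSuccess) ? ms * 1000.0f / n : -1.0f;
}
```

Calling `avg_launch_us(1000, &done)` gives a per-launch figure you can quote when someone asks for data. A host-side clock around the loop would instead measure how long the CPU spends issuing launches.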

Dispatch has overhead, but it's largely insignificant. Where it would otherwise be significant:

1. Fused kernels exist

2. CUDA graphs (and other forms of work-submission pipelining) exist
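The two mitigations listed above can be sketched side by side. A made-up three-step elementwise pipeline (scale, shift, square) stands in for real work, and all kernel and function names are hypothetical. Fusion cuts the launches per iteration from three to one. The graph version keeps the three kernels but records them once through stream capture and replays them as a single submission. `cudaGraphInstantiateWithFlags` assumes CUDA 11.4 or later. The sketch shows only the mechanics and makes no claim about how much either approach saves on a given GPU.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Unfused pipeline: three elementwise kernels, each reading and writing x.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
__global__ void shift(float *x, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += b;
}
__global__ void square(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= x[i];
}

// Fused pipeline: same math, one launch, one load and one store per element.
__global__ void scale_shift_square(float *x, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i] * a + b;
        x[i] = v * v;
    }
}

// Allocates n floats on the device, all initialised to 1.0f.
static float *make_ones(int n) {
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    return d;
}

// Copies x[0] back to the host and frees the device buffer.
static float read_first_and_free(float *d) {
    float v = 0.0f;
    cudaMemcpy(&v, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return v;
}

// Mitigation 1: fusion. One kernel launch per iteration.
float run_fused(int n, int iters, float a, float b) {
    float *d_x = make_ones(n);
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int k = 0; k < iters; ++k)
        scale_shift_square<<<blocks, threads>>>(d_x, a, b, n);
    cudaDeviceSynchronize();
    return read_first_and_free(d_x);
}

// Mitigation 2: record the three unfused launches into a CUDA graph once via
// stream capture, then replay the graph; each replay is a single submission.
float run_unfused_as_graph(int n, int iters, float a, float b) {
    float *d_x = make_ones(n);
    int threads = 256, blocks = (n + threads - 1) / threads;

    cudaStream_t s;
    cudaStreamCreate(&s);

    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    scale<<<blocks, threads, 0, s>>>(d_x, a, n);  // recorded, not executed yet
    shift<<<blocks, threads, 0, s>>>(d_x, b, n);
    square<<<blocks, threads, 0, s>>>(d_x, n);
    cudaStreamEndCapture(s, &graph);
    cudaGraphInstantiateWithFlags(&exec, graph, 0);  // CUDA 11.4+

    for (int k = 0; k < iters; ++k)
        cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    return read_first_and_free(d_x);
}
```

Both functions should agree. Starting from 1 with a = 2, b = 1 and two iterations, the first gives (1·2 + 1)² = 9 and the second gives (9·2 + 1)² = 361. Wrapping each call in `cudaEventRecord`/`cudaEventElapsedTime` is the natural next step if you want to compare their overhead on your own hardware.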


saagarjha 148 days ago

CUDA graphs are pretty slow at synchronizing things.
