by Const-me 590 days ago
Sorry, I have not benchmarked against cuBLAS, Eigen, or similar libraries. I did that thing for ML inference.

I have implemented a profiler on top of D3D11_QUERY_TIMESTAMP and D3D11_QUERY_TIMESTAMP_DISJOINT queries, and tweaked the compute shader to minimize the time reported by these queries for my specific use case.
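
A minimal sketch of that kind of timestamp-based measurement, for illustration only; it is not the profiler described above. The names `elapsedMs` and `timeDispatch` are hypothetical, and the sketch assumes an already created device and immediate context with the compute shader and its resources bound. The Direct3D part compiles only on Windows.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <d3d11.h>
#include <wrl/client.h>
#endif

// Hypothetical helper: converts two GPU timestamps to milliseconds.
// `frequency` is ticks per second (the Frequency field of
// D3D11_QUERY_DATA_TIMESTAMP_DISJOINT); `disjoint` mirrors its Disjoint flag.
// Returns false when the measurement should be discarded.
inline bool elapsedMs(uint64_t begin, uint64_t end, uint64_t frequency,
                      bool disjoint, double& outMs)
{
    if (disjoint || frequency == 0 || end < begin)
        return false;
    outMs = static_cast<double>(end - begin) * 1000.0 / static_cast<double>(frequency);
    return true;
}

#ifdef _WIN32
// Hypothetical helper: times a single Dispatch() call on the GPU.
// Assumes the compute shader and its resources are already bound to `ctx`.
inline bool timeDispatch(ID3D11Device* dev, ID3D11DeviceContext* ctx,
                         UINT groupsX, UINT groupsY, UINT groupsZ, double& outMs)
{
    using Microsoft::WRL::ComPtr;
    ComPtr<ID3D11Query> disjoint, begin, end;

    D3D11_QUERY_DESC desc = {};
    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    if (FAILED(dev->CreateQuery(&desc, disjoint.GetAddressOf()))) return false;
    desc.Query = D3D11_QUERY_TIMESTAMP;
    if (FAILED(dev->CreateQuery(&desc, begin.GetAddressOf()))) return false;
    if (FAILED(dev->CreateQuery(&desc, end.GetAddressOf()))) return false;

    // The disjoint query brackets the timestamps; timestamp queries use End() only.
    ctx->Begin(disjoint.Get());
    ctx->End(begin.Get());
    ctx->Dispatch(groupsX, groupsY, groupsZ);
    ctx->End(end.Get());
    ctx->End(disjoint.Get());

    // Blocks until the GPU has produced the result; S_FALSE means "not ready yet".
    auto wait = [ctx](ID3D11Query* q, void* data, UINT size) {
        HRESULT hr;
        while ((hr = ctx->GetData(q, data, size, 0)) == S_FALSE)
            Sleep(0);
        return SUCCEEDED(hr);
    };

    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    UINT64 t0 = 0, t1 = 0;
    if (!wait(disjoint.Get(), &dj, sizeof(dj))) return false;
    if (!wait(begin.Get(), &t0, sizeof(t0))) return false;
    if (!wait(end.Get(), &t1, sizeof(t1))) return false;

    return elapsedMs(t0, t1, dj.Frequency, dj.Disjoint != FALSE, outMs);
}
#endif
```

Waiting on `GetData` right after the dispatch stalls the CPU, which is acceptable for a one-off measurement while tuning a shader; a profiler that runs continuously would more likely keep a pool of queries and read the results later.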