by Const-me
590 days ago
Sorry, I have not benchmarked against cuBLAS, Eigen, or similar libraries; I wrote this for ML inference. I implemented a profiler on top of D3D11_QUERY_TIMESTAMP and D3D11_QUERY_TIMESTAMP_DISJOINT queries, and tweaked the compute shader to minimize the time those queries report for my specific use case.
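For anyone curious how the two query types combine, here is a minimal sketch of the arithmetic, not my actual profiler. It assumes you have already read back two D3D11_QUERY_TIMESTAMP results and one D3D11_QUERY_TIMESTAMP_DISJOINT result with ID3D11DeviceContext::GetData. The `DisjointData` struct and `elapsedMs` function are hypothetical names; the struct only mirrors the fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT so the snippet compiles without d3d11.h.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for D3D11_QUERY_DATA_TIMESTAMP_DISJOINT
// (the real struct has a UINT64 Frequency and a BOOL Disjoint).
struct DisjointData {
    std::uint64_t frequency; // GPU timestamp ticks per second
    bool disjoint;           // true when timestamps in the interval are unreliable
};

// Converts two timestamp query results into milliseconds.
// Returns std::nullopt when the measurement should be discarded.
std::optional<double> elapsedMs(std::uint64_t begin, std::uint64_t end,
                                const DisjointData& dd) {
    if (dd.disjoint || dd.frequency == 0 || end < begin) {
        return std::nullopt;
    }
    return static_cast<double>(end - begin) * 1000.0 /
           static_cast<double>(dd.frequency);
}
```

In a real profiler the disjoint query is opened with Begin and closed with End around the whole frame, while each timestamp query only gets an End call at the point being measured. Samples where the disjoint flag is set are dropped rather than averaged in.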