by hnuser123456
438 days ago
I think it does? (The comment is in the original source.)

  print("Adding matrices using GPU...")
  start_time = time.time()
  gpu_result = add_matrices(gpu_matrices)
  cp.cuda.get_current_stream().synchronize() # Not 100% sure what this does
  elapsed_time = time.time() - start_time

I was going to ask: any CUDA professionals who want to give a crash course on what us Python guys will need to know?

So if you need to wait for an op to finish, you need to `synchronize` as shown above.
It's `get_current_stream` because the queue mentioned above is actually called a stream in CUDA.
If you want to run many independent ops concurrently, you can use several streams.
Benchmarking is one use case for `synchronize`. Another is when you run, say, two independent ops in different streams and need to combine their results.
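To make the stream idea concrete without needing a GPU, here is a rough CPU-only analogy in plain Python. It is not CUDA or CuPy code: `ToyStream`, `launch`, and `slow_sum` are hypothetical names invented for this sketch, and a single-worker thread pool stands in for the in-order queue. It only illustrates the behaviour described above: launching returns immediately, separate streams can overlap, and you have to synchronize before timing or combining results.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyStream:
    """CPU-only stand-in for a CUDA stream: a FIFO queue drained by one worker."""

    def __init__(self):
        # One worker means ops submitted to this stream run in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def launch(self, fn, *args):
        # Returns immediately, like a kernel launch; the work runs in the background.
        future = self._pool.submit(fn, *args)
        self._pending.append(future)
        return future

    def synchronize(self):
        # Block the caller until everything queued on this stream has finished.
        for future in self._pending:
            future.result()
        self._pending.clear()


def slow_sum(values, delay=0.2):
    time.sleep(delay)  # pretend this is an expensive GPU op
    return sum(values)


stream_a, stream_b = ToyStream(), ToyStream()

start = time.perf_counter()
fut_a = stream_a.launch(slow_sum, [1, 2, 3])
fut_b = stream_b.launch(slow_sum, [10, 20, 30])
launch_time = time.perf_counter() - start  # only measures queueing, not the work

# Wait for both streams before reading the clock or touching the results.
stream_a.synchronize()
stream_b.synchronize()
total_time = time.perf_counter() - start  # now includes the actual work

combined = fut_a.result() + fut_b.result()
print(f"launch: {launch_time:.4f}s, total: {total_time:.4f}s, combined: {combined}")
```

Stopping the clock right after the two `launch` calls would report almost no elapsed time, which is the same mistake as benchmarking GPU code without a synchronize.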
Btw, if you work with PyTorch, ops that run on the GPU are launched in the background. If you want to benchmark torch models on the GPU, it also provides a sync API (`torch.cuda.synchronize()`).
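For the PyTorch case, a minimal timing sketch might look like the following. The function name `time_matmul`, the matrix size, and the repeat count are arbitrary choices for illustration. It falls back to the CPU when no CUDA device is available, in which case the synchronize calls are skipped because CPU ops already run synchronously.

```python
import time

import torch


def time_matmul(n=512, repeats=10):
    """Time an n x n matmul, waiting for the GPU before reading the clock."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)

    a @ b  # warm-up run so one-time setup cost is not measured
    if device == "cuda":
        torch.cuda.synchronize()  # make sure the warm-up has finished

    start = time.perf_counter()
    for _ in range(repeats):
        c = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    elapsed = time.perf_counter() - start
    return device, elapsed / repeats, c


device, seconds_per_op, result = time_matmul()
print(f"{device}: {seconds_per_op * 1e3:.3f} ms per matmul")
```

Without the second `torch.cuda.synchronize()`, the loop on a GPU would mostly measure how long it takes to queue the kernels, not how long they take to run.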