by marmaduke 2073 days ago
> From a Python programmer perspective, how does CUDA.jl compare to PyCUDA?

I think the relevant comparison today is with Numba. Here's a kernel from a real-world recurrence analysis:

    import math
    import numba as nb
    from numba import cuda

    @cuda.jit
    def _sseij(Y, I, J, O):
        # strides
        sty = cuda.blockDim.x
        sbx = sty * cuda.blockDim.y
        sby = sbx * cuda.gridDim.x
        sbz = sby * cuda.gridDim.y
        # this thread's index
        t = (cuda.threadIdx.x
           + cuda.threadIdx.y * sty
           + cuda.blockIdx.x * sbx
           + cuda.blockIdx.y * sby
           + cuda.blockIdx.z * sbz)
        if t < I.size:
            i = I[t]
            j = J[t]
            x = nb.float32(0.0)
            for k in range(Y.shape[0]):
                x += (Y[k, i] - Y[k, j])**2
            O[t] = math.sqrt(x)
Most of it is index calculation, but it's super easy.
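For anyone who wants to check the index arithmetic without a GPU, here is a minimal pure-Python sketch of the same flattening (2D block, 3D grid), plus a CPU reference for what the kernel body computes. The function names `flat_thread_index`, `all_thread_indices` and `sseij_reference` are made up for illustration and are not part of Numba. `Y` is assumed to be a plain list of rows here rather than a device array.

```python
import itertools
import math


def flat_thread_index(thread_idx, block_idx, block_dim, grid_dim):
    """Same arithmetic as the kernel's index calculation, run on the CPU.

    thread_idx = (x, y), block_idx = (x, y, z),
    block_dim = (x, y), grid_dim = (x, y, z).
    """
    sty = block_dim[0]        # stride of threadIdx.y
    sbx = sty * block_dim[1]  # threads per block, i.e. stride of blockIdx.x
    sby = sbx * grid_dim[0]   # stride of blockIdx.y
    sbz = sby * grid_dim[1]   # stride of blockIdx.z
    return (thread_idx[0]
            + thread_idx[1] * sty
            + block_idx[0] * sbx
            + block_idx[1] * sby
            + block_idx[2] * sbz)


def all_thread_indices(block_dim, grid_dim):
    """Flat index of every thread in a hypothetical launch configuration."""
    threads = list(itertools.product(range(block_dim[0]), range(block_dim[1])))
    blocks = list(itertools.product(*(range(n) for n in grid_dim)))
    return [flat_thread_index(t, b, block_dim, grid_dim)
            for b in blocks for t in threads]


def sseij_reference(Y, I, J):
    """CPU reference for the kernel body: for each pair (I[t], J[t]),
    the Euclidean distance between columns I[t] and J[t] of Y."""
    return [math.sqrt(sum((row[i] - row[j]) ** 2 for row in Y))
            for i, j in zip(I, J)]
```

With a block of (4, 2) threads and a grid of (3, 2, 2) blocks, `all_thread_indices` yields each of 0..95 exactly once, which is why the `if t < I.size` guard in the kernel is enough to map one thread to one (i, j) pair.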
1 comments

Does Numba CUDA allow passing buffers between kernels now? Or does it still pretty much require you to write a single superkernel in a buggy, difficult-to-debug language subset?

C isn't usually a breath of fresh air, but the last two times I tried to use Numba CUDA and failed (~1.5 years ago), it sure felt that way.

EDIT: yes, looks like it supports on-device buffers now. I'm still in "once bitten, twice shy" mode on account of the bugs and debug story, but I'm cautiously optimistic.