Hacker News
by WithinReason 1264 days ago
Hmm, I think you might have to adjust the padding to 128 bits (4 floats) then:

    // Row stride of CHUNKSIZE + 4 floats: each row padded by 4 * 32 bits = 128 bits
    __shared__ float As[(CHUNKSIZE + 4) * CHUNKSIZE];

Ultimately it's down to trial and error, like always with GPGPU.