Hacker News
by WithinReason 1264 days ago
Hmm, I think you might have to adjust the padding to be 128 bits then:
__shared__ float As[(CHUNKSIZE+4) * CHUNKSIZE];
Ultimately it's down to trial and error, like always with GPGPU.
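To make the trade-off a bit more concrete, here is a rough host-side sketch (plain C++, no GPU needed) of how a row stride maps onto shared-memory banks. Everything in it is an assumption for illustration: `kChunkSize = 32` stands in for CHUNKSIZE, the model assumes 32 banks of 4 bytes each, and it only considers one warp reading a single column with scalar `float` loads. The helper names are made up, and real hardware behaviour with 128-bit loads is more involved, so treat it as a starting point for experiments rather than a prediction.

```cpp
#include <cstddef>

constexpr int kBanks = 32;      // assumption: 32 shared-memory banks, 4 bytes wide
constexpr int kWarpSize = 32;   // assumption: 32 threads per warp
constexpr int kChunkSize = 32;  // assumption: hypothetical value for CHUNKSIZE

// Worst-case number of threads in one warp that hit the same bank when
// thread t reads the float at index t * rowStride (i.e. one column of the tile).
constexpr int maxBankConflict(int rowStride) {
    int hits[kBanks] = {};
    for (int t = 0; t < kWarpSize; ++t) {
        ++hits[(t * rowStride) % kBanks];
    }
    int worst = 0;
    for (int b = 0; b < kBanks; ++b) {
        if (hits[b] > worst) worst = hits[b];
    }
    return worst;
}

// True if every row starts on a 16-byte (128-bit) boundary, assuming the
// array itself is 16-byte aligned. Vectorized float4 accesses need this.
constexpr bool rowsAre128BitAligned(int rowStride) {
    return (static_cast<std::size_t>(rowStride) * sizeof(float)) % 16 == 0;
}
```

In this simplified model, an unpadded stride of `kChunkSize` puts all 32 threads on one bank, a stride of `kChunkSize + 1` is conflict-free but breaks 128-bit row alignment, and `kChunkSize + 4` keeps the alignment while reducing the conflict to 4-way. Which one actually wins still has to be measured.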