by AlotOfReading 387 days ago
It only seems to have done that in a couple of places, like the MatMul. The softmax kernel (https://github.com/ScalingIntelligence/good-kernels/blob/mai...) seems to be entirely bog-standard, and the layernorm kernels are only slightly more interesting.

I looked at the softmax kernel, and the cast it does from a float* to a float4* is extremely brittle -- it's trivial to break by offsetting the input slightly, e.g. by one element, so that the pointer is no longer 16-byte aligned.

A kernel for a standard library very likely could not employ a trick like that, since it relies on the alignment of the input pointers. Certainly not without a fallback.
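
For what it's worth, the usual shape of such a fallback is a runtime alignment check before taking the wide path. Here is a rough host-side C++ sketch of that dispatch. It is not the code from the linked kernel: `Vec4`, `is_vec4_aligned`, and `max_with_fallback` are names I made up, `Vec4` is a stand-in for CUDA's `float4`, and the max reduction is only a stand-in for the first pass of a softmax.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for CUDA's float4: four floats, 16-byte aligned.
struct alignas(16) Vec4 { float x, y, z, w; };

// True when p sits on a 16-byte boundary, i.e. when a 4-wide load from p
// would be legal on hardware that requires aligned vector loads.
inline bool is_vec4_aligned(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Vec4) == 0;
}

// Max over n floats. Takes the 4-wide path only when the pointer is aligned,
// otherwise falls back to a plain scalar loop. used_vector_path reports which
// path ran (for illustration only). Returns -infinity when n == 0.
float max_with_fallback(const float* in, std::size_t n, bool* used_vector_path) {
    float m = -std::numeric_limits<float>::infinity();
    const bool aligned = is_vec4_aligned(in);
    if (used_vector_path) *used_vector_path = aligned;

    std::size_t i = 0;
    if (aligned) {
        // Wide path: four elements per step. In a CUDA kernel this would be a
        // load through a float4 pointer; memcpy is used here so the host code
        // stays free of strict-aliasing problems.
        for (; i + 4 <= n; i += 4) {
            Vec4 v;
            std::memcpy(&v, in + i, sizeof v);
            m = std::max(m, std::max(std::max(v.x, v.y), std::max(v.z, v.w)));
        }
    }
    // Scalar path: handles the tail of the wide path, or the whole input
    // when the pointer was not aligned.
    for (; i < n; ++i) {
        m = std::max(m, in[i]);
    }
    return m;
}
```

Calling it on a 16-byte-aligned buffer takes the wide path, and calling it on that same buffer offset by one float quietly takes the scalar path and gives the same answer, which is the behavior you'd want from a library routine instead of a misaligned-access fault.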