Hacker News new | ask | show | jobs
by mlochbaum 359 days ago
And the reason +˝ is fairly fast for long rows, despite that page claiming no optimizations, is that ˝ is defined to split its argument into cells, e.g. rows of a matrix, and apply + with those as arguments. So + is able to apply its ordinary vectorization, while it can't in some other situations where it's applied element-wise. This still doesn't make great use of cache and I do have some special code working for floats that does much better with a tiling pattern, but I wanted to improve +˝ for integers along with it and haven't finished those (widening on overflow is complicated).