by brrrrrm
1327 days ago
> tread off the beaten path things get slow

Yea, makes sense. I think there's something to be said for dynamic compilation solving this problem more elegantly than providing tons of hand-tuned kernels (PyTorch is 890MB lmao https://pypi.org/project/torch/#files), but I don't think it's a strict reason for a performance win.

> change the loop order too

Memory layout as well! I'm 100% for dynamic compilation, but I'm claiming that it really finds its stride when you fuse things.
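To make the fusion point concrete, here's a rough sketch in plain Python. The function names and the scale/shift/ReLU pipeline are made up for illustration, not taken from PyTorch or any compiler. The unfused version stands in for chaining three prebuilt kernels, each writing a full temporary; the fused version is roughly what a dynamic compiler could emit once it sees the whole chain.

```python
# Hypothetical illustration: the names and the a*x + b -> relu pipeline
# are assumptions, not any real library's API.

def unfused(xs, a, b):
    # Three separate "kernels": each pass reads a whole list
    # and writes a whole temporary list.
    scaled = [a * x for x in xs]
    shifted = [s + b for s in scaled]
    return [max(t, 0.0) for t in shifted]

def fused(xs, a, b):
    # One "kernel": each element is read once, transformed, and
    # written once, with no intermediate lists.
    return [max(a * x + b, 0.0) for x in xs]

xs = [-2.0, -0.5, 0.0, 1.5]
assert unfused(xs, 2.0, 1.0) == fused(xs, 2.0, 1.0)
```

Same result either way; the difference is memory traffic. On real hardware the temporaries are where the time tends to go, which is why a pile of hand-tuned single-op kernels can't cover this case without a kernel for every combination.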