by davidatbu
280 days ago
Good point. But the broader point about Mojo offering a different level of abstraction than Python still stands: I imagine that no amount of magic (operator fusion, etc.) in `torch.compile()` would let one get reasonable performance out of an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
Somehow Python managed to be both a high-level and a low-level language for GPUs…