HN Mirror



	by microtonal 457 days ago
	> solving a small subset of problems in a way no one asked for

	What do you mean? Having ROCm fused MoE and MLA kernels as a counterpart to the CUDA kernels is very useful. AMD needs to provide this if they want to keep AMD accelerators competitive with new models.

1 comment

fock 457 days ago

Shouldn't the matrix multiplication at the core of this be in a core library? Why are generic layers intermixed with LLM-specific kernels when the generic layers duplicate functionality already in torch?

Upstreaming that might actually help researchers doing new stuff vs. the narrow demographic of people speeding up LLMs on MI300Xs.
