	by liuliu 1358 days ago
	CPU offloading doesn't help because Apple already has a shared memory architecture. The head slicing is similar to https://machinelearning.apple.com/research/neural-engine-tra... I think it would be quite practical if only MPSGraph were less mysterious about its allocation strategy. It is not the ideal way, though. Ideally, FlashAttention / xFormers is the way to go.
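
	For anyone unfamiliar with head slicing, here is a rough sketch of the general idea in plain Python. It is not the code from the linked Apple article and not anything MPSGraph does internally. The function names (`attention_one_head`, `sliced_attention`) are made up for illustration. The point is only that attention heads are independent, so they can be processed one at a time and a single seq x seq score matrix is alive at any moment instead of heads x seq x seq.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_one_head(q, k, v):
    """Scaled dot-product attention for a single head.

    q, k, v are lists of seq_len vectors, each of length head_dim.
    Only this head's seq_len x seq_len score matrix is materialized.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    # Score matrix for this head only.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    head_dim = len(v[0])
    # Weighted sum of value vectors for each query position.
    return [
        [sum(w * vj[d] for w, vj in zip(row, v)) for d in range(head_dim)]
        for row in weights
    ]


def sliced_attention(q_heads, k_heads, v_heads):
    """Hypothetical helper: run the heads sequentially, one slice at a time.

    Each head's score matrix can be freed before the next head starts,
    which trades some parallelism for a lower peak memory footprint.
    """
    outputs = []
    for q, k, v in zip(q_heads, k_heads, v_heads):
        outputs.append(attention_one_head(q, k, v))
    return outputs
```

	Whether this actually lowers peak memory on a real framework depends on when the allocator releases the intermediate buffers, which is the part that is hard to see from outside. Fused kernels in the FlashAttention style avoid materializing the full score matrix in the first place.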