Hacker News


	by microtonal 325 days ago
	Until we got highly optimized decoder implementations, prefill in decoders was often even implemented with the same code as an encoder. The one change was that the attention logits were masked with a causal mask before the softmax, so that tokens could not attend to future tokens.
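Below is a minimal sketch of that trick, written in pure Python for readability. It assumes a single attention head with no batching, and it takes `q`, `k`, `v` as given matrices, so the learned projections are left out. The function names `softmax` and `attention` are illustrative only and do not come from any particular library. The bidirectional (encoder-style) path and the causal path share one function. The only difference is that, when `causal=True`, the logits for future positions are set to `-inf` before the softmax. The sketch does not model the optimized decoder kernels (KV caching and so on) that the comment mentions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row max for numerical stability. Masked entries are -inf
    # and become exp(-inf) == 0.0, so they receive zero attention weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Scaled dot-product attention over a whole sequence at once (hypothetical helper).

    causal=False: bidirectional, encoder-style attention.
    causal=True:  logits for future positions (j > i) are set to -inf before
                  the softmax, so token i only attends to tokens 0..i.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for i, qi in enumerate(q):
        logits = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if causal:
            # Causal mask applied to the logits, not to the inputs.
            logits = [x if j <= i else float("-inf") for j, x in enumerate(logits)]
        weights = softmax(logits)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Toy inputs (made up for illustration).
q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[0.2, 0.1], [-0.4, 0.3], [0.5, 0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Prefill: one encoder-style pass over the whole prompt, with the causal mask.
prefill = attention(q, k, v, causal=True)

# Each row matches attending over the prefix only, which is what a token-by-token decoder sees.
for i in range(len(q)):
    step = attention([q[i]], k[: i + 1], v[: i + 1])[0]
    assert all(abs(a - b) < 1e-12 for a, b in zip(prefill[i], step))
```

The masked pass reuses the encoder's full-sequence computation. It still computes logits for the future positions, which the mask then discards. That wasted work is part of what dedicated decoder implementations avoid.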