| HN Mirror



	by bionhoward 626 days ago
	Yes, I believe this is possible. You could clone the weights of one or more existing models and fine-tune the copies in groups, each with a different random seed for noise/dropout, so that they produce reasonable outputs under a differential transformer decoding scheme in which tokens the copies disagree on receive more attention (surprisal analysis).
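
A rough sketch of the "tokens with disagreement" part, to make the idea concrete. The comment does not say how disagreement would be measured, so this assumes a generalized Jensen-Shannon divergence across the ensemble's next-token distributions. The function names and the toy distributions are hypothetical, and the resulting weights are just normalized scores, not an actual attention mechanism.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def disagreement(dists):
    """Generalized Jensen-Shannon divergence across ensemble members.

    dists: one probability distribution per model, all over the same vocabulary.
    Returns ~0 when every member agrees; larger values mean more disagreement.
    (Assumed metric; the original comment does not name one.)
    """
    n = len(dists)
    mean = [sum(col) / n for col in zip(*dists)]
    jsd = entropy(mean) - sum(entropy(p) for p in dists) / n
    return max(0.0, jsd)  # clamp tiny negative values caused by float rounding

def disagreement_weights(per_position_dists):
    """Normalize per-position disagreement scores so they sum to 1."""
    scores = [disagreement(d) for d in per_position_dists]
    total = sum(scores)
    if total == 0.0:
        # No disagreement anywhere: fall back to uniform weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

# Toy data: 3 hypothetical fine-tuned copies, 2 token positions, vocabulary of 3.
position_0 = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]         # copies agree
position_1 = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]   # copies disagree
weights = disagreement_weights([position_0, position_1])
print(weights)
```

With this toy input nearly all of the weight lands on the second position, where the three copies each prefer a different token. In a real setup the distributions would come from the softmax outputs of the separately seeded fine-tunes.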