HN Mirror



	by PeterisP 1134 days ago
	One problem is the large imbalance in resource requirements between pretraining large foundation models and fine-tuning them for specific tasks. Currently, foundation models have no concept of a "prompt"; that is only added in later fine-tuning, and by that stage it is too late to experiment with architectural features for out-of-band signaling, because the architecture is already fixed. If we want a model to learn to handle out-of-band data, we need to figure out how to handle it during the initial unsupervised pretraining on unlabeled text; otherwise it will simply learn to ignore all those prompt-related features.
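	A toy sketch of why the timing matters, under assumptions that are mine and not the comment's: suppose out-of-band signaling is done with a hypothetical per-token "channel" embedding (similar in spirit to BERT's segment embeddings), where channel 0 marks ordinary text and channel 1 marks instruction/prompt tokens. If pretraining on unlabeled text only ever uses channel 0, the channel-1 embedding row is never indexed, so it receives exactly zero gradient and stays at its random initialization. The names `chan_emb`, `channel_grad_norms`, and the squared-error "model" below are illustrative stand-ins, not a real architecture.

```python
import random

# Hypothetical toy setup: each token's input vector is
# tok_emb[token] + chan_emb[channel], where channel 0 = ordinary text
# (in-band) and channel 1 = instruction/prompt (out-of-band).
DIM = 4
VOCAB = 10
N_CHANNELS = 2

random.seed(0)
tok_emb = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(VOCAB)]
chan_emb = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_CHANNELS)]
readout = [random.gauss(0, 0.1) for _ in range(DIM)]


def channel_grad_norms(batch):
    """Accumulate the squared gradient norm reaching each channel embedding
    row, for the toy loss 0.5 * (readout . x - target)^2 with
    x = tok_emb[token] + chan_emb[channel]."""
    norms = [0.0] * N_CHANNELS
    for token, channel, target in batch:
        x = [tok_emb[token][i] + chan_emb[channel][i] for i in range(DIM)]
        pred = sum(w * xi for w, xi in zip(readout, x))
        err = pred - target
        # d loss / d chan_emb[channel] = err * readout;
        # rows for channels not used by this token get no gradient at all.
        grad = [err * w for w in readout]
        norms[channel] += sum(g * g for g in grad)
    return norms


def make_batch(n, channels):
    """Random (token, channel, target) triples drawn from the given channels."""
    return [
        (random.randrange(VOCAB), random.choice(channels), random.gauss(0, 1))
        for _ in range(n)
    ]
```

	With `channel_grad_norms(make_batch(100, [0]))` (pretraining-style data) the entry for channel 1 is exactly `0.0`, while a mixed batch such as `make_batch(100, [0, 1])` does push gradient into it. In a real model the same applies to every downstream weight that would need to interpret the channel signal, which is why adding it only at fine-tuning time gives the model little reason to rely on it.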