| HN Mirror


	by operator-name 815 days ago
	If I've understood this correctly, the test measures how well the safety fine-tuning holds up. These commercial models have been fine-tuned to be "safe", and a safe model should not blindly quote what it is told. With shorter context windows this works as intended, but with longer context windows the "safety" instilled by the fine-tune no longer applies.

1 comment

Bingo!