| HN Mirror



	by paxys 799 days ago
	It could have been trained on this exact picture, created by a fan and uploaded to some forum. Ultimately it is impossible to know without testing on brand-new material. I have the same problem with benchmarks that use real-world tests (SAT, LSAT, GRE, or whatever else). The model got a good score, sure, but how many thousands of variations of this exact test was it trained on? How many similar or identical questions did it encounter in training?