| HN Mirror


	by Don_Patrick 2147 days ago
	This is a very interesting test. It seems to me that, aside from the prompt bias, it may be detecting "yo be real" category questions by the low statistical probability of the words occurring in that sequence. That probability would be especially low for gibberish words that occur zero times in its training data, which is why it's so good at spotting those. One could also try to detect nonsense with quad-grams (four-word sequences) using the same principle, but in any case this metric would tend to mark unique questions, foreign names, and brand names as nonsense.
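	To make the quad-gram idea concrete, here is a rough sketch of what such a detector might look like. It is not how the model under test works; the tiny corpus, the function names, and the 0.5 threshold are all made up for illustration, and tokenization is just lowercase whitespace splitting with no punctuation handling.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(sentences, n=4):
    """Count how often each n-gram occurs in the reference corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts

def unseen_fraction(question, counts, n=4):
    """Fraction of the question's n-grams that never occur in the corpus."""
    grams = ngrams(question.lower().split(), n)
    if not grams:
        return 0.0  # too short to judge
    unseen = sum(1 for g in grams if counts[g] == 0)
    return unseen / len(grams)

def looks_like_nonsense(question, counts, n=4, threshold=0.5):
    """Flag the question if most of its n-grams are unseen (threshold is arbitrary)."""
    return unseen_fraction(question, counts, n) > threshold

# Hypothetical toy corpus; a real one would need to be vastly larger.
corpus = [
    "how many eyes does a horse have",
    "how many legs does a horse have",
    "who was president of the united states in 1955",
]
counts = train(corpus)

print(looks_like_nonsense("how many eyes does a horse have", counts))    # seen question
print(looks_like_nonsense("how many bonks are in a quoit", counts))      # gibberish
print(looks_like_nonsense("who was president of peru in 1955", counts))  # sensible but unseen
```

	With this toy corpus the gibberish question is flagged, but so is the perfectly sensible question about Peru, simply because the corpus never mentions it. That is the false-positive problem with unique questions and foreign names described above.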