Hacker News
by lifthrasiir 4630 days ago
I always liked the fact that the word "thousand" itself is not among the thousand most common English words.
1 comment

The same is true of, e.g., "one" and "two".

According to the frequency list at www.wordfrequency.info (the free top-5000 one; for more than that you need to pay them), the only numbers n below 1000 for which every word in the name of n is among the n most common English words are 800..808. Call such a number "good".

Proof: the ranks of numbers 1..9 are 50, 79, 134, 250, 299, 425, 735, 743, 1163; the ranks of 10..19 are 840, 4318, 3246, X, X, 3019, X, X, X, X (where X means "bigger than 5000"); the ranks of 10, 20, ..., 90 are 840, 2103, 2855, 3767, 3064, X, X, X, X; the rank of "hundred" is 621.

So: every number-word has rank >= 50, so nothing below 50 is "good". Every number from 20 to 99 contains a tens word ("twenty" etc.) of rank >= 2000, so nothing below 100 is "good". For the same reason nothing below 2000 whose tens digit isn't 0 or 1 is "good".

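Nothing from 100 to 620 is "good" because "hundred" has rank 621. Nor is any higher 6xx, since those all have tens digit 2 or more; nor 700..719, because "seven" has rank 735; nor higher 7xx, again because of the tens digit. Of the small 8xx, 800..808 are "good" as stated, but 809 fails because of "nine" and 810..819 fail because "ten" has rank 840 and the other teens rank above 3000; higher 8xx are no good as usual; no 9xx is "good" because "nine" has rank 1163.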

This word list actually puts "thousand" in position 650, which makes lots of 1xxx numbers good: the little Common Lisp program that actually gave me the results above says it's these: 1000..1008, 1010, 1100..1108, 1110, 1200..1210, 1300..1310, 1400..1410, 1500..1510, 1600..1610, 1700..1710, 1800..1810, 1900..1910.
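
For anyone who wants to re-run this, here is a rough Python sketch of the check. It is not the Common Lisp program mentioned above (which isn't shown), just one way to do the same thing. It hard-codes the ranks quoted in the proof and treats every X entry as never qualifying, so it only means anything for n up to 5000. The names RANK, words and is_good are made up for this sketch, and it ignores "and", which is common enough not to change anything.

```python
# Ranks as quoted in the comment above. Words marked X there (rank > 5000)
# are simply absent and are treated as having infinite rank.
RANK = {
    "one": 50, "two": 79, "three": 134, "four": 250, "five": 299,
    "six": 425, "seven": 735, "eight": 743, "nine": 1163,
    "ten": 840, "eleven": 4318, "twelve": 3246, "fifteen": 3019,
    "twenty": 2103, "thirty": 2855, "forty": 3767, "fifty": 3064,
    "hundred": 621, "thousand": 650,
}

ONES = ["", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def words(n):
    """Return the words in the English name of n, for 1 <= n <= 9999."""
    out = []
    if n >= 1000:
        out += [ONES[n // 1000], "thousand"]
        n %= 1000
    if n >= 100:
        out += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        out.append(TENS[n // 10])
        n %= 10
    if n >= 10:
        out.append(TEENS[n - 10])
    elif n > 0:
        out.append(ONES[n])
    return out


def is_good(n):
    """True if every word in the name of n has rank <= n (valid for n <= 5000)."""
    return all(RANK.get(w, float("inf")) <= n for w in words(n))


good_below_1000 = [n for n in range(1, 1000) if is_good(n)]
good_1xxx = [n for n in range(1000, 2000) if is_good(n)]

print(good_below_1000)  # [800, 801, 802, 803, 804, 805, 806, 807, 808]
print(len(good_1xxx))   # 108
```

With the ranks above this gives 800..808 below 1000 and 108 numbers in the 1xxx range, matching the list given.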

Of course all large enough numbers are "good". I can't tell where "large enough" starts because that wordlist doesn't go far enough to include even "thirteen" or "sixty". Perhaps somewhere around 10000 onwards?