Hacker News
by wodenokoto 1755 days ago
What would be the correct way to assess the statistical significance of these frequencies?

Like, if we assume all English text is generated from a weighted distribution over all words and “the” has a 3.5% share, is a 4.3% occurrence rate even significant? (And what would the base occurrence rate even be?)
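To make the question concrete, here is a rough sketch of one way to test it: a two-sided z-test for a single proportion, using the normal approximation to the binomial. It takes the 3.5% figure above as the assumed base rate and treats each word as an independent draw, which real text certainly violates, so read it as a starting point rather than the right answer. The function name `proportion_z_test` and the sample sizes are made up for illustration.

```python
import math


def proportion_z_test(observed_count, total_words, base_rate):
    """Two-sided z-test for one proportion (normal approximation to the binomial).

    Assumes every word is an independent draw with probability `base_rate`,
    which is a simplification for natural language.
    """
    observed_rate = observed_count / total_words
    # Standard error of the observed rate under the null hypothesis.
    std_err = math.sqrt(base_rate * (1.0 - base_rate) / total_words)
    z = (observed_rate - base_rate) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    base_rate = 0.035  # assumed base rate for "the", taken from the question
    # Hypothetical text lengths: the same 4.3% rate at different sample sizes.
    for total_words in (500, 5_000, 50_000):
        observed_count = round(0.043 * total_words)
        z, p = proportion_z_test(observed_count, total_words, base_rate)
        print(f"N={total_words:>6}  z={z:.2f}  p={p:.4g}")
```

The point of the loop is that 4.3% versus 3.5% has no fixed significance: it depends on how many words were counted. Under these assumptions the gap is not significant at the usual 5% level for a 500-word text but is for a 5,000-word one. The base rate itself has to come from some reference corpus, which this sketch does not settle.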