	by moultano 5804 days ago
	Just for completeness, you guys should compute G-test statistics for this so that the statisticians see something they're used to: http://en.wikipedia.org/wiki/G-test

1 comments

mshron 5804 days ago

I'll check it out. Thanks!


moultano 5804 days ago

I think it should actually give you better results. It's monotonic in KL divergence (for a fixed sample size, G is just twice the sample size times the KL divergence between the observed and expected distributions), and it does a much better job of taking into account how common the feature is rather than just how different it is. If you use it, you no longer need to do things like throwing out phrases that appear fewer than x times.
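
To make that concrete, here's a rough sketch in Python of the G statistic for a single phrase. The function names, the two-cell "has the phrase / doesn't" setup, and the example counts are all made up for illustration; they aren't taken from the analysis being discussed.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)). Cells with O == 0 contribute nothing."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def phrase_g(count_in_group, group_size, background_rate):
    """G for one phrase (hypothetical helper).

    Compares how often the phrase shows up in a group against what the
    background rate would predict, using two cells: present / absent.
    """
    observed = [count_in_group, group_size - count_in_group]
    expected = [
        group_size * background_rate,
        group_size * (1.0 - background_rate),
    ]
    return g_statistic(observed, expected)

# Same 3x lift over a 10% background rate, very different amounts of evidence.
rare = phrase_g(3, 10, 0.1)         # phrase seen 3 times in 10 items
common = phrase_g(300, 1000, 0.1)   # phrase seen 300 times in 1000 items
print(rare, common)
```

Both phrases are equally "different" from the background in the KL sense, but G scales with the number of observations, so the rarely seen phrase gets a much smaller score. That's why a hard minimum-count cutoff stops being necessary.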
