by visarga 4784 days ago
Just apply TF-IDF to the text and it pulls out the most interesting words in the phrase. It's dead simple: you count words, do a little scoring, and sort. Here's an example applied to tweets; note how the least significant words come out last. Some words have been dropped (those with a frequency below 5 in a corpus of a few million phrases).

------

- "math final today 6-17-09 piece of cake hopefully i should do well since i m a math nerd amp english amp social"

- math, nerd, studies, piece, cake, english, amp, final, hopefully, social, since, should, well, today, do, of,

------

- "anyone want an incredibly designed unique limited edition tee for the summer check out www artcotic com"

- tee, designed, incredibly, unique, edition, limited, summer, anyone, check, www, want, com, an, out, for, the,
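
For anyone who wants to try it, here is a rough sketch of the count / score / sort idea in Python. It is not the code that produced the lists above: the function names (`build_doc_freq`, `rank_words`), the `min_freq` parameter, the plain `tf * log(N / df)` weighting, and the toy corpus are all assumptions made for illustration. I'm also reading "frequency" as the number of phrases a word appears in, which may not be exactly what the cutoff of 5 meant.

```python
import math
from collections import Counter


def build_doc_freq(corpus):
    """Count, for each word, how many phrases in the corpus contain it."""
    df = Counter()
    for phrase in corpus:
        df.update(set(phrase.split()))  # set(): count each word once per phrase
    return df


def rank_words(phrase, df, n_docs, min_freq=1):
    """Return the distinct words of `phrase`, highest TF-IDF score first.

    Words seen in fewer than `min_freq` corpus phrases are dropped
    (hypothetical stand-in for the frequency cutoff mentioned above).
    """
    tf = Counter(phrase.split())
    scores = {}
    for word, count in tf.items():
        if df[word] == 0 or df[word] < min_freq:
            continue  # unseen or too rare in the corpus: skip it
        idf = math.log(n_docs / df[word])  # rarer across phrases -> larger
        scores[word] = count * idf
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))


# Toy corpus, invented for this example only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a unique tee for the summer",
]
df = build_doc_freq(corpus)
ranked = rank_words("the cat sat on the mat", df, len(corpus))
print(ranked)  # ['mat', 'cat', 'on', 'sat', 'the']
```

"the" occurs in every phrase, so its IDF is log(1) = 0 and it sinks to the bottom, which is the same effect as "of" and "the" landing last in the tweet examples. On a real corpus you would raise `min_freq` (the comment above used 5) to throw away typos and one-off tokens.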