Hacker News
by wyldfire 3922 days ago
Is this similar to the concept used by Amazon's "statistically improbable phrases" (word-based instead of n-gram based)?

EDIT: according to SO, yes: http://stackoverflow.com/a/2009546/489590

1 comment

Yes! Although ...

"Wait a minute. Strike that. Reverse it. Thank you."

TF-IDF is old, and very cool. N-gram-based extensions of it are a bit newer, but are likely implemented in almost exactly the same way. N-grams just require a lot more compute power because the number of distinct terms you have to track grows much faster than with a plain ol' bag of words.
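
To make the "almost exactly the same way" point concrete, here is a rough sketch in plain Python. It is not how Amazon or any particular library does it. The `ngrams` and `tfidf` helpers, the toy documents, and the simple tf * log(N / df) weighting are all my own assumptions for illustration. The only thing that changes between the word-based and n-gram-based versions is the `n` passed to the tokenizer.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1):
    """Score every n-gram in every document with tf * log(N / df).

    Hypothetical helper: one of several common TF-IDF weightings.
    Returns (per-document score dicts, number of distinct terms).
    """
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for terms in term_lists for term in set(terms))
    scores = []
    for terms in term_lists:
        tf = Counter(terms)
        scores.append({
            term: (count / len(terms)) * math.log(len(docs) / df[term])
            for term, count in tf.items()
        })
    return scores, len(df)

# Toy corpus, made up for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

unigram_scores, unigram_vocab = tfidf(docs, n=1)
bigram_scores, bigram_vocab = tfidf(docs, n=2)
print("distinct unigrams:", unigram_vocab)
print("distinct bigrams:", bigram_vocab)
```

Even on three tiny documents the bigram version tracks more distinct terms than the unigram one, and the gap widens quickly on real text. Terms that appear in every document (like "the") score zero, which is the whole trick behind surfacing improbable phrases.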