by escanor
1296 days ago
great work!
just a note regarding tf-idf, where you mention log10:
i think you're missing the reason for taking the log and, most importantly, for using base 10.
namely, log10 gives us a view of how many digits the term/document frequency has: for a positive integer n, floor(log10(n)) + 1 is its number of decimal digits.
if a term "A" occurs 23 times and a term "B" occurs 50 times, they end up with close representations (log10(23) ≈ 1.36 and log10(50) ≈ 1.70), because both are two-digit numbers. anyway, thanks for the submission
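to make the digit-count point concrete, here is a minimal sketch in Python. the helper name `digit_count` and the extra count of 500 are my own, chosen only for illustration; they are not from the submission.

```python
import math

def digit_count(n: int) -> int:
    """Number of decimal digits of a positive integer, computed via log10."""
    return math.floor(math.log10(n)) + 1

# Raw counts from the example above, plus a hypothetical three-digit count.
for count in (23, 50, 500):
    print(count, round(math.log10(count), 2), digit_count(count))
# 23 1.36 2
# 50 1.7 2
# 500 2.7 3
```

23 and 50 differ by 27 in raw counts but only by about 0.34 on the log10 scale, while moving up to three digits adds a full unit.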