Hacker News
by d110af5ccf 643 days ago
> the reason the magnitude doesn't matter is that those counts will be much higher in longer documents ...

To make my intuition a bit more explicit: the vector is encoding a ratio, isn't it? You want to treat 3:2, 6:4, 12:8, ... as equivalent in this case, and normalization does exactly that.
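A minimal sketch of the ratio point. It assumes the vectors are raw term counts and that "normalization" means L2 (Euclidean) normalization, as in cosine similarity. The counts and the helper names `l2_normalize` and `cosine_similarity` are hypothetical and are used only for illustration.

```python
import math

def l2_normalize(counts):
    """Scale a count vector to unit Euclidean length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts]

def cosine_similarity(a, b):
    """Dot product of the two normalized vectors, so overall magnitude drops out."""
    na, nb = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(na, nb))

# Hypothetical counts of two terms in progressively longer documents.
# All three vectors share the same 3:2 ratio.
docs = [[3, 2], [6, 4], [12, 8]]

for counts in docs:
    # Each line prints the same normalized vector, roughly [0.83205, 0.5547].
    print(counts, [round(x, 6) for x in l2_normalize(counts)])

print(round(cosine_similarity([3, 2], [12, 8]), 6))  # → 1.0 (same ratio)
print(round(cosine_similarity([3, 2], [2, 3]), 6))   # → 0.923077 (different ratio)
```

The scaled-up vectors collapse to one point after normalization, while a genuinely different ratio such as 2:3 still scores below 1.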