by antirez
430 days ago
The system uses those simple words on purpose, since they are "tells" of the user's style in a context-independent way. Burrows' papers explain this very well, but in general we want to capture low-level structure rather than topics or the exact non-obvious words used. I tested the system with 10k words and with the most common words removed, and you get totally different results (still useful, but not style matching): basically you get users grouped by interests.
Yes, that's good! I didn't state my interest clearly, though. I'd like to see the "analyze" result with the stop words excluded, not for the style comparison part, but for the reasons you state and others.