by moultano
3252 days ago
Lots of reasonable hacks. 1. Use only the beginning of the document, as that's probably the most important part anyway, and it's fast. 2. Divide the sum of your feature scores by sqrt(n), where n is the number of scores summed, so that it has constant variance and hopefully stays comparable with your prior. 3. Split the doc into reasonably sized chunks, and average their scores rather than adding them.
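A rough sketch of the three hacks, assuming you already have one score per feature (say, a per-token log-likelihood ratio) in document order. The function names and the default cutoffs are made up for illustration, not taken from any particular library. The sqrt(n) version relies on the scores being roughly independent with similar variance: the sum of n such terms has variance proportional to n, so dividing by sqrt(n) keeps it constant.

```python
import math


def score_prefix(token_scores, max_tokens=200):
    """Hack 1: sum only the first max_tokens scores (hypothetical cutoff)."""
    return sum(token_scores[:max_tokens])


def score_sqrt_normalized(token_scores):
    """Hack 2: divide the summed score by sqrt(n) so its variance
    does not grow with document length."""
    n = len(token_scores)
    if n == 0:
        return 0.0
    return sum(token_scores) / math.sqrt(n)


def score_chunk_average(token_scores, chunk_size=100):
    """Hack 3: split into fixed-size chunks (hypothetical size),
    score each chunk by summing, then average the chunk scores."""
    if not token_scores:
        return 0.0
    chunks = [
        token_scores[i:i + chunk_size]
        for i in range(0, len(token_scores), chunk_size)
    ]
    return sum(sum(chunk) for chunk in chunks) / len(chunks)


if __name__ == "__main__":
    scores = [0.5] * 400  # toy document: 400 features, each scoring 0.5
    print(score_prefix(scores))
    print(score_sqrt_normalized(scores))
    print(score_chunk_average(scores))
```

With any of these, the result stays on a scale that doesn't blow up for long documents, so adding it to a fixed prior term remains meaningful. Note that the last chunk may be shorter than the others, which slightly underweights it in the average.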
That first one seems to be a solution devised for news articles, as the standard news writing style involves answering the Five Ws up front in the article.