by gus_massa
3990 days ago
Some blogs have a standard closing paragraph, like "If you have read all of this, you may like to subscribe to my RSS feed" or "We are always hiring at ABC, send your resume." Another problem is short captions that the HTML parser treats as paragraphs, like "Advertisement" or "XYZ Benchmark (higher is better)". One possible solution is to skip paragraphs with fewer than, say, 150 letters (the exact threshold is a guess).
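A minimal sketch of that length filter in Python, assuming the page has already been split into a list of paragraph strings. The names `count_letters`, `drop_short_paragraphs` and `min_letters` are hypothetical, and 150 is only the guessed default. Counting letters rather than raw characters keeps digits, spaces and punctuation in captions from inflating the length.

```python
def count_letters(text: str) -> int:
    """Count alphabetic characters, ignoring digits, spaces and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def drop_short_paragraphs(paragraphs: list[str], min_letters: int = 150) -> list[str]:
    """Keep only paragraphs with at least `min_letters` letters.

    The default of 150 is a rough guess, not a tuned value.
    """
    return [p for p in paragraphs if count_letters(p) >= min_letters]


if __name__ == "__main__":
    paragraphs = [
        "Advertisement",                      # short caption, dropped
        "XYZ Benchmark (higher is better)",   # 26 letters, dropped
        "This is the body of the post. " * 10,  # 220 letters, kept
    ]
    kept = drop_short_paragraphs(paragraphs)
    print(len(kept))  # prints 1
```

The same threshold also drops the short boilerplate endings quoted above, although a long "we are hiring" paragraph would still get through.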