by abhgh
654 days ago
As others have mentioned here, you might get better results more cheaply if you preprocess the HTML first (this probably wasn't the point of the article, so just FYI). I personally have had good results with trafilatura [1], which I don't see mentioned yet.

[1] https://trafilatura.readthedocs.io/en/latest/
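
For anyone who wants to try the preprocessing step, here is a minimal sketch of what it could look like. It assumes trafilatura is installed (`pip install trafilatura`) and uses its `extract` function on an HTML string you have already downloaded. The helper name `html_to_text` and the stdlib fallback class are my own illustration, not part of trafilatura or the article; the fallback is only there so the snippet still runs without the package, and it is much cruder than real main-content extraction.

```python
from html.parser import HTMLParser

try:
    import trafilatura  # third-party: pip install trafilatura
except ImportError:  # keep the sketch runnable without the package
    trafilatura = None


class _TextOnly(HTMLParser):
    """Hypothetical crude stand-in: keeps visible text, skips script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return readable text from an HTML string (hypothetical helper)."""
    if trafilatura is not None:
        # extract() returns None when it finds no usable content
        text = trafilatura.extract(html)
        if text:
            return text
    # Fallback: strip tags only, with no boilerplate detection
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

You would then pass the output of `html_to_text(page_html)` to the model instead of the raw markup, which should cut the token count considerably on typical pages.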