by ashwing_2005
4456 days ago
This is great. However, I have one bone to pick (or rather, I'd like to know whether it has already been taken care of).
Scrapy uses XPaths or equivalent representations to scrape. However, there are often many alternative XPaths that point to the same div.
For example, suppose data is to be extracted from the fifth div in a sequence of divs, so the tool would use that position as the XPath. But say the div also has a meaningful class or id attribute. An XPath based on that attribute might be a better choice, because the content may not be in the fifth div on every page of the site I want to scrape.
Is this taken care of by taking the common denominator across many sample pages?
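A minimal sketch of what the "common denominator" idea could look like. This is not how Scrapy or the tool being discussed actually works. It is a naive illustration using Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset and needs well-formed markup. The two pages, their labelled values, and the helpers `candidate_paths`, `extract`, and `robust_paths` are all hypothetical. For each labelled div on the first sample page, the sketch generates a positional XPath plus id/class XPaths, then keeps only those that return the right value on every sample page.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical sample pages from one site, each labelled with the value we want.
# Page B has an extra banner div, so the price is no longer the fifth div.
PAGE_A = """<html><body>
<div>nav</div><div>search</div><div>title</div><div>rating</div>
<div class="price">$10</div>
</body></html>"""
PAGE_B = """<html><body>
<div>banner</div><div>nav</div><div>search</div><div>title</div>
<div>rating</div><div class="price">$12</div>
</body></html>"""
SAMPLES: List[Tuple[str, str]] = [(PAGE_A, "$10"), (PAGE_B, "$12")]


def candidate_paths(root: ET.Element, target: str) -> List[str]:
    """Hypothetical helper: positional and id/class paths for divs whose text is target."""
    paths = []
    for i, div in enumerate(root.findall("body/div"), start=1):
        if div.text == target:
            paths.append(f"body/div[{i}]")  # "the i-th div"
            for attr in ("id", "class"):
                if attr in div.attrib:
                    paths.append(f".//div[@{attr}='{div.attrib[attr]}']")
    return paths


def extract(page: str, path: str) -> Optional[str]:
    """Return the text of the first element matching path, or None if nothing matches."""
    node = ET.fromstring(page).find(path)
    return None if node is None else node.text


def robust_paths(samples: List[Tuple[str, str]]) -> List[str]:
    """Keep only the candidates that extract the labelled value on every sample page."""
    first_page, first_value = samples[0]
    candidates = candidate_paths(ET.fromstring(first_page), first_value)
    return [p for p in candidates if all(extract(page, p) == value for page, value in samples)]


print(robust_paths(SAMPLES))  # prints [".//div[@class='price']"]
```

Here the positional path `body/div[5]` works on page A but returns "rating" on page B, so it is dropped and only the class-based path survives. A real scraper would parse messy HTML with a proper HTML parser (for example, Scrapy's own selectors) and would need far richer candidate generation than this.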