by binux
4236 days ago
I'm working on a benchmarking suite https://gist.github.com/binux/67b276c51e988f8e2c31 and have run into some problems...

pyspider comes from a vertical search engine project. We have two issues:

- 100+ websites, and they may change their templates or go down at any time. We need a dashboard to monitor the changes and the failures.
- Updates within 5 minutes: when a website is updated, we need to pick that up within 5 minutes. We use the update time from the index (list) page to tell which pages have changed. Pages should also be re-crawled after about 30 days in case we missed something.

A powerful scheduler is needed. Obviously, I hadn't found the right way to do this with scrapy. I'm not very familiar with scrapy, so I can't point to something pyspider can do that scrapy can't.
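To make the scheduling rule concrete, here is a minimal sketch in plain Python of the re-crawl decision described above. It is not pyspider's actual scheduler code; the function name `should_recrawl`, its parameters, and the `MAX_AGE` constant are hypothetical names chosen for illustration. It assumes we store, per page, the time of the last crawl and the update time the index page showed at that point.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the "about 30 days" forced refresh mentioned above.
MAX_AGE = timedelta(days=30)


def should_recrawl(
    now: datetime,
    last_crawled: Optional[datetime],
    stored_update_time: Optional[str],
    index_update_time: str,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Decide whether a detail page should be fetched again (hypothetical helper)."""
    # Never crawled before: always fetch.
    if last_crawled is None:
        return True
    # The index (list) page shows a different update time: the page changed.
    if stored_update_time != index_update_time:
        return True
    # Unchanged according to the index, but too old: refresh in case we missed something.
    return now - last_crawled >= max_age
```

Running this check for every entry each time the index page is polled (every few minutes) gives the "follow updates within 5 minutes" behavior, while the age check covers changes the index page did not report.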