Crawlab is a Go-based distributed web crawler management platform. It supports multiple languages, including Python, NodeJS, Go, Java, and PHP, and multiple web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Technically, you can run any spider on it. It supports both English and Chinese.
Since its launch in March, Crawlab has received a lot of positive feedback, especially about its flexibility and appealing web UI. Crawlab is evolving fast, and we have developed many features through continuous iteration.
Scrapy is a web crawler. This project is a web crawler management UI/platform, so it presumably manages your Scrapy crawlers/instances and schedules them.
So if I understand correctly, Crawlab is another simple, easy-to-use admin tool for managing web crawlers; one still needs to use Scrapy or write their own crawlers. It should be similar to the admin tool I mentioned in my earlier comment and at:
Certainly interesting to see English's domination increasingly challenged on open source tech projects. However, this makes contributing harder for non-Chinese speakers. I had a look at the repo's issues page and all the discussion is in Chinese. Google Translate can help, but I'm not sure it would be enough for some subtle problems. I'm also not sure how communication would go on PRs if part of the team is strictly sinophone.
Great project nonetheless. Will likely give it a try. Keep up the excellent work!
I really appreciate having a "lingua franca" of programming. Projects in other languages are certainly interesting to see, but I also appreciate that most authors use English; it contributes to a larger worldwide community.
Thanks for the feedback. Actually, I have seen a lot of great Chinese projects on GitHub trending, and sadly they are Chinese only. I definitely agree they could do better by translating into English!
Thanks, all, for the upvotes and positive feedback on Crawlab. Crawlab is mainly focused on Chinese because it was initially promoted on mainland China tech sites (Juejin, V2EX, etc.). Due to the GFW, we cannot access information outside China, so it is difficult for us to learn about feedback from non-Chinese developers.
We would definitely be happy if more contributors joined Crawlab development, so we will work on improving multi-language support, including English documentation, a Code of Conduct, Contributing.md, and English communities. Our team is small (please check out the Contributors section) but comes from top companies in China, and we would be happy to share knowledge between Chinese and non-Chinese developers.
Btw, what is the best tech community? (In China we have WeChat groups.)
Thanks @atymic for the feedback. The initial admin password was changed so that no harmful actions can be performed on the demo. You can still sign up to check out the demo.
We do have an English version, but it is not available on the login page. We will definitely add it there.
No, we are using pub/sub for message communication between nodes. For tasks, we are using a hashed list. English documentation is missing, but we will add it later.
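For readers unfamiliar with the pattern, here is a minimal sketch of what "pub/sub between nodes plus a separate task list" can look like. This is not Crawlab's code: the `Broker` class, the channel name `nodes:worker-1`, and the task id `task-001` are all made up for illustration, everything runs in one process, and the task side is modelled as a plain FIFO queue, which is only a guess at what "hashed list" means here.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker (hypothetical, not Crawlab's implementation)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel name -> list of callbacks
        self.tasks = deque()                  # pending task ids, oldest first

    def subscribe(self, channel, callback):
        # Register a callback that is invoked for every message on the channel.
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every current subscriber of the channel
        # and return how many subscribers received it.
        callbacks = self.subscribers[channel]
        for callback in callbacks:
            callback(message)
        return len(callbacks)

    def push_task(self, task_id):
        # Tasks are kept apart from pub/sub messages so they can wait for a worker.
        self.tasks.append(task_id)

    def pop_task(self):
        # Return the oldest pending task, or None when nothing is queued.
        return self.tasks.popleft() if self.tasks else None


broker = Broker()
received = []

# A worker node listens on its own channel (assumed naming scheme).
broker.subscribe("nodes:worker-1", received.append)

# The master node sends a message to that worker and enqueues a task.
delivered = broker.publish("nodes:worker-1", "run spider")
broker.push_task("task-001")
next_task = broker.pop_task()
```

The point of the split is that pub/sub messages are fire-and-forget (a node that is not subscribed misses them), while queued tasks stay put until some worker picks them up.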
Thanks for the upvotes.
GitHub repo: https://github.com/tikazyq/crawlab Demo: http://crawlab.cn/demo