by wopwopwop 3402 days ago
Question to the experts here:

- What is the relevance of Docker here? I'm pretty sure that celery+rabbitmq are enough to do a distributed scraper...

4 comments

I think the OP just drank the docker kool-aid :) It's also the future, obvs. https://circleci.com/blog/its-the-future/

> and learn how to use docker and celery

Seems the OP was learning Docker at the time? I think it just comes down to the tools you're comfortable with.

I use Celery inside Docker, mostly for the lazy-ops advantages; makes it very simple to bring up new pools of Celery workers, shut them all down, and mix projects on the same host to maximize its utilization.

Generally I'm getting fond of containers as a mechanism to encapsulate deployments, e.g. Python projects, which have a lot of requirements and which I've found finicky to make portable.

Full disclosure: I do something even worse: I have containers which pull updates with deploy keys whenever I like, and run Celery etc. in a virtualenv inside the container... :P

The latter feels truly shameful but it does make it easy to keep the project contained even when running outside a container...

Not relevant at all.

It was just shoehorned.

The crux of a project such as this is maintaining a connection pool and managing it efficiently.
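To make the connection pool point concrete, here is a rough sketch using only the Python standard library. The `ConnectionPool` class, its `size` and `wait` parameters, and the host name are all made up for illustration; nothing here comes from the original post, and a real scraper would also need to replace connections that break mid-request.

```python
import http.client
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool of reusable HTTPS connections to one host."""

    def __init__(self, host, size=4, timeout=10.0):
        # Connections are created eagerly but only open a socket on first request.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(http.client.HTTPSConnection(host, timeout=timeout))

    @contextmanager
    def connection(self, wait=None):
        # Blocks until a connection is free; raises queue.Empty after `wait` seconds.
        conn = self._idle.get(timeout=wait)
        try:
            yield conn
        finally:
            # Always hand the connection back so other workers can reuse it.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()
```

A worker would then do something like `with pool.connection() as conn: conn.request("GET", "/")`, so the number of simultaneous connections per host never exceeds `size`.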

Also respecting robots.txt, which the author barely mentions.
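For what it's worth, the robots.txt check is cheap to add with the standard library's `urllib.robotparser`. A minimal sketch follows; the sample rules, the `my-scraper` user agent, and the example.com URLs are placeholders I made up, and a real crawler would fetch each site's actual robots.txt instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "my-scraper", "https://example.com/index.html"))
print(may_fetch(parser, "my-scraper", "https://example.com/private/data"))
print(parser.crawl_delay("my-scraper"))
```

Each worker would call `may_fetch` before queueing a URL and sleep for the crawl delay between requests to the same host.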

This is a "tool looking for a problem" kind of post.

Also, Docker makes it trivial to link a bunch of swarm hosts together; scaling this across multiple machines would be basically free as he added them to the swarm.