Ask HN: Suggestions for Writing Background Jobs
3 points by mabid 5399 days ago
I need to implement a few background jobs that run continuously and make a lot of HTTP calls to some APIs to gather data and store it in MySQL. I would like your suggestions and comments on the best approach, architecture, and language to use so that the system is scalable.

I will need it to be threaded so that I can pull data simultaneously.

What language should I use?

1) Ruby 2) Scala 3) Java 4) PHP

I know Ruby, PHP, and Java well.

Thanks

3 comments

RabbitMQ (or any AMQP broker) sounds like it would do the job, since it sounds like you need to send very little data (a URL) to the background worker. I believe there are some good AMQP/Rabbit gems for Ruby, but my knowledge there is a little dated. The basic premise is that a broker (RabbitMQ-Server, etc.) holds a queue of messages that are distributed to N subscribers as they ask for them. The more subscribers you have connected to the broker, the more jobs can run at the same time.

However, RabbitMQ might be too much overhead for what you want to do. With Ruby, specifically, I've had a good deal of experience with Resque, which uses Redis (key-value store) as a queueing system, much like RabbitMQ. It's easy to set up, and gets the job done just as well.
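
Both options share the same shape: a queue of small messages and N workers pulling from it. Here is a minimal in-process sketch of that shape in Python, using only the standard library. It is not RabbitMQ or Resque code; `queue.Queue` stands in for the broker, and `run_workers` and `handle` are hypothetical names made up for illustration.

```python
import queue
import threading


def run_workers(urls, handle, num_workers=4):
    """Push urls onto a queue and let num_workers threads process them.

    `handle` is a caller-supplied function (hypothetical here) that does
    the real work for one URL, e.g. an HTTP call plus a database write.
    Returns a dict mapping each url to its result, or to the exception
    that `handle` raised for it.
    """
    jobs = queue.Queue()      # stand-in for the broker's queue
    results = {}
    lock = threading.Lock()   # guards the shared results dict

    def worker():
        while True:
            url = jobs.get()
            if url is None:   # sentinel: no more work for this worker
                return
            try:
                value = handle(url)
            except Exception as exc:
                value = exc   # record the failure instead of killing the worker
            with lock:
                results[url] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)        # one sentinel per worker, queued after all jobs
    for t in threads:
        t.join()
    return results
```

With a real broker the queue lives outside the process, so workers can run on separate machines and adding a subscriber is how you scale out.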

If you know Java and Scala, I would recommend giving the Play framework a try. It is extremely simple and fast to set up and develop with, and it uses the Quartz scheduler library for jobs.

http://www.playframework.org/documentation/1.2.3/jobs

Alternatively you could simply use Quartz alone.

http://www.quartz-scheduler.org

And for the rest of your needs, Play has you covered too.

You can use Play's WS class to make asynchronous HTTP calls from your server, and it integrates nicely with MySQL or your database of choice too. I think it uses Hibernate under the hood, leveraging its power while simplifying configuration and usage.

You can use Play with Scala, and if you do so, there is a nice database layer called Anorm.
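
Independent of Play, the "many HTTP calls at once, then write to the database" step could look roughly like the Python sketch below. It rests on some assumptions: `sqlite3` is used only as a stand-in so the example runs without a server (a MySQL driver would typically use `%s` placeholders instead of `?`), the `api_data` table is hypothetical, and `http_get` and `fetch_and_store` are illustrative names, not part of any library.

```python
import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=10):
    """Fetch one URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_and_store(conn, urls, fetch=http_get, max_workers=8):
    """Fetch all urls concurrently, then insert the results in one batch.

    Assumes a table api_data(url, body) already exists (hypothetical schema).
    If any fetch raises, the exception propagates and nothing is written.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as urls
        bodies = list(pool.map(fetch, urls))
    # All writes happen on the calling thread, in a single batch
    conn.executemany(
        "INSERT INTO api_data (url, body) VALUES (?, ?)",
        zip(urls, bodies),
    )
    conn.commit()
    return len(bodies)
```

Batching the inserts keeps the number of round trips to the database low, which tends to matter when the job is write-heavy.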

I see Play is a web dev framework. As far as my system is concerned, I don't need a full web app. The system just needs to sit in the background, read the database, request data from 2-3 APIs, and put it back in the DB. I expect a lot of writes to the database. I am reading about Play's support for jobs. Do you still think Play is the way to go?
I see your point. Play is a framework based on the MVC pattern. If you just don't provide a View layer, you can use the Controller to access the Model and use the rest of the goodies Play gives you for free, like Jobs, the WS class for easy HTTP calls, a RESTful interface, etc.

As I said, you can always use just Quartz for the jobs (that's what Play uses anyway) and create your own data access layer, use an ORM, or whatever you like.
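
To show what a recurring job boils down to without any framework, here is a minimal Python sketch of a fixed-interval loop. It is not Quartz and has none of its features (no cron expressions, no persistence, no clustering); `run_every` and its parameters are hypothetical names chosen for illustration.

```python
import threading


def run_every(interval_seconds, job, stop_event, max_runs=None):
    """Call job() repeatedly, sleeping interval_seconds between runs.

    Stops when stop_event is set or after max_runs runs (if given).
    A failing run is logged and does not stop the loop.
    Returns the number of runs attempted.
    """
    runs = 0
    while not stop_event.is_set():
        try:
            job()
        except Exception as exc:
            print(f"job failed: {exc!r}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # wait() returns early if stop_event is set, so shutdown is prompt
        stop_event.wait(interval_seconds)
    return runs
```

A caller would typically run this on a background thread, e.g. `threading.Thread(target=run_every, args=(60, poll_apis, stop), daemon=True).start()`, where `poll_apis` is whatever function does one round of fetching and storing.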

It would be polite to put at least a hint of your question in the title so people know what it's about.
I noticed that after I posted. I have been following HN for over a year, but this was my first post, so that's why...