by emidln
4183 days ago
Having done this a few times from scratch for various companies, there are a ton of moving pieces in almost any processing pipeline. Being able to scale that pipeline without writing the ops code to make it happen is actually magical. I'm not saying everyone should jump out and use this, but it takes a lot of work to:
(a) measure each of the points of your service
(b) deploy your code in an automated manner
(c) deploy your monitoring in an automated manner
(d) make sure your code is under supervision
(e) set up alerting on the monitoring
(f) scale up / down and within price constraints as needed
(g) repeat this for all supporting services (queue, db, etc)
(h) write your actual application code

The potential to handle certain classes of problems via SQS/SNS/S3 pipelines is pretty alluring. You still have to do configuration, but the bet is that the configuration necessary for the SQS/SNS/S3/Lambda pipeline is far less than what's necessary to set up some random autoscaling Celery, Resque, or JMS/AMQP system on top of Ubuntu with Chef/Puppet/whatever.
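To make item (h) concrete, here's a minimal sketch of what the application code can shrink to when S3 notifications drive a Lambda function. The `Records` / `s3` / `bucket` / `object` layout is the standard S3 event notification shape; the function names and the "processing" step are placeholders I made up for illustration, and a real handler would fetch and transform the object instead of just listing it.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record, skip it
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: invoked once per delivered event."""
    processed = []
    for bucket, key in extract_s3_objects(event):
        # Hypothetical application step: real code would read the object
        # and do the actual work here.
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

Everything in (a) through (g) still has to exist somewhere, but in this model most of it is configuration on the AWS side rather than code you deploy and supervise yourself.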
1. I agree that JMS sounds like a hassle but is that really necessary? I would think that you can batch process data on an EC2 instance, then pick it up in your local code directly using AWS APIs... not sure.
2. I am not so familiar with the Lambda system, but I'm also not sure how it would scale the db as necessary (item "g" in your list), so overall processing time would still be bottlenecked by other resources (database IO, for example), no? I agree with your points, but in all these cloud-compute scenarios I always wonder: "Are we trying to reach a theoretical limit of fastest-possible computation, or just reach some reasonable saturation point close to the natural bottlenecks/throttles of our system integrations?"
3. Having been burned a few times now by over-optimizing for the cloud, I would probably now first consider just picking a slightly oversized EC2 instance and throwing some high-performing code onto it (Java, C++). Dynamic languages + auto-scalable resources (though I'm talking about web hosting in particular now) seem to drain clients' wallets more than anything. At this point I'd actually recommend that anyone with new web infrastructure just buy a static instance and write optimized Java rather than trying their hand at auto-scaling Ruby/Python/Node. Do you notice a similar issue with your clients regarding code optimization vs. auto-scaling?
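The bottleneck worry in point 2 can be put as back-of-the-envelope arithmetic: in a serial pipeline, steady-state throughput is capped by the slowest stage, so scaling compute alone stops helping once something else saturates. A rough sketch, where the stage names and rates are made-up illustrative numbers and not measurements of anything:

```python
def pipeline_throughput(stage_rates):
    """Items/sec a serial pipeline can sustain: the rate of its slowest stage."""
    if not stage_rates:
        raise ValueError("need at least one stage")
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage that caps throughput."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical rates in items/sec, purely for illustration.
rates = {"compute": 5000, "queue": 2000, "db_writes": 300}
scaled = dict(rates, compute=rates["compute"] * 10)  # auto-scale compute 10x

print(bottleneck(rates), pipeline_throughput(rates))
print(bottleneck(scaled), pipeline_throughput(scaled))
```

Under those assumed numbers the cap is the database both before and after scaling compute, which is the "natural bottleneck" case.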