Hacker News
by malisper 2315 days ago
Although there are a ton of AWS services, there are only a few core ones that I recommend:

  EC2 - You need a server.
  RDS - You need a database.
  S3 - You need to store files.
  Lambda - You are building an API with short lived requests.
These services are all very high quality and are excellent at what they do. Once you get outside of these core services, the quality quickly drops. You're probably better off using the non-AWS versions of those services.

For a few quick examples, you should be using Datadog over CloudWatch, Snowflake over Redshift or Athena, and Terraform over CloudFormation.

4 comments

Why would you ever use Terraform over CloudFormation? So many parts of AWS use CF, and you can start from generated templates and modify them, such as the CodeStar getting-started templates or a SAM template exported from your Lambda function.

Before someone comments on how TF is “cross platform”, all of the provisioners are vendor specific.

As far as what other services to use, if you are hosting your own services on AWS instead of using AWS managed services, you’re kind of missing the point of AWS.

But a few other services we use all of the time are CodeBuild, ElastiCache (hosted Redis), Elasticsearch, Route 53, load balancers, autoscaling groups, SSM (managing the few “pets” until we can kill them), ECS, ECR, Fargate, SNS, SQS, DynamoDB, SFTP, CloudTrail, Microsoft AD, Step Functions, Athena, Secrets Manager, and a few more I’m probably forgetting. We are also experimenting with the recently announced Device Farm/Selenium service.

> Why would you ever use Terraform over CloudFormation?

1. You're using Terraform already for resources outside of AWS (cdn, monitoring, dns, anything else) and want to stay with a common tech.

2. You're running into cases that CF doesn't support and have to generate your descriptions externally, or resort to SparkleFormation hacks.

3. You want to manage a new AWS service. (CloudFormation support lags behind Terraform; new services don't get CF resources for months.)

In cases two and three it’s just as easy to write a custom resource....
You mean it's just as easy to write/test/deploy a custom resource as it is to use a ready-made one? I disagree. I think there are a few days of work between the two in that case.
Actually, no.

Examples for creating them in Java, Python and Node are here

https://github.com/stelligent/cloudformation-custom-resource...

Just add a few lines of code for create, update and delete for your resource.

For Node and Python, you can write them in the web console, test them, copy the code to your git repo and export the SAM CF template for your CI/CD process.
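To make the "few lines of code for create, update and delete" concrete, here is a minimal sketch of a Lambda-backed custom resource in Python. It follows the documented request/response shape (the event carries `RequestType` and `ResponseURL`, and the handler PUTs a JSON status back), but the resource itself is hypothetical: the `thing-` ID scheme and the `Name` property are placeholders, not part of any real service or of the linked repo.

```python
import json
import urllib.request


def build_response(event, status, physical_id, data=None, reason=""):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the response to the pre-signed URL supplied in the event."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    urllib.request.urlopen(request)


def handler(event, context=None, send=send_response):
    """Dispatch on RequestType; `send` is injectable so the logic can be tested offline."""
    props = event.get("ResourceProperties", {})
    try:
        request_type = event["RequestType"]
        if request_type == "Create":
            # Hypothetical: a real handler would call the target service here.
            physical_id = "thing-" + props.get("Name", "unnamed")
        elif request_type in ("Update", "Delete"):
            # Update and Delete receive the ID returned by Create.
            physical_id = event["PhysicalResourceId"]
        else:
            raise ValueError("unexpected RequestType: %r" % request_type)
        body = build_response(event, "SUCCESS", physical_id, {"Name": props.get("Name", "")})
    except Exception as exc:
        # Always answer, otherwise the stack waits until the request times out.
        fallback_id = event.get("PhysicalResourceId", "unknown")
        body = build_response(event, "FAILED", fallback_id, reason=str(exc))
    send(event, body)
    return body
```

The handler reports `FAILED` instead of raising, because a custom resource that never responds leaves the stack operation hanging. Whether this amounts to "a few lines" or "a few days" probably depends on how much of the real service call and its error handling you need behind the `Create` branch.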

Depending on the market segment you exist in, I'd recommend AWS Fargate (a container runner) and AWS Lightsail (a DigitalOcean/Linode-style VPS competitor) over EC2. There's absolutely a segment for which EC2 is appropriate, but just like most data isn't "big", I suspect that most EC2 customers would be better served by Lightsail. If you've got several hundred or several thousand EC2 instances with bespoke code/config for many different ASGs, then Lightsail isn't for you, but (my impression is) that's not most people.
> you should be using Datadog over CloudWatch

Datadog is great, but the way it polls CloudWatch means metrics can take a long time to become available: https://docs.datadoghq.com/integrations/faq/cloud-metric-del...

> If you receive 1-minute metrics with CloudWatch, then their availability delay is about 2 minutes—so total latency to view your metrics may be ~10-12 minutes.

If an alert delayed by 10 minutes matters to you, DD is not viable for alerting (it could still be used for dashboards).

CloudWatch Logs has a lot of its own internal latencies. If you can send the logs straight from your EC2 hosts to the log processing system and bypass CloudWatch, then you only want/need CWL for the things you can’t get from those logs.

CloudWatch Metrics is a totally separate beast, which happens to share a similar name. You can set up basic alerts in CWM, and you can trigger certain types of events from those alerts, but it is still very limited. If you want real monitoring and alerting, then CWM isn’t even the easy 80%.

Agreed, my message was about using DD instead of CW for internal logs. With your own logs, you've got so much more flexibility - but not everything can be done that way. For example ELB stats don't really exist elsewhere in realtime.
Just curious, why would you not recommend SQS?
I've never used SQS but IMO it seems inferior to Kinesis or Kafka. The two big reasons are that you can't have multiple consumers read from a single queue and once data leaves the queue, it's gone forever. Both Kinesis and Kafka let you have multiple consumers and configure a retention period for your messages.
You can have millions of consumers read concurrently from a single SQS queue. Messages that are read remain in the queue up to the configured retention period or until a consumer calls DeleteMessage.
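A toy model may help pin down the semantics described here: a received message is only hidden for a visibility timeout, and it stays in the queue until someone deletes it. This is an in-memory sketch, not the SQS API. The `ToyQueue` class is hypothetical, time is passed in explicitly, deletion uses a plain message ID instead of a receipt handle, and the retention period is left out.

```python
class ToyQueue:
    """In-memory stand-in for a queue with SQS-like receive/delete semantics."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}         # message id -> body
        self._invisible_until = {}  # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        self._invisible_until[self._next_id] = 0.0
        return self._next_id

    def receive(self, now):
        """Return one visible message and hide it from other workers for a while."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Only an explicit delete removes the message."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


queue = ToyQueue(visibility_timeout=30.0)
job = queue.send("resize image 17")
first = queue.receive(now=0.0)    # worker A gets the message
second = queue.receive(now=10.0)  # worker B sees nothing while A is working
retry = queue.receive(now=31.0)   # A never deleted it, so it is redelivered
queue.delete(job)                 # processing finished; now it is gone
```

In this model any number of workers can call `receive` on the same queue, but each message goes to one worker at a time, which is the "competing workers" sense of "consumer".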

Source: I’ve built very high volume services that continue to run production workloads and use SQS as the buffer between components.

> You can have millions of consumers read concurrently from a single SQS queue.

We're using different definitions of "consumer". By consumer, I'm talking about a group of workers that processes the data for one purpose. For example you may have one consumer read from the queue to generate various metrics and a second consumer read from the queue and write to a DB. With vanilla SQS, when you process a message, you need to perform all the tasks simultaneously. With Kinesis and Kafka you can have independent groups of workers (i.e. independent consumers), each performing one of these tasks. Each consumer is able to process the queue at its own rate. The way Amazon recommends doing this with SQS is to publish to an SNS topic that fans out to multiple SQS queues. Then you can consume each queue independently[0]. That will multiply your costs by the number of queues you have.

> Messages that are read remain in the queue up to the configured retention period or until a consumer calls DeleteMessage.

I'm talking about retaining a message even if it was successfully processed, on the order of days or weeks. I've used this feature of Kafka before to implement a recovery log. Under normal operation, Kafka writes data to a DB. If the DB goes down, you can quickly recover the last N days of data by going through the data retained in Kafka.

[0] https://forums.aws.amazon.com/message.jspa?messageID=865925
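The two properties argued for in this comment (independent consumers that read at their own rate, and replay of already-processed data) both fall out of an append-only log with per-group offsets. The sketch below is a toy in-memory model of that idea, not Kafka's or Kinesis's API. `ToyLog`, `poll`, and `rewind` are hypothetical names, and time-based retention is omitted.

```python
class ToyLog:
    """Append-only log where each consumer group tracks its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> index of the next record to read

    def append(self, record):
        self._records.append(record)

    def poll(self, group, max_records=10):
        """Return the next batch for this group; other groups are unaffected."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def rewind(self, group, offset=0):
        """Move a group back so it re-reads records it already processed."""
        self._offsets[group] = max(0, min(offset, len(self._records)))


log = ToyLog()
for event in ["signup", "order", "refund"]:
    log.append(event)

slow = log.poll("metrics", max_records=2)  # the metrics group reads slowly
fast = log.poll("db-writer")               # the DB writer reads everything
rest = log.poll("metrics")                 # metrics catches up later

# Recovery: the DB was lost, so the writer replays from the start of the log.
log.rewind("db-writer", offset=0)
replayed = log.poll("db-writer")
```

Reading never removes a record here, which is why a second group costs only one extra offset. With the SNS-to-SQS fan-out, each group instead gets its own copy of every message.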

One producer/multiple consumers is what SNS + message attributes + subscription filters + SQS is for.
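As a rough illustration of that combination, the sketch below models a topic that copies each message into every subscribed queue whose filter policy matches the message attributes. It only covers exact string matching, where every key in the policy must match one of its allowed values. Real SNS filter policies support more operators, and the attribute names and queue names here are made up.

```python
def matches(filter_policy, attributes):
    """True if every policy key is present with one of its allowed values."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )


def fan_out(message, attributes, subscriptions):
    """Append the message to each subscribed queue whose policy matches."""
    for filter_policy, queue in subscriptions:
        if matches(filter_policy, attributes):
            queue.append(message)


metrics_queue = []  # only wants order events
db_queue = []       # empty policy: receives everything

subscriptions = [
    ({"event_type": ["order_placed", "order_cancelled"]}, metrics_queue),
    ({}, db_queue),
]

fan_out("m1", {"event_type": "order_placed"}, subscriptions)
fan_out("m2", {"event_type": "user_signup"}, subscriptions)
```

Each subscriber then drains its own queue independently, at the price of one queue (and one delivered copy) per subscriber.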

If your database goes down, you have point in time recovery and read replicas that can be promoted as needed.

How so? I've run thousands of consumers on SQS for batch jobs and it seems to work.

There are also dead-letter queues and retries for messages that aren't processed successfully.