Hacker News
by scarface74 2317 days ago
Why should an instance created by an ASG have a host name? These are cattle not pets. I use Serilog for logging with an EC2 enricher that automatically adds the instance Id and the IP address.

Since Serilog does structured logging, I can use either an ElasticSearch or Mongo sink and do complex queries.

If I routinely need to log into an instance to troubleshoot, I need to be capturing data and sending it to a central logging system.

> Why should an instance created by an ASG have a host name?

It means you can connect to it by just knowing its instance ID.

Adding the IP address everywhere also works.

There can be some nice SSH config options though, like using a particular key for everything *.prod.myaws.com

You can also use SSM and session manager to get a similar experience for those instances: https://docs.aws.amazon.com/systems-manager/latest/userguide...

I haven't had to manage SSH keys in a long time ;)

With this I just have a bash function for each of my environments (e.g. dssm for dev) that takes the ID of the instance giving me issues, for when I really need to log into the server.

e.g.

function dssm { aws --region us-west-2 --profile my-dev-profile-name ssm start-session --target "$1"; }

Then:

dssm i-abcdef123456

And I'm dropped into a shell. SSM Session Manager is far from perfect, but it gets the job done, is fully auditable, gets logged (including commands run), and best of all works with SAML IAM profiles right out of the gate. No more sharing keys, no more managing keys, it's great!

Yes, exactly. SSH access is also one of the reasons for building the module, as mentioned in the blog post.

> Access: When troubleshooting, we save time not having to look up the instance’s internal IP address for SSH access.

That’s the second part. If I’m troubleshooting by logging into EC2 instances, there is something wrong with my logging infrastructure. That’s actually the larger issue.

Post author here.

SSH access is absolutely a last resort, but can be necessary in certain cases (like when Filebeat breaks...). Turning SSH off completely (i.e. "No SSH") is certainly better for security and something we may pursue.

I mentioned in another comment here that SSH is just one example; we can also easily hit endpoints with curl via hostname.

Also mentioned in the post: other tools (like Grafana dashboards) expect unique hostnames.

> If I’m troubleshooting by logging into EC2 instances, there is something wrong with my logging infrastructure.

I suppose it's possible to build enough logging to account for an interactive SSH session for debugging problems...but that would be massive.

I ran out of disk space. Why?

If you’re logging to a local disk on ephemeral VMs, that doesn’t make the situation any better.

That’s why you need a central logging facility. If you’re using AWS, you could store your structured JSON logs in S3 and query them with Athena. (https://medium.com/quiq-blog/store-json-logs-on-s3-for-searc...)

Of course there are other ways both using AWS and third party services. Centralized logging is a solved problem.

AWS isn’t going to run out of disk space any time soon. You could also use a lifecycle policy to delete old logs or move them to a lower cost storage depending on your retention policy.
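As a rough sketch of what such a lifecycle policy could look like, the Python below builds a rule that moves logs to a cheaper storage class and later deletes them. The `logs/` prefix, the rule ID, and the 30/365 day thresholds are assumptions for illustration, not values from this thread.

```python
def build_log_lifecycle(prefix="logs/", archive_after_days=30, delete_after_days=365):
    """Build an S3 lifecycle configuration for objects under a log prefix.

    All defaults here are hypothetical; pick values that match your own
    retention policy.
    """
    if archive_after_days >= delete_after_days:
        raise ValueError("archive_after_days must be less than delete_after_days")
    return {
        "Rules": [
            {
                "ID": "log-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move older logs to a lower cost storage class...
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "STANDARD_IA"}
                ],
                # ...and delete them once the retention window has passed.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }
```

With boto3, the returned dict would be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` on an S3 client, along with your bucket name.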

I’m not saying that I have never had to log on to a VM to troubleshoot, but that’s a sign of the need for better logging.

And if my logging infrastructure isn’t good, how pray tell will I troubleshoot my programs running on Lambda or Fargate?

I never said your disk was full with logs.

> how pray tell will I troubleshoot my programs running on Lambda or Fargate?

That is indeed a big problem running on Lambda and Fargate.

In my experience, Fargate isn't very commonly used and Lambda is used for only relatively simple things.

It’s not a problem at all with Lambda or Fargate. Logging can be as simple as printing to the console, and the output goes to CloudWatch.
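To make "printing to the console" concrete, here is a minimal sketch in Python that writes one JSON object per line to stdout, so the resulting log lines stay queryable. The helper names (`format_event`, `log`) and the field names are made up for this example.

```python
import json
import sys
from datetime import datetime, timezone


def format_event(level, message, **fields):
    """Render one log event as a single-line JSON string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
    # Extra keyword arguments become top-level, queryable fields.
    record.update(fields)
    return json.dumps(record)


def log(level, message, **fields):
    """Write the event to stdout, one JSON object per line."""
    print(format_event(level, message, **fields), file=sys.stdout, flush=True)
```

A call like `log("INFO", "queue drained", queue="orders", count=3)` emits a single line that a central log store can filter by `queue` or `count`.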

It’s the same concept. If your troubleshooting at any point involves needing to log in to an EC2 instance, you might as well have a few bespoke servers called “Web01” and “Web02”. You’re just using ASG to create pets at scale. We run an ASG in production that scales from 2 to 30 instances based on the number of messages in a queue, Lambdas running all of the time, some Fargate tasks, etc. It would be a nightmare to troubleshoot all of those processes without centralized, queryable logs.

> In my experience, Fargate isn't very commonly used and Lambda is used for only relatively simple things.

And that experience is representative of the entire AWS ecosystem?

I agree, I wouldn't want it any other way nowadays, but back then I had to migrate a lot of legacy systems to AWS under pressure.

For one part we had a legacy service needing to connect to the services in the ASG, and the best way to implement it was with round-robin DNS. So the Lambda would update a DNS record containing all the ASG host IPs.
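A sketch of what the DNS update step might look like, assuming the record lives in Route 53 (the comment doesn't say which DNS service was used). The function name, record name, and TTL are hypothetical.

```python
def build_round_robin_change(record_name, ips, ttl=60):
    """Build a Route 53 change batch for one A record listing every instance IP.

    Resolvers receive all the values, which gives the round-robin effect.
    """
    if not ips:
        # Publishing an empty record set would be rejected and would
        # leave clients with nothing to connect to.
        raise ValueError("refusing to publish an empty record set")
    return {
        "Comment": "round-robin A record for ASG instances",
        "Changes": [
            {
                # UPSERT creates the record or replaces the existing one.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    # De-duplicate and sort so repeated runs are stable.
                    "ResourceRecords": [{"Value": ip} for ip in sorted(set(ips))],
                },
            }
        ],
    }
```

In a Lambda handler, the result would be passed as `ChangeBatch` to boto3's Route 53 `change_resource_record_sets` call, together with the hosted zone ID, after collecting the current instance IPs from the ASG.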

Also, we had some semi-stateful legacy instances that were basically lift-and-shift to AWS, but I wanted to have them in an ASG to keep our environment similar until we could refactor them into real cattle.

Just out of curiosity, why not just put the ASG behind a load balancer?

I don't remember exactly. We did use ELBs for all other services. So it was either cost, or it had to do with MX record restrictions, in that you're not allowed to use CNAMEs in MX records.