
by stlava 2899 days ago

The post is good, but it only scratches the surface of running Kinesis Streams / Lambda at scale. Here are a few additional things I found while running Kinesis as a data ingestion pipeline:

- Only write out logs that matter. Searching logs in CloudWatch is already a major PITA; half the time I just scan the logs manually because search never returns. Also, the fewer println statements you have, the quicker your function will be.

- Lambda is cheap; reporting function metrics to CloudWatch from a Lambda is not. Be very careful about using this.

- Having metrics from within your Lambda is very helpful. We keep track of spout lag (the delta between when an event got to Kinesis and when it was read by the Lambda), source lag (the delta between when the event was emitted and when it was read by the Lambda), and the number of events processed (were any dropped due to validation errors?).

- Avoid using the Kinesis auto scaler tool. In theory it's a great idea, but in practice we found that scaling a stream with 60+ shards causes issues with API limits (maybe this is fixed now...).

- Have plenty of disk space on whatever is emitting logs. You don't want to run into the scenario where you can't push logs to Kinesis (e.g. because of throttling) and they start filling up your disks.

- Keep in mind that you have to balance your emitters, Lambda, and your downstream targets. You don't want too few or too many shards, and you don't want 100 Lambda instances hitting a service with only 10 events per invocation.

- Lambda deployment tools are still young, but find one that works for you. All of them have tradeoffs in how they are configured and how they deploy.
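The in-Lambda metrics mentioned above (spout lag, source lag, events processed vs. dropped) can be computed per invocation straight from the Kinesis event. A minimal Python sketch, assuming each record's payload is JSON carrying a hypothetical `emitted_at` field (epoch seconds, set by the emitter); `approximateArrivalTimestamp` is the arrival time Kinesis attaches to each record in the Lambda event. The function and metric names are made up for illustration, and how you publish the result is left open given the CloudWatch cost caveat above.

```python
import base64
import json
import time
from typing import Any, Dict, List, Optional


def compute_lag_metrics(event: Dict[str, Any], now: Optional[float] = None) -> Dict[str, float]:
    """Summarize lag and counts for one Kinesis-triggered Lambda invocation.

    Assumes each record's data is base64-encoded JSON with a hypothetical
    "emitted_at" field (epoch seconds written by the emitter).
    """
    now = time.time() if now is None else now
    spout_lags: List[float] = []
    source_lags: List[float] = []
    dropped = 0
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Spout lag: record reached Kinesis -> record read by this Lambda.
        spout_lags.append(now - kinesis["approximateArrivalTimestamp"])
        try:
            payload = json.loads(base64.b64decode(kinesis["data"]))
            emitted_at = float(payload["emitted_at"])
        except (ValueError, KeyError, TypeError):
            # Failed validation: count it instead of losing it silently.
            dropped += 1
            continue
        # Source lag: event created by the emitter -> record read by this Lambda.
        source_lags.append(now - emitted_at)
    return {
        "records_in": len(spout_lags),
        "records_dropped": dropped,
        "max_spout_lag_s": max(spout_lags, default=0.0),
        "max_source_lag_s": max(source_lags, default=0.0),
    }
```

Reporting the max (rather than only the mean) makes a single stuck shard visible; you could just as well add percentiles.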

There are some good tidbits in the Q&A section of my re:Invent talk [1]. Also, for anyone wanting to use Lambda without reinventing the wheel, check out Bender [2]. Note: I'm the author.

[1] https://www.youtube.com/watch?v=AaRawf9vcZ4
[2] https://github.com/Nextdoor/bender

edit: formatting

djhworld 2899 days ago

For metrics, we find that writing a well-defined, formatted message to stdout* and then using CloudWatch Logs metric filters works pretty well.

* e.g. "ETLLambda METRIC RECORDS_IN 948575"
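A minimal Python sketch of the stdout-plus-metric-filter approach; the helper names are hypothetical and the line layout copies the example above. A Lambda function's stdout lands in its CloudWatch Logs log group, and a space-delimited metric filter pattern along the lines of `[app, kind=METRIC, name=RECORDS_IN, value]`, with `$value` as the metric value, should turn each matching line into a data point without any PutMetricData calls.

```python
def metric_line(app: str, name: str, value: int) -> str:
    """Format a metric as one space-delimited line, e.g. 'ETLLambda METRIC RECORDS_IN 948575'."""
    # Spaces inside a field would shift the positional fields the filter relies on.
    if any(" " in part for part in (app, name)):
        raise ValueError("app and metric name must not contain spaces")
    return f"{app} METRIC {name} {value}"


def emit_metric(app: str, name: str, value: int) -> str:
    """Print the metric line to stdout (which Lambda sends to CloudWatch Logs) and return it."""
    line = metric_line(app, name, value)
    print(line)
    return line
```

Keeping a fixed literal such as `METRIC` in the second field lets the filter ignore ordinary log lines that happen to have four words.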

phillipwills 2899 days ago

I used the AWS SDK to write a simple Node.js script that pulls logs for a date range; then I can grep the output...
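The script described above was Node.js; as an illustration only, here is a rough Python equivalent using boto3's CloudWatch Logs `filter_log_events` paginator. The function names are hypothetical, and the client is passed in so the sketch does not hard-code credentials or regions. `startTime`/`endTime` are milliseconds since the Unix epoch, so pass timezone-aware datetimes.

```python
from datetime import datetime
from typing import Any, Iterator


def to_epoch_ms(dt: datetime) -> int:
    """Convert a (timezone-aware) datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def fetch_log_messages(client: Any, log_group: str, start: datetime, end: datetime) -> Iterator[str]:
    """Yield every message in log_group between start and end.

    `client` is expected to be a boto3 CloudWatch Logs client, i.e. boto3.client("logs").
    """
    paginator = client.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=log_group,
        startTime=to_epoch_ms(start),
        endTime=to_epoch_ms(end),
    ):
        for event in page.get("events", []):
            # Strip the trailing newline so the output pipes cleanly into grep.
            yield event["message"].rstrip("\n")
```

Usage would look something like `for line in fetch_log_messages(boto3.client("logs"), "/aws/lambda/my-function", start, end): print(line)`, with the output piped into grep; `my-function` is a placeholder.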

bogaczio 2899 days ago

I've heard that they'll be releasing a cheap Splunk-like service over CloudWatch for log searching, which will hopefully alleviate the issue.

nikhilkuria 2897 days ago

There was one time when we had trouble finding some errors in CloudWatch Logs. This library was helpful: https://github.com/jorgebastida/awslogs

jdc0589 2899 days ago

I hope so. It's almost useless as is.