by stlava 2899 days ago
The post is good but just scratches the surface on running Kinesis Streams / Lambda at scale. Here are a few additional things I found while running Kinesis as a data ingestion pipeline:

- Only write logs out that matter. Searching logs in cloudwatch is already a major PITA. Half the time I just scan the logs manually because search never returns. Also, the fewer println statements you have the quicker your function will be.

- Lambda is cheap, reporting function metrics to cloudwatch from a lambda is not. Be very careful about using this.

- Having metrics from within your lambda is very helpful. We keep track of spout lag (delta of when the event got to kinesis and when it was read by the lambda), source lag (delta of when the event was emitted and when it was read by the lambda), and number of events processed (were any dropped due to validation errors?).

- Avoid using the kinesis auto scaler tool. In theory it's a great idea but in practice we found that scaling a stream with 60+ shards causes issues with api limits. (maybe this is fixed now...)

- Have plenty of disk space on whatever is emitting logs. You don't want to run into the scenario where you can't push logs to kinesis (eg throttling) and they start filling up your disks.

- Keep in mind that you have to balance your emitters, lambda, and your downstream targets. You don't want too few / too many shards. You don't want to have 100 lambda instances hitting a service with 10 events per invocation.

- Lambda deployment tools are still young but find one that works for you. All of them have tradeoffs in how they are configured and how they deploy.
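
To make the spout lag / source lag bullet above concrete, here is a minimal sketch of how those numbers could be computed inside a Python handler. It assumes the standard Kinesis event shape that Lambda delivers (base64 `data` plus `approximateArrivalTimestamp` in epoch seconds) and a hypothetical `emitted_at` field (epoch seconds) that the emitter writes into each JSON payload. The function name and the returned keys are made up for illustration.

```python
import base64
import json
import time


def lag_metrics(event, now=None):
    """Return lag and count metrics for one Kinesis batch delivered to Lambda."""
    now = time.time() if now is None else now
    spout_lags, source_lags, dropped = [], [], 0
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Spout lag: time between arrival in Kinesis and this read.
        spout_lags.append(now - kinesis["approximateArrivalTimestamp"])
        try:
            payload = json.loads(base64.b64decode(kinesis["data"]))
            # Source lag: time between emission and this read.
            # "emitted_at" is a hypothetical field set by the emitter.
            source_lags.append(now - float(payload["emitted_at"]))
        except (ValueError, KeyError, TypeError):
            # Treat undecodable or incomplete payloads as validation failures.
            dropped += 1
    return {
        "processed": len(source_lags),
        "dropped": dropped,
        "max_spout_lag_s": max(spout_lags, default=0.0),
        "max_source_lag_s": max(source_lags, default=0.0),
    }
```

Returning a plain dict keeps the reporting side open: the values can be logged, or shipped some other way, without touching the calculation.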

There are some good tidbits in the Q&A section from my re:Invent talk [1]. Also, for anyone wanting to use lambda but not wanting to re-invent the wheel, check out Bender [2]. Note I'm the author.

[1] https://www.youtube.com/watch?v=AaRawf9vcZ4

[2] https://github.com/Nextdoor/bender

edit: formatting

2 comments

For metrics we find writing a well defined, formatted message to stdout* and then using Cloudwatch Logs Metric filters works pretty well.

* e.g. "ETLLambda METRIC RECORDS_IN 948575"
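
A rough sketch of the stdout side of this, in Python. The helper names are made up, and the four-field layout is only inferred from the example line above. On the CloudWatch side, a space-delimited metric filter pattern along the lines of `[app, type = METRIC, name = RECORDS_IN, value]` with `$value` as the metric value should pick these up, but treat that pattern as an assumption to check against the filter syntax docs.

```python
def metric_line(app, name, value):
    """Format one metric as a space-delimited line for stdout."""
    return f"{app} METRIC {name} {value}"


def parse_metric_line(line):
    """Return (app, name, value) for a metric line, or None for any other log line."""
    parts = line.split()
    if len(parts) != 4 or parts[1] != "METRIC":
        return None
    try:
        return parts[0], parts[2], float(parts[3])
    except ValueError:
        return None


# Lambda forwards stdout to CloudWatch Logs, so a print is enough.
print(metric_line("ETLLambda", "RECORDS_IN", 948575))
```

The parser is only there so the same format can be checked locally, e.g. when grepping pulled logs.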

I used the aws sdk to write a simple NodeJS script to pull logs for a date range, then I can grep the output...
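
For anyone who wants the same thing in Python, a minimal sketch under a few assumptions: the client is a boto3 CloudWatch Logs client (`boto3.client("logs")`), `filter_log_events` is used for the date range, and the log group name in the usage comment is hypothetical. This is not the NodeJS script mentioned above, just the same idea.

```python
from datetime import datetime, timezone


def to_millis(dt):
    """Convert a datetime to epoch milliseconds, the unit CloudWatch Logs uses."""
    return int(dt.timestamp() * 1000)


def pull_logs(client, log_group, start, end):
    """Yield every log message between start and end, following pagination tokens."""
    kwargs = {
        "logGroupName": log_group,
        "startTime": to_millis(start),
        "endTime": to_millis(end),
    }
    while True:
        page = client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            yield event["message"]
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


# Usage (requires boto3 and AWS credentials; the log group name is hypothetical):
#   import boto3
#   start = datetime(2017, 1, 1, tzinfo=timezone.utc)
#   end = datetime(2017, 1, 2, tzinfo=timezone.utc)
#   for message in pull_logs(boto3.client("logs"), "/aws/lambda/my-function", start, end):
#       print(message.rstrip())
```

Printing one message per line means the output can be piped straight into grep.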
I've heard that they'll be releasing a cheap Splunk-like service over CloudWatch for log searching, which will hopefully alleviate the issue.
There was this one time when we had trouble finding some errors on Cloudwatch logs. This one library was helpful, https://github.com/jorgebastida/awslogs
I hope so. It's almost useless as is.