The post is good but just scratches the surface on running Kinesis Streams / Lambda at scale. Here are a few additional things I found while running Kinesis as a data ingestion pipeline:

- Only write out logs that matter. Searching logs in CloudWatch is already a major PITA. Half the time I just scan the logs manually because search never returns. Also, the fewer println statements you have, the quicker your function will be.
- Lambda is cheap; reporting function metrics to CloudWatch from a Lambda is not. Be very careful about using this.
- Having metrics from within your Lambda is very helpful. We keep track of spout lag (delta between when the event got to Kinesis and when it was read by the Lambda), source lag (delta between when the event was emitted and when it was read by the Lambda), and number of events processed (were any dropped due to validation errors?).
- Avoid using the Kinesis auto scaler tool. In theory it's a great idea, but in practice we found that scaling a stream with 60+ shards causes issues with API limits. (Maybe this is fixed now...)
- Have plenty of disk space on whatever is emitting logs. You don't want to run into the scenario where you can't push logs to Kinesis (e.g. throttling) and they start filling up your disks.
- Keep in mind that you have to balance your emitters, Lambda, and your downstream targets. You don't want too few / too many shards. You don't want to have 100 Lambda instances hitting a service with 10 events each invocation.
- Lambda deployment tools are still young, but find one that works for you. All of them have tradeoffs in how they are configured and how they deploy.

There are some good tidbits in the Q&A section from my re:Invent talk [1]. Also, for anyone wanting to use Lambda but not wanting to reinvent the wheel, check out Bender [2]. Note I'm the author.

[1] https://www.youtube.com/watch?v=AaRawf9vcZ4
[2] https://github.com/Nextdoor/bender

edit: formatting
* e.g. "ETLLambda METRIC RECORDS_IN 948575"
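
For anyone wondering what the spout lag / source lag bookkeeping might look like, here is a rough Python sketch, not what Bender or our pipeline actually does. It assumes the standard Kinesis-to-Lambda event shape (`approximateArrivalTimestamp` in epoch seconds, base64 `data`), and it assumes the payload is JSON carrying a hypothetical `emitted_at` field (epoch seconds) set by the emitter. The `lag_metrics` helper and the `SPOUT_LAG_MS` / `SOURCE_LAG_MS` names are made up for illustration; only the `RECORDS_IN` line follows the example format above.

```python
import base64
import json
import time


def lag_metrics(event, now=None):
    """Return (records_in, max_spout_lag_s, max_source_lag_s) for one batch."""
    now = time.time() if now is None else now
    spout_lags, source_lags = [], []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Spout lag: time between arrival in Kinesis and this read.
        spout_lags.append(now - kinesis["approximateArrivalTimestamp"])
        payload = json.loads(base64.b64decode(kinesis["data"]))
        # Source lag: needs a timestamp written by the emitter.
        # "emitted_at" is a hypothetical field name (assumption).
        if "emitted_at" in payload:
            source_lags.append(now - payload["emitted_at"])
    return (
        len(event["Records"]),
        max(spout_lags, default=0.0),
        max(source_lags, default=0.0),
    )


def handler(event, context):
    records_in, spout_lag, source_lag = lag_metrics(event)
    # One line per metric per invocation keeps log volume low.
    print(f"ETLLambda METRIC RECORDS_IN {records_in}")
    print(f"ETLLambda METRIC SPOUT_LAG_MS {int(spout_lag * 1000)}")
    print(f"ETLLambda METRIC SOURCE_LAG_MS {int(source_lag * 1000)}")
    return records_in
```

Reporting the max per batch rather than one value per record is a design choice to keep the number of log lines constant per invocation; how those lines get turned into dashboards or alarms is outside this sketch.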