Hacker News
by creativeSlumber 854 days ago
>The outage originated from our partner AWS and took down Wyze devices for several hours early Friday morning. ... As we worked to bring cameras back online, we experienced a security issue. Some users reported seeing the wrong thumbnails and Event Videos in their Events tab. ... The incident was caused by a third-party caching client library that was recently integrated into our system. This client library received unprecedented load conditions caused by devices coming back online all at once. As a result of increased demand, it mixed up device ID and user ID mapping and connected some data to incorrect accounts.

As a software engineer who's dealt with caches for large, high-throughput services, I don't understand why they are blaming a caching client. It's your own code that decides what the cache key is and what value gets stored under it. Did the caching library have a bug where, when you asked for a given key, it returned the value for a different key? Or, more likely, did your own code have a bug where you mixed up the keys? I think we need more details on what went wrong here.
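
To make the point concrete, here's a minimal cache-aside sketch. It assumes a toy in-process dict cache, and all names (`SimpleCache`, `cache_key`, `get_thumbnails`, `EventThumbnails`) are hypothetical; this is not how Wyze's system actually works. The cache only returns what was stored under the exact key it's given, so the key (and whatever ends up stored under it) comes from application code. The ownership check at the end is one defensive option that refuses to serve another account's data no matter where the mix-up came from.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class EventThumbnails:
    # Hypothetical record type: which account and device the data belongs to.
    owner_user_id: str
    device_id: str
    urls: List[str]


class SimpleCache:
    """Toy in-process cache: returns exactly what was stored under a key."""

    def __init__(self) -> None:
        self._store: Dict[str, EventThumbnails] = {}

    def get(self, key: str) -> Optional[EventThumbnails]:
        return self._store.get(key)

    def set(self, key: str, value: EventThumbnails) -> None:
        self._store[key] = value


def cache_key(user_id: str, device_id: str) -> str:
    # The application, not the cache library, decides what goes into the key.
    # Scoping by both user and device keeps entries for different
    # user/device pairs under distinct keys.
    return f"thumbs:{user_id}:{device_id}"


def get_thumbnails(
    cache: SimpleCache,
    user_id: str,
    device_id: str,
    load_from_db: Callable[[str, str], EventThumbnails],
) -> EventThumbnails:
    key = cache_key(user_id, device_id)
    value = cache.get(key)
    if value is None:
        # Cache miss: load from the source of truth and populate the cache.
        value = load_from_db(user_id, device_id)
        cache.set(key, value)
    # Defensive check: refuse to return data owned by someone else,
    # whether the mix-up came from caller code or from the library.
    if value.owner_user_id != user_id or value.device_id != device_id:
        raise PermissionError(f"cache entry {key!r} belongs to another account")
    return value
```

If some buggy code path writes Alice's thumbnails under Bob's key, the cache will faithfully hand them back for Bob's key; only a check like the one above (or a correct key in the first place) stops them from reaching Bob's Events tab.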

A lot of blaming others: for architecture, stack, configuration, and operational choices that are likely (or should be) their own decisions, and that should come with taking ownership.
But they've built dashboards! They probably even bought products with a single pane of glass!