Hacker News
by showdeddd 1109 days ago
This Medium article seems silly. Just add custom metrics to your app for what was fetched from the cache vs. the DB, and label the metric by route/query/pattern. To control costs, don't tick the metric for every single request; instead, accumulate locally and post to the metrics API every X minutes.
2 comments

That didn't exist back then. Adding logging all over the application would have increased costs while we were already losing a significant amount of money.

The problem is that Google had no problem billing for a metric that they didn't expose to the user and didn't provide tools to debug properly.

1. No, it would not be expensive. Writing metrics is literally free: https://cloud.google.com/stackdriver/pricing . If metrics didn't exist yet, then log 1/1000 requests to control the log volume and find the pattern from the averages.

2. It isn't up to Google to tell you whether you are querying against the cache or the DB. It's your code. You should know. Just tick something when you use the Redis/BQ/GCS/SQL client.

This task is so easy I would assign it to a junior engineer and expect code changes done in one day!
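
A minimal sketch of what points 1 and 2 describe, assuming Python. The names `log_sampled` and `read_through`, the logger name, and the dict-like `cache` and `db` are hypothetical stand-ins for illustration, not any GCP or App Engine API:

```python
import logging
import random

log = logging.getLogger("cache_stats")  # hypothetical logger name


def log_sampled(route, source, rate=1000, rng=random.random):
    """Log roughly 1 in `rate` reads; return True if this read was logged."""
    if rng() < 1.0 / rate:
        log.info("read route=%s source=%s sample_rate=1/%d", route, source, rate)
        return True
    return False


def read_through(key, route, cache, db):
    """Read from the cache first, fall back to the DB, and tick the source used.

    `cache` and `db` are plain dict-like objects here; in a real app they
    would be whatever Redis/SQL/etc. client the code already calls.
    """
    value = cache.get(key)
    if value is not None:
        log_sampled(route, "cache")
        return value
    log_sampled(route, "db")
    value = db[key]
    cache[key] = value  # populate the cache for the next read
    return value
```

Because only about one read in a thousand is logged, the per-route ratio of `source=db` to `source=cache` lines is an estimate, but it should be enough to show which route started missing.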

1. Logs are not free. Stackdriver was not an option back then.

2. If the phone company charges me extra, they tell me which numbers I called. Here, cache misses started happening, only in production, with no tooling available (at the time) to determine why. A single "data read" number was all the information given. Not even the table name... That means you end up looking for a needle in a haystack.

I'm guessing you work for Google because your attitude seems similar. No, they don't *have* to provide that service, which is exactly why they suck. A service-oriented company would make the *effort* to provide a user with this information. Especially a paying user at gold level.

> accumulate locally

Is that an option in AppEngine? The memcache docs seem to indicate the free tier has undocumented eviction policies.

You have to implement that with your own code, but it isn't much more than a dict/map and a timestamp for the last update.
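
A rough sketch of that dict-plus-timestamp idea, assuming Python. `LocalMetrics` and `flush_fn` are hypothetical names; `flush_fn` stands in for whatever call posts to your metrics API. It is not thread-safe as written:

```python
import time
from collections import Counter


class LocalMetrics:
    """Accumulate counts in memory and flush at most once per interval."""

    def __init__(self, flush_fn, interval_s=300.0, clock=time.monotonic):
        self._counts = Counter()
        self._flush_fn = flush_fn  # hypothetical: receives a dict of counts
        self._interval_s = interval_s
        self._clock = clock
        self._last_flush = clock()  # timestamp of the last update posted

    def tick(self, route, source, n=1):
        """Count one read, labeled by route and source ("cache" or "db")."""
        self._counts[(route, source)] += n
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self):
        """Post accumulated counts, then reset the dict and the timestamp."""
        if self._counts:
            self._flush_fn(dict(self._counts))
            self._counts.clear()
        self._last_flush = self._clock()
```

Usage would be something like `metrics.tick("/users", "db")` next to each client call. Anything accumulated since the last flush lives only in instance memory.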
Same-ish problem, though. You wouldn't know for sure that the instance will run again... your dict/map data can be dropped. I don't see any sort of instance timeout callback where you could guard against that.
I think it's negligible. The only metrics you lose are on rare scaledowns, and they are averaged out anyways. GCP likes to keep instances idle for a long time.