| HN Mirror

```java
// reduce: attempt to consume tokens from the bucket, starting with
// one_time_burst and overflowing into budget.
// (keep comment only if _expensive_): Performs auto-replenish when
// burst is emptied and budget has insufficient tokens.
// Asserts size, but only on replenishment.
// On OverConsumption, the entire bucket is emptied.
// On Failure, only one_time_burst is emptied.
// In all cases, tokens is reduced by the amount fulfilled.
BucketReduction reduce(long tokens) {
    long fromBurst = min(tokens, burst);
    burst -= fromBurst;
    tokens -= fromBurst;
    if (tokens > budget) {
        auto_replenish();
    }
    long requested = fromBurst + tokens;
    // Take from budget only if it covers the rest, or if the request can never fit.
    long fromBudget = (tokens <= budget || requested > size) ? min(tokens, budget) : 0;
    budget -= fromBudget;
    tokens -= fromBudget;
    long totalConsumed = fromBurst + fromBudget;
    if (tokens == 0) {
        return new BucketReduction(Success, totalConsumed);
    }
    if (requested > size) {
        return new BucketReduction(OverConsumption, totalConsumed);
    }
    // Failure: budget is untouched, so only one_time_burst was spent.
    return new BucketReduction(Failure, totalConsumed);
}
```
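To make the control flow concrete, here is a self-contained Java sketch of a bucket with the same burst-then-budget order and the three outcomes, following the comment's rule that on Failure only one_time_burst is emptied. Everything beyond the method shape above is an assumption: the `TokenBucket`, `Reduction` and `Outcome` names, the linear time-based refill in `autoReplenish()`, the injected clock, and a budget that starts full are illustrative and are not Firecracker's implementation (which is written in Rust).

```java
import java.util.function.LongSupplier;

// Hypothetical outcome names mirroring Success / Failure / OverConsumption.
enum Outcome { SUCCESS, FAILURE, OVER_CONSUMPTION }

// Result of reduce(): what happened and how many tokens were actually taken.
final class Reduction {
    final Outcome outcome;
    final long consumed;

    Reduction(Outcome outcome, long consumed) {
        this.outcome = outcome;
        this.consumed = consumed;
    }
}

// Hypothetical bucket: a refillable budget of `size` tokens plus a
// one-time burst that is spent first and never refilled.
final class TokenBucket {
    private final long size;
    private final long refillTimeMs;    // assumed: time to refill an empty budget to `size`
    private final LongSupplier clockMs; // injected clock so the sketch is deterministic
    private long budget;
    private long burst;
    private long lastUpdateMs;

    TokenBucket(long size, long burst, long refillTimeMs, LongSupplier clockMs) {
        this.size = size;
        this.burst = burst;
        this.refillTimeMs = refillTimeMs;
        this.clockMs = clockMs;
        this.budget = size; // assumption: the budget starts full
        this.lastUpdateMs = clockMs.getAsLong();
    }

    long budget() { return budget; }
    long burst() { return burst; }

    // Assumed linear refill proportional to elapsed time, capped at `size`.
    // Simplification: fractional tokens are dropped once a refill happens.
    private void autoReplenish() {
        long now = clockMs.getAsLong();
        long add = (now - lastUpdateMs) * size / refillTimeMs;
        if (add > 0) {
            budget = Math.min(size, budget + add);
            lastUpdateMs = now;
        }
    }

    Reduction reduce(long tokens) {
        long fromBurst = Math.min(tokens, burst); // spend the one-time burst first
        burst -= fromBurst;
        tokens -= fromBurst;
        if (tokens > budget) {
            autoReplenish(); // burst is gone and the budget is short
        }
        long requested = fromBurst + tokens;
        // Draw from budget only if it covers the rest, or if the request can never fit.
        long fromBudget = (tokens <= budget || requested > size) ? Math.min(tokens, budget) : 0;
        budget -= fromBudget;
        tokens -= fromBudget;
        long consumed = fromBurst + fromBudget;
        if (tokens == 0) {
            return new Reduction(Outcome.SUCCESS, consumed);
        }
        if (requested > size) {
            return new Reduction(Outcome.OVER_CONSUMPTION, consumed); // whole bucket drained
        }
        return new Reduction(Outcome.FAILURE, consumed); // budget left untouched
    }
}
```

Injecting the clock keeps the sketch testable; a real limiter would more likely read a monotonic source such as `System.nanoTime()`.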

This is a great comment: instead of making generic arguments, you actually tried to show how to do it better. Thank you.

I don't find the comments in the original code distracting, but I do like your version better.

> I'm also curious why burst is consumed before budget. I would expect _budget_ to be consumed first (with refill), with overflow into burst. My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.

This behavior is documented in the public API [0], so whatever the reason it was chosen, I don't think it can ever be changed.

> I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.

My understanding is this: if the number of tokens requested is greater than the remaining budget but no greater than the size of the bucket, the call is rejected and the caller is blocked until it has enough tokens. But if the number of tokens requested is greater than the size of the bucket, the caller will never have enough tokens. Instead of blocking the caller forever, the rate limiter lets the call go through, but then blocks the caller for a while to compensate for the over-consumption. Here's the handling [1]. I wish it were documented better.
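The caller-side distinction described above can be sketched as a small policy function. This is an illustration under stated assumptions, not the code behind [1]: the `Outcome`, `Decision` and `Throttle` names are hypothetical, and the penalty (the time a linear refill would need to earn back the tokens that were missing) is just one plausible way to "compensate for the over-consumption".

```java
// Hypothetical outcome names; illustrative only.
enum Outcome { SUCCESS, FAILURE, OVER_CONSUMPTION }

// What the caller should do after an attempt to take tokens.
final class Decision {
    final boolean proceed; // may the request go through now?
    final long penaltyMs;  // how long to throttle after letting it through

    Decision(boolean proceed, long penaltyMs) {
        this.proceed = proceed;
        this.penaltyMs = penaltyMs;
    }
}

final class Throttle {
    private Throttle() {}

    // requested: tokens asked for; consumed: tokens the bucket actually gave.
    // size, refillTimeMs: capacity and time to refill an empty bucket (assumed linear).
    static Decision decide(Outcome outcome, long requested, long consumed,
                           long size, long refillTimeMs) {
        switch (outcome) {
            case SUCCESS:
                return new Decision(true, 0);
            case FAILURE:
                // The request fits in the bucket eventually: reject now, retry later.
                return new Decision(false, 0);
            case OVER_CONSUMPTION: {
                // The request can never fit: let it through, then block long enough
                // for a linear refill to cover the tokens that were not available.
                long excess = requested - consumed;
                long penaltyMs = (excess * refillTimeMs + size - 1) / size; // ceiling division
                return new Decision(true, penaltyMs);
            }
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }
}
```

For example, a 250-token request against a 100-token bucket that refills in 1000 ms, where only 100 tokens were available, goes through and is then throttled for 1500 ms under this policy.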

[0] https://github.com/firecracker-microvm/firecracker/blob/fc2e...

[1] https://github.com/firecracker-microvm/firecracker/blob/2f92...