by ot 337 days ago
There is an additional benefit to throttling by time: it is a lot easier to do efficiently in multithreaded environments.

If you log by count, you need a global counter for that event (you could use thread-local counters, but then your logging volume would depend on the number of threads). If the code path is hot (which is likely if you need to throttle your logs), multiple threads will contend on the increment, and that can be very expensive.
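For illustration, a minimal sketch of count-based throttling with a single shared counter. The helper name `should_log_every` is hypothetical; the point is that every call, from every thread, performs an atomic read-modify-write on the same cache line.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true on every n-th call across all threads.
// Each call does an atomic increment on the shared counter, which is the
// operation that becomes contended on a hot path.
inline bool should_log_every(std::atomic<uint64_t>& counter, uint64_t n) {
  assert(n > 0);
  return counter.fetch_add(1, std::memory_order_relaxed) % n == 0;
}
```

The increment is unconditional, so the cost is paid even on the calls that end up not logging.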

If you log by time, you just need a load and a clock read (on Linux, `CLOCK_MONOTONIC_COARSE` is a handful of ns and the resolution is enough for this purpose), and only need synchronization (a compare-and-swap) when the timer expires, so threads virtually never interfere with each other.


For high-concurrency scenarios, sharded counters (per-thread counters with occasional global sync) or probabilistic logging (log with probability 1/N) can also reduce contention while keeping approximately count-based semantics. The resulting volume is approximate rather than exact (with sampling it is 1/N only in expectation), but neither approach needs a CAS on a shared timer.
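A rough sketch of the probabilistic variant, assuming a thread-local generator so that no state is shared between threads. The name `should_log_sampled` and the choice of `std::minstd_rand` are illustrative assumptions, not something from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns true with probability 1/n. The generator is
// thread-local, so calls from different threads never touch shared memory.
inline bool should_log_sampled(uint32_t n) {
  assert(n > 0);
  thread_local std::minstd_rand rng{std::random_device{}()};
  return std::uniform_int_distribution<uint32_t>(0, n - 1)(rng) == 0;
}
```

Note that this bounds the log volume only on average: a burst of events can still produce a burst of log lines.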
That still means each thread will do its own separate log call every second (or whatever the period is) instead of all threads aggregating into a single log call.
No, the timer is still global (that's why you need the compare-and-swap). But the threads only need to do reads most of the time, and reads do not cause contention. Writes do.

It looks something like this (pseudocode):

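    // Shared by all threads: earliest time at which the next log is allowed.
    static std::atomic<uint64_t> deadline{0};
    // `now` and `period` are assumed to be integer ticks of the same coarse clock.
    auto now = coarse_clock::now();
    auto curDeadline = deadline.load(std::memory_order_relaxed);
    // Read-only fast path; the CAS runs only after the deadline has passed,
    // and only one of the racing threads wins it.
    if (now >= curDeadline &&
        deadline.compare_exchange_strong(
            curDeadline, now + period, std::memory_order_relaxed)) {
      // Actually log
    }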
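A compilable sketch of the same idea, with the clock value passed in so the logic can be exercised deterministically. The names `should_log_at` and `steady_now_ns` are hypothetical, and `std::chrono::steady_clock` is used here only as a portable stand-in for a coarse clock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper. `deadline_ns` is shared by all threads and holds the
// earliest time (in ns) at which the next log is allowed.
inline bool should_log_at(std::atomic<uint64_t>& deadline_ns,
                          uint64_t now_ns, uint64_t period_ns) {
  assert(period_ns > 0);
  uint64_t cur = deadline_ns.load(std::memory_order_relaxed);
  // Before the deadline this is a plain load and a comparison. After it,
  // racing threads attempt the CAS and exactly one of them succeeds.
  return now_ns >= cur &&
         deadline_ns.compare_exchange_strong(cur, now_ns + period_ns,
                                             std::memory_order_relaxed);
}

// Assumed clock source: monotonic time in nanoseconds.
inline uint64_t steady_now_ns() {
  using namespace std::chrono;
  return static_cast<uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
```

A call site would look like `if (should_log_at(deadline, steady_now_ns(), 1000000000)) { /* log */ }`, with `deadline` being a static `std::atomic<uint64_t>` initialized to 0.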
Your “actually log” is within one thread. Either the threads all do their separate “actually log”, or they have to synchronize their data into a single shared “actually log”.
> Either the threads all do their separate “actually log”

But why? Often the purpose is just to log a "been here" signal with some additional details for diagnostics. You don't need to include an accumulation of everything that happened since the last log. All that you care about is that the log happens at most once per period, say once per second.

If you do want to also log some data that accumulates everything that happened, you can accumulate the data in thread-local buffers, and in the "actually log" part collect all the buffers and log them. Since this only happens in the thread that "wins" the CAS, it is still very scalable. This is a very common technique.

If you throttle by count, you cannot avoid the contended atomic increment without some sophistication and some loss of accuracy.

Well, yeah, it depends on what you want to log.

> collect all the buffers

Which requires some sort of synchronization (or lock-free data structures), because of concurrent writes by other threads. In that situation, you can also simply use a dedicated thread to periodically flush the log buffers.

Yes, but crucially, only once per period, not on every single "should log?" call, which is what I was referring to above. The per-thread mutexes are uncontended virtually all the time.
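One way the per-thread buffers could look, as a sketch under assumptions: each thread owns a shard guarded by its own mutex, and whichever thread does the actual log walks all shards. All names here (`Shard`, `local_shard`, `record_event`, `collect_and_reset`) are hypothetical, and the "buffer" is reduced to a single event count for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical per-thread accumulator with its own mutex.
struct Shard {
  std::mutex mu;
  uint64_t events = 0;
};

std::mutex g_registry_mu;                     // guards g_shards
std::vector<std::shared_ptr<Shard>> g_shards; // one entry per thread seen so far

// Returns the calling thread's shard, registering it on first use.
// Shards of exited threads stay registered so their counts are not lost.
Shard& local_shard() {
  thread_local std::shared_ptr<Shard> mine = [] {
    auto s = std::make_shared<Shard>();
    std::lock_guard<std::mutex> lock(g_registry_mu);
    g_shards.push_back(s);
    return s;
  }();
  return *mine;
}

// Hot path: locks only this thread's mutex, which is contended only while
// a collection is in progress.
void record_event() {
  Shard& s = local_shard();
  std::lock_guard<std::mutex> lock(s.mu);
  ++s.events;
}

// Slow path: meant to be called by the one thread that does the actual log.
uint64_t collect_and_reset() {
  std::lock_guard<std::mutex> reg(g_registry_mu);
  uint64_t total = 0;
  for (auto& s : g_shards) {
    assert(s != nullptr);
    std::lock_guard<std::mutex> lock(s->mu);
    total += s->events;
    s->events = 0;
  }
  return total;
}
```

The hot path still takes a lock, but it is the thread's own, so it only ever waits when the collecting thread happens to be reading that particular shard.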