by mnutt 701 days ago
It sounds nice, but also consider: 1) depending on how your app crashes, are you sure the buffer will be flushed, and 2) if logging is expensive from a performance perspective, your base performance profile may be operating under the assumption that you’re humming along not logging anything. Some errors may beget more errors and have a snowball effect.

Both are solved by having a sidecar (think of it as a local ingestion point) that records everything (no waiting for a flush on error) and then does tail sampling on the spans whose status is non-OK, i.e. everything that's non-OK gets sent to Datadog, Baselime, your Grafana setup, or your custom ClickHouse 100PB storage nodes. Or take your pick of any of the 1000+ OpenTelemetry-compatible providers. https://opentelemetry.io/docs/concepts/sampling/#tail-sampli...

Pattern is the ~same.
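
For anyone who wants the sidecar idea in concrete form, here is a rough Python sketch of the tail-sampling decision. It is not the OpenTelemetry Collector or any vendor's API: `Span`, `TailSampler`, `finish_trace` and the two-value status string are all hypothetical names made up for illustration. It also assumes the sidecar is told when a trace is complete, and it keeps the whole trace if any span in it is non-OK, which is one common policy but not the only one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span: real spans carry far more fields.
    trace_id: str
    name: str
    status: str = "OK"  # assumed to be either "OK" or "ERROR"


class TailSampler:
    """Buffers every span locally and decides per trace once it is complete."""

    def __init__(self, export):
        # `export` is any callable that ships a list of spans to a backend.
        self._export = export
        self._buffer = defaultdict(list)

    def record(self, span):
        # Recording is a cheap in-memory append; nothing is sent yet.
        self._buffer[span.trace_id].append(span)

    def finish_trace(self, trace_id):
        # The decision is made at the tail: only now do we know whether
        # anything in the trace went wrong.
        spans = self._buffer.pop(trace_id, [])
        keep = any(s.status != "OK" for s in spans)
        if keep:
            self._export(spans)
        return keep


def demo():
    exported = []
    sampler = TailSampler(exported.extend)
    sampler.record(Span("t1", "GET /health"))
    sampler.record(Span("t2", "GET /checkout"))
    sampler.record(Span("t2", "db.query", status="ERROR"))
    sampler.finish_trace("t1")  # all OK, dropped
    sampler.finish_trace("t2")  # contains an error, exported in full
    return exported


if __name__ == "__main__":
    for span in demo():
        print(span.trace_id, span.name, span.status)
```

Since the app only ever hands spans to the local process, the cost on the happy path stays roughly constant whether or not anything ends up being exported, which is the point of the original concern about the performance profile.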