by RhodesianHunter 2059 days ago
Even with batching from Tempo, wouldn't that cost many thousands per month in S3 PUT costs alone?

We batch up traces in a block and write a block at a time. Internally we are currently configured to write 100k traces in one batch.
Doesn't this cause explosive memory usage? What happens if there's some congestion? Is there a circuit breaker to start dumping (discarding) log entries past a certain limit?
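For what it's worth, the kind of limit being asked about can be sketched as a bounded buffer that discards new entries once it is full instead of growing with the request rate. This is only an illustration of the general idea, not how Tempo actually behaves; `DroppingBuffer`, `offer`, and `drain` are made-up names for the example.

```python
import queue


class DroppingBuffer:
    """Bounded buffer that discards new entries once it is full (hypothetical helper)."""

    def __init__(self, max_entries: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=max_entries)
        self.dropped = 0  # number of discarded entries, useful for alerting

    def offer(self, entry) -> bool:
        """Try to enqueue without blocking; return False if the entry was dropped."""
        try:
            self._queue.put_nowait(entry)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_batch: int) -> list:
        """Remove and return up to max_batch entries for one batched write."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch


buf = DroppingBuffer(max_entries=3)
accepted = [buf.offer(i) for i in range(5)]
print(accepted, buf.dropped)    # [True, True, True, False, False] 2
print(buf.drain(max_batch=10))  # [0, 1, 2]
```

Memory is capped at `max_entries` no matter how congested the backend gets; the trade-off is that anything arriving while the buffer is full is lost, so the `dropped` counter is what you would want to watch.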

I was testing Google Pub/Sub's Go client for publishing internal API event data for later ingest to BigQuery, and it turns out Pub/Sub publishing is not that much faster than writing directly to BigQuery. The buffer sizes we'd need to avoid adding latency to our APIs would have to be ridiculously high; the Pub/Sub client buffers and submits batches in the background (its default buffer size is 100MB!). I don't like the idea of having huge buffers that increase with the request rate.

Conversely, pushing the data to NATS in real time without any buffering or batching turned out to be fast enough to not add any latency. You have to be able to receive messages very fast on the consumer side (as NATS will start dropping messages if consumers can't keep up), but you can simply run a few big horizontally autoscaled ingest processes that can sit there ingesting as fast as they can, which never impacts API latency at all.

S3 PUTs are $0.005 per 1000. If you're writing twice a second, that's about 5.2 million PUTs a month, which comes out to roughly $26/month.
Yeah, but the person I'm responding to is suggesting 170k+ spans per second, so how is twice per second relevant?
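The arithmetic in this exchange is easy to check with a quick sketch. It assumes the $0.005 per 1000 PUTs price quoted above and a 30-day month; `monthly_put_cost` is just a name made up for the example. The point it illustrates is that the bill scales with the number of PUT requests, not the number of spans, so what matters is how many spans end up in each write.

```python
SECONDS_PER_DAY = 86_400


def monthly_put_cost(puts_per_second: float,
                     usd_per_1000_puts: float = 0.005,  # price quoted in the thread
                     days_per_month: int = 30) -> float:  # assumption: 30-day month
    """Return the monthly request cost in USD for a steady PUT rate."""
    puts_per_month = puts_per_second * SECONDS_PER_DAY * days_per_month
    return puts_per_month / 1000 * usd_per_1000_puts


# Two block writes per second, as in the comment above.
print(f"{monthly_put_cost(2):,.2f}")        # 25.92

# Hypothetical worst case with no batching: one PUT per span at 170k spans/s.
print(f"{monthly_put_cost(170_000):,.2f}")  # 2,203,200.00
```

So unbatched writes at that span rate would be wildly expensive, while the same traffic folded into a couple of block writes per second costs tens of dollars in PUT requests.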