Hacker News
by NeckBeardPrince 1263 days ago
You know that costs money, right?

Not as much as you'd think, and critically, the cost is largely disconnected from how you use the infra.
If your infra is not on-prem, then yes, it will cost you more money as you generate more and bigger logs.
You actually have to log a damn lot to fill up even a single 16 TB drive with gzip-compressed logs, which typically get something like 50x compression for log data.
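
To put a number on "a damn lot", here is a back-of-the-envelope sketch. The 16 TB drive and the 50x ratio are the rough figures from this comment, not measurements, and the function name is made up for illustration.

```python
# Back-of-the-envelope: how long until compressed logs fill one drive?
# All inputs are ballpark assumptions taken from the comment above.

DRIVE_TB = 16            # drive capacity in decimal terabytes
COMPRESSION_RATIO = 50   # assumed gzip ratio for log text
GB_PER_TB = 1000


def days_to_fill(raw_gb_per_day, drive_tb=DRIVE_TB, ratio=COMPRESSION_RATIO):
    """Days until the drive is full, given the daily volume of uncompressed logs."""
    compressed_gb_per_day = raw_gb_per_day / ratio
    return drive_tb * GB_PER_TB / compressed_gb_per_day


# Plain-text log volume that fits on the drive once compressed: 800 TB.
raw_capacity_tb = DRIVE_TB * COMPRESSION_RATIO

for rate in (1, 10, 100):
    days = days_to_fill(rate)
    print(f"{rate:>4} GB/day raw -> {days:,.0f} days (~{days / 365:.1f} years)")
```

Under these assumptions, even 100 GB/day of plain-text logs takes about 8,000 days (roughly 22 years) to fill the drive. Real ratios vary a lot with log content, so treat the 50x as a guess.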

On top of that, mechanical hard drives are pretty cheap these days. Like it's a dozen dollars per terabyte, if not less.

I don't know, you're either producing just absurd amounts of logs, on the order of a hundred gigabytes a day of plain text, at which point, sure, I guess you could probably log a bit less. Either that, or you're operating at a scale with many millions of users, where you should have income and be able to afford it.
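
For a sense of scale, the yearly disk spend at that volume can be sketched like this. The $12/TB price and the 50x ratio are the ballpark numbers from this thread, and the function name is hypothetical. This counts raw disk only: no redundancy, servers, power, or anyone's time.

```python
# Rough yearly spend on raw disk for gzip-compressed logs.
# Inputs are the thread's ballpark figures, not quotes from any vendor.

USD_PER_TB = 12          # "a dozen dollars per terabyte" (assumed)
COMPRESSION_RATIO = 50   # assumed gzip ratio for log text


def yearly_disk_cost_usd(raw_gb_per_day, usd_per_tb=USD_PER_TB, ratio=COMPRESSION_RATIO):
    """Cost of the disk space consumed in one year of logging."""
    compressed_tb_per_year = raw_gb_per_day / ratio * 365 / 1000
    return compressed_tb_per_year * usd_per_tb


print(f"${yearly_disk_cost_usd(100):.2f} per year")
```

At 100 GB/day of plain text that works out to 0.73 TB of compressed logs per year, or under ten dollars of disk.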

... well, either that, or you're being fleeced.

> You actually have to log a damn lot to fill up even a single 16 TB drive with gzip-compressed logs, which typically get something like 50x compression for log data.

Now factor in a queryable data source, i.e. running a database of some sort (probably Elasticsearch for logs) 24/7, fast enough to be useful for ops.

Metrics are significantly cheaper though, at least if you use a dedicated TSDB with a good storage engine, like VictoriaMetrics or InfluxDB.

VictoriaMetrics author here. I'm working on VictoriaLogs right now, i.e. a logging system built on VictoriaMetrics' architecture ideas. Preliminary results are promising:

- It will need much less disk space, disk IO, CPU and RAM than Elasticsearch during data ingestion.

- It will provide fast log querying and tailing via an easy-to-use query language (LogsQL), with the ability to calculate advanced stats over the selected logs.

- It will accept data in Elasticsearch format, so existing Filebeat and Logstash setups can be switched from Elasticsearch to VictoriaLogs in a few seconds.
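
For readers unfamiliar with it, "Elasticsearch format" presumably means the newline-delimited JSON body of the Elasticsearch `_bulk` API, which is what Filebeat and Logstash send; that reading is an assumption, not something stated above. A minimal sketch of building such a body follows. The index name, field names, and helper function are made up, and no VictoriaLogs endpoint is shown because none is given in this thread.

```python
import json


def to_bulk_ndjson(index, docs):
    """Serialize docs as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Hypothetical index name and log records, for illustration only.
body = to_bulk_ndjson("app-logs", [
    {"@timestamp": "2023-01-01T00:00:00Z", "message": "user login ok"},
    {"@timestamp": "2023-01-01T00:00:01Z", "message": "cache miss"},
])
print(body, end="")
```

Each record becomes two lines: an action line naming the target index, then the document itself.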

Any thoughts about OpenTelemetry? That covers logs, tracing and metrics.

> You actually have to log a damn lot to fill up even a single 16 TB drive with gzip-compressed logs, which typically get something like 50x compression for log data.

If you cannot search the logs quickly, at least the "hot" (i.e. most recent) ones, they don't make much sense. Well, they still make sense, but for other reasons, and you lose many interesting features of logs. At $DAYJOB we *surely* need to trim and shave a lot off the logs that apps send to the centralized ELK (which is one of the points of TFA), but we cannot just gzip the text files and be done: we need to be able to search for patterns and data in the logs to understand what the app is doing in certain cases (besides having metrics).

P.S. We also store them as gzipped files in an S3 bucket using warm/cold tiers, and it is certainly cheaper than using even magnetic disks.
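
To illustrate why plain gzipped files are cheap to keep but awkward to query, here is a small sketch using only the Python standard library. The log lines are synthetic and the helper names are made up; real logs will compress differently.

```python
import gzip
import os
import re
import shutil
import tempfile


def write_gzipped_log(path, lines):
    """Write lines to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


def grep_gzipped_log(path, pattern):
    """Linear scan: every query decompresses and reads the whole file."""
    regex = re.compile(pattern)
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if regex.search(line)]


# Synthetic, repetitive log lines (made up for illustration).
sample = [
    f"2023-01-01T00:00:{i % 60:02d}Z INFO request id={i} "
    f"status={'500' if i % 1000 == 0 else '200'}"
    for i in range(10_000)
]

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "app.log.gz")
write_gzipped_log(path, sample)

raw_bytes = sum(len(line) + 1 for line in sample)  # ASCII, +1 for the newline
ratio = raw_bytes / os.path.getsize(path)
hits = grep_gzipped_log(path, r"status=500")
print(f"compression ratio: {ratio:.1f}x, matching lines: {len(hits)}")

shutil.rmtree(tmpdir)  # clean up the temporary file
```

There is no index, so search time grows with the total size of the archive. That is fine for cold storage and occasional forensics, and is the gap a queryable store fills for the hot data.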

When people complain about the cost of excessive logging, they are almost certainly not thinking in terms of how much a drive costs.

Services like CloudWatch are an excellent way to burn through money, though it's usually the time series storage and ingestion costs that balloon out of control.

Well, also the kind of people who worry about this are not thinking in terms of "a terabyte", like GP. It's always easy to give advice when your experience has been at a toy level.

That's unnecessarily dismissive. Handling many (or dozens, or hundreds of) TB worth of logs is anything but "toy level"; that's more than the vast majority of businesses will generate in a decade, maybe even their lifetime.
And if you're in a position where you have to manage petabytes+ of logs using on-prem hardware, SSDs are probably a small part of your overall budget!