Hacker News
by marginalia_nu 1263 days ago
You actually have to log a damn lot to actually fill up even a single 16 Tb drive with gzip-compressed logs which typically have something like 50x compression for log data.

On top of that, mechanical hard drives are pretty cheap these days. Like it's a dozen dollars per terabyte, if not less.

I don't know, you're either producing just absurd amounts of logs, on the order of a hundred gigabytes a day of plain text, at which point, sure, I guess you could probably log a bit less. Either that or you're operating at a scale with many millions of users, where you should have the income to afford it.

... well, either that, or you're being fleeced.
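
As a back-of-envelope check of the numbers above, here is a rough sketch. The inputs are just the figures quoted in this comment (50x compression, 100 GB/day of plain text, about $12/TB), not measurements, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# Back-of-envelope: how long does it take to fill one drive with gzipped logs?
# All inputs are the rough figures from the comment, not measurements.
DRIVE_TB = 16            # drive capacity, decimal terabytes (assumed 1 TB = 1000 GB)
COMPRESSION_RATIO = 50   # assumed gzip ratio for log text
PLAIN_GB_PER_DAY = 100   # uncompressed log volume per day
USD_PER_TB = 12          # "a dozen dollars per terabyte"


def days_to_fill(drive_tb, ratio, plain_gb_per_day):
    """Days until the drive is full, given the uncompressed daily volume."""
    compressed_gb_per_day = plain_gb_per_day / ratio
    return drive_tb * 1000 / compressed_gb_per_day


days = days_to_fill(DRIVE_TB, COMPRESSION_RATIO, PLAIN_GB_PER_DAY)
drive_cost = DRIVE_TB * USD_PER_TB

print(f"{days:.0f} days (~{days / 365:.1f} years), drive cost ~${drive_cost}")
# prints: 8000 days (~21.9 years), drive cost ~$192
```

This only covers raw storage of compressed files; it says nothing about indexing or query cost.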

3 comments

> You actually have to log a damn lot to actually fill up even a single 16 Tb drive with gzip-compressed logs which typically have something like 50x compression for log data.

Now count that for a queryable data source, i.e. running a database of some sort (probably Elasticsearch for logs) 24/7 at speeds fast enough to be ops-useful.

Metrics are significantly cheaper tho, at least if you use a dedicated TSDB with a good storage engine, like VictoriaMetrics or InfluxDB.

VictoriaMetrics author here. I'm working on VictoriaLogs right now, i.e. a logging system built on top of VictoriaMetrics architecture ideas. Preliminary results are promising:

- It will need much less disk space, disk IO, CPU and RAM than Elasticsearch during data ingestion.

- It will provide fast log querying and tailing via an easy-to-use query language (LogsQL), with the ability to calculate advanced stats over the selected logs.

- It will accept data in Elasticsearch format, so existing Filebeat and Logstash setups can be switched from Elasticsearch to VictoriaLogs in a few seconds.
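
For context on "Elasticsearch format": shippers like Filebeat and Logstash typically send logs through Elasticsearch's bulk API, which takes newline-delimited JSON, with one action line followed by one document line. Below is a minimal sketch of building such a payload. The index name and the log fields are made up for illustration, and nothing here assumes which endpoint or fields VictoriaLogs will actually accept.

```python
import json


def to_bulk_ndjson(docs, index="logs-example"):
    """Serialize documents into an Elasticsearch bulk-API style request body.

    Each document becomes two lines: an action line and the document itself.
    The index name is a hypothetical placeholder.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical log records, for illustration only.
records = [
    {"@timestamp": "2023-01-01T00:00:00Z", "message": "service started"},
    {"@timestamp": "2023-01-01T00:00:01Z", "message": "listening on :8080"},
]

print(to_bulk_ndjson(records), end="")
```

A receiver that speaks this format only has to parse line pairs, which is presumably why existing shippers could be repointed without changes on their side.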

Any thoughts about OpenTelemetry? It covers logs, traces, and metrics.
> You actually have to log a damn lot to actually fill up even a single 16 Tb drive with gzip-compressed logs which typically have something like 50x compression for log data.

If you cannot search quickly in the logs, at least the "hot" ones (i.e. the most recent), they don't make too much sense. Well, they still make sense, but for other reasons, and you lose many interesting features of logs. At $DAYJOB we *surely* need to trim and shave a lot off the logs that apps are sending to the centralized ELK, which is one of the points of TFA, but we cannot just gzip the text files and be done: we need to be able to search for patterns and data in the logs to understand what the app is doing in certain cases (besides having metrics).

P.S. We also store them as gzipped files in an S3 bucket using warm/cold tiers, and it is certainly cheaper than using even magnetic disks.

When people complain about the cost of excessive logging, they are almost certainly not thinking in terms of how much a drive costs.

Services like CloudWatch are an excellent way to burn through money, though it's usually the time series storage and ingestion costs that balloon out of control.

Well, also the kind of people who worry about this are not thinking in terms of "a terabyte", like GP. It's always easy to give advice when your experience has been at a toy level.
That's unnecessarily dismissive. Handling many (dozens, or even hundreds of) TB worth of logs is anything but "toy level"; that's more than the vast majority of businesses will generate in a decade, maybe even their lifetime.
And marginalia_nu, the GP I was referring to, was unnecessarily strident, concluding that others must be naïve or incompetent if they had to handle logs with "ELK or something like that" and that therefore one must have an "over-complicated distributed software design."

Don't move the goalposts to hundreds of TB--this user is giving advice to everyone based on a perspective that you're doing something wrong if all of your logs don't fit on a single hard drive; that you should "log less" if you have the "absurd" quantity of "hundreds of gigabytes" a day of logs, and who seems to think the cost of individual hard drives is an important driver of the cost of managing logs. Their words, not mine.

There's nothing interesting to be gained from hot takes based on naïve conceptions and lack of experience. Pointing out that giving overly-general advice based on your inexperienced best guesses and the NewEgg price list is not very useful is not "unnecessarily dismissive."

And if you're in a position where you have to manage petabytes+ of logs using on-prem hardware, SSDs are probably a small part of your overall budget!
Also true, though it's not zero. The different cost drivers/cost models between using SaaS and on-prem infrastructure for logs are interesting and drive different decisions. I have done both in large and small environments, and I kind of like the SaaS model because it is easier to put cost incentives on product owners and development teams, who are the ones who should own the P&L. In other words, if you pay $1/GB or whatever, you can get money back by logging fewer GBs. It naturally discourages "log whatever into a giant undifferentiated bucket 'in case you need it'".
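
To make the incentive concrete, here is a rough per-team chargeback sketch. The $1/GB rate is the "or whatever" figure from above, not a real price, and the team names, volumes, and 30-day month are hypothetical.

```python
USD_PER_GB = 1.00  # the "$1/GB or whatever" figure, not a real price


def monthly_bill(gb_per_day_by_team, usd_per_gb=USD_PER_GB, days=30):
    """Bill each team for the log volume it ingests over one month."""
    return {team: gb * days * usd_per_gb for team, gb in gb_per_day_by_team.items()}


# Hypothetical teams and daily log volumes in GB.
usage = {"checkout": 40, "search": 15}

before = monthly_bill(usage)
after = monthly_bill({**usage, "checkout": 10})  # checkout trims 30 GB/day

saved = before["checkout"] - after["checkout"]
print(f"checkout saves ${saved:.0f}/month by logging 30 GB/day less")
# prints: checkout saves $900/month by logging 30 GB/day less
```

The point is only that the saving lands on the team that made the change, which is harder to arrange when storage is a shared on-prem pool.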

You can pay less for equivalent on-prem infrastructure but it drives costs quite differently. For example, it tends to be hard to refresh that infrastructure because it doesn't make you money, so it gets worse over time. The unit cost of storage is very low, often because you make availability/durability tradeoffs that aren't even available to you from the SaaS provider or cloud service. But you will find that the Opex associated with it can be either quite high or poor, and this is hard to reflect in terms of investment by P&L owners.

You can do either approach well or poorly. The way SaaS sucks when doing it poorly is mostly that you are paying a huge amount of money. The way on-prem sucks when doing it poorly is much more complicated and is reflected by toil and tech debt across an organization (which is money but harder to tie to what would fix it), poor visibility, lack of insights, and possibly spending too much in Opex or licenses, depending on the technology. The cost of having a bunch of people do on-prem logging "right" is hard to justify, even for these "large organizations" where I guess, people think, money is free. And even if you've correctly identified and wish to fund the cost of delivering the infrastructure (which as you point out has hardware as only part of its cost), it's not like you can necessarily find the five quality engineers to run the thing. And if you could--do you really want these FTEs working on logging infrastructure or do you want them delivering revenue features?