by aPoCoMiLogin 796 days ago
where you worked doesn't matter to me very much when what you are saying contradicts what you probably did ("experience in large scale systems"). it also sounds like an argument from authority.

not having cpu/mem/hdd metrics is just plain bogus and sounds like a fantasy world where everything works like we expect it to and there are no bugs at all. ridiculous

You question his competence.

> i feel like you have never touched servers/backend in anything more than simple projects (or at all). with full storage/memory there could be an issue that you won't be able to ssh to the server, so it speaks about your knowledge in this matter.

He was answering that.

If, instead of dismissing someone outright and questioning their competence, you had raised specific concerns, this would have been a more productive conversation.

> You question his competence.

> He was answering that.

> If, instead of dismissing someone outright and questioning their competence, you had raised specific concerns, this would have been a more productive conversation.

he first said that we don't need to monitor anything, just enable debugging when "business metrics" are failing, and then he changed his stance to "polling from time to time". that just shows that his first take wasn't thoughtful, so I assumed that he never worked in "the field" or only worked on smaller projects, as nobody who has worked on bigger projects would say that "we don't need CPU/mem/hdd metrics". it's not like he's proposing something novel, that's just a ridiculous take that needs to be called out

> i feel like you have never touched servers/backend in anything more than simple projects (or at all)

I feel like if you are going to go out on a limb and call someone's expertise into question...

> I ran all of ops for reddit for four years and headed up SRE at Netflix

And they provide excellent credentials which you failed to check...

> where you worked doesn't matter to me very much

You can't just weasel out of it by pretending like you didn't start the interaction by calling someone's expertise into question.

> And they provide excellent credentials which you failed to check...

that's a logical fallacy, you can work in any place on earth and still be wrong on the subject.

> You can't just weasel out of it by pretending like you didn't start the interaction by calling someone's expertise into question.

why? if his take is bad, then his job or experience doesn't change the outcome. i'm not an expert by any means, but the things that he's saying just contradict everything that is standard practice and my own experience. based on that i'm able to say that he doesn't know what he's saying/proposing, and using his "excellent credentials" just makes things worse, as it shows that he doesn't have an argument, just wishful thinking

At the scale of Netflix or Reddit it very well may make sense to only keep very limited CPU/memory stats on such a massive fleet. Look, I have a different opinion as well, but the difference between you and me is I'm not resorting to personal attacks and instead discussing it on the merits.

> At the scale of Netflix or Reddit it very well may make sense to only keep very limited CPU/memory stats on such a massive fleet.

read again what his argument is: that we don't need to store __any__ cpu/mem/storage metrics, other than "business metrics" (or later he walked that back to polling from time to time).

> Look, I have a different opinion as well, but the difference between you and me is I'm not resorting to personal attacks and instead discussing it on the merits.

maybe that's due to a difference in culture/region, but i'm not aware of where i've attacked him personally. i've just pointed out that what he's saying is what you'd expect from someone without experience/knowledge "in the field".

I read his argument again.

> Keeping all of your application logs and telemetry forever is expensive, and I can't recall a single time when having more than a day's worth of history was ever useful in tracking down an operational issue.

That doesn't say "don't store any"; it says you can get by storing a 24-hour period. And his broader point is that retention should be time-bound: storing these metrics indefinitely isn't useful and can be very expensive.

I'm of the mind that a week or two of fast online access is the right amount myself (with offline "cold" storage of logs for a longer period), but the overall premise, that storing logs and infrastructure metrics forever is unnecessary and wasteful, is sound.
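
To make the time-bound retention idea concrete, here is a minimal sketch of how a log or metric record might be sorted into tiers by age. The 14-day hot window follows the "week or two" suggested above; the 90-day cold window, the tier names, and the `retention_tier` function are all assumptions for illustration, not anything proposed in the thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed windows: "a week or two" of fast online access comes from the comment
# above; the cold-storage length is a made-up placeholder.
HOT_WINDOW = timedelta(days=14)
COLD_WINDOW = timedelta(days=90)


def retention_tier(recorded_at, now=None):
    """Return "hot", "cold", or "expired" based on the record's age."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - recorded_at
    if age <= HOT_WINDOW:
        return "hot"      # keep in fast, queryable storage
    if age <= COLD_WINDOW:
        return "cold"     # move to cheaper offline storage
    return "expired"      # safe to delete under this policy
```

Under the stricter 24-hour view quoted earlier, the same function would apply with `HOT_WINDOW = timedelta(days=1)` and no cold tier; the point of disagreement is only the size of the windows, not whether retention should be bounded.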