by nailer 4500 days ago
'Linux might get DTrace-like capabilities via SystemTap'

Might? SystemTap's been stable & out of the box on RHEL 5 for years. The extensive list of default tap points is supported in production by RH. When I was doing systems stuff 5 years ago we used it for double-checking TCP socket options that weren't normally exposed to userspace tools.

Dtrace can do some awesome stuff stap can't, like trace something from userspace down to kernel space. But systemtap does provide live instrumentation on running kernels and has for years.

PS: Just saw p93. Sun got in trouble for the obvious "Setting up SystemTap is difficult on distros that don't maintain, support, use or like SystemTap" years ago. This is repeated on p 93. Stop it, this makes Joyent look stupid, you're better than this.

And p100. Re: 'several million dollar E10K' performance? Red Hat were replacing them with Xeons by 2003 (with Sun's old sales staff) and crushing them. Though that was SPARC's fault, not Solaris's.

5 comments

It would be great if all Linux distributions had SystemTap out-of-the-box like RHEL. Or something with a sufficient feature set to solve problems (which may become ktap, or perf_events, or lttng). However, most of the Linux performance issues I'm debugging are on Ubuntu systems. So the question remains: when will Linux-at-large get this capability?

Red Hat can certainly say that SystemTap is obviously a priority on their OS, and they can only make best effort for others. I said this in the talk. Indeed, a working SystemTap should be an incentive to run RHEL. It is a compelling argument -- the benefits of a working SystemTap are enormous (which I also explained in the talk). Sun made basically the same argument for switching to Solaris: a working DTrace.

Although, I'm not sure that using SystemTap is entirely safe for production yet, which I said was the most important priority. I notice that this related kernel panic is still not fixed:

http://sourceware.org/bugzilla/show_bug.cgi?id=2725

Some recent and promising progress has happened in the past few months, though, as noted in the bug! I only care about this bug because I sometimes trace large subsets of the kernel (e.g., all functions in a module or driver), and I think I've hit the same related issue. I don't actually need to trace everything.

You said that DTrace can trace from userspace to kernel, which SystemTap can't. I'm not sure if there's a specific case you're referring to, but SystemTap nowadays can indeed trace userspace to kernel (via uprobes), and a lot of work has been happening to get SystemTap userspace tapsets to work.

As for setting up SystemTap being difficult on distros that don't support or maintain SystemTap: well, almost all of my customers run Ubuntu. If they hit perf issues, do I convince them to switch to RHEL? (The approach that Sun took for DTrace.) Or is this a problem to take up with the other Linux distro maintainers? Which goes back to my original point: when will Linux get this? :-)

Is it problematic to install SystemTap on Ubuntu? On Debian systems it's straightforward.

apt-get install systemtap linux-image-amd64-dbg

It's not too much harder, but it requires a non-default repository, so mildly harder. Quasi-official instructions here: https://wiki.ubuntu.com/Kernel/Systemtap

The difference is that, as of the past few releases, Ubuntu doesn't ship a -dbg kernel package in the main repository like Debian does. Instead they've come up with a new system to ship 'debug symbol' variants of all packages, with a -dbgsym suffix. But they ship them in a separate repository (ddebs.ubuntu.com) that isn't enabled by default, I guess to avoid confusing end-users. So you need to configure that repository, install its key, then pull linux-image-$(uname -r)-dbgsym. I guess you might also want to pull the debug-symbol versions of the main libraries as well.
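As a rough sketch of the steps above: the two helpers below only print the apt source line and the kernel debug-symbol package name, they don't change anything on the system. The function names are made up for illustration, and the component list (main restricted universe multiverse) is an assumption, so check it against the wiki page. Installing the repository key is not shown; follow the wiki for that.

```shell
# Hypothetical helpers: they print values only and never modify the system.

# Print the apt source line for the ddebs repository.
# $1: Ubuntu release codename, e.g. the output of `lsb_release -cs`.
# The component list is an assumption; verify it against the Ubuntu wiki.
ddebs_source_line() {
    printf 'deb http://ddebs.ubuntu.com %s main restricted universe multiverse\n' "$1"
}

# Print the name of the kernel debug-symbol package.
# $1: kernel release, e.g. the output of `uname -r`.
dbgsym_package() {
    printf 'linux-image-%s-dbgsym\n' "$1"
}
```

You would then put the output of ddebs_source_line "$(lsb_release -cs)" into a file under /etc/apt/sources.list.d/, install the repository key as the wiki describes, run apt-get update, and apt-get install the package named by dbgsym_package "$(uname -r)".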

(Disclosure: I work at Joyent.)

Are you accusing Brendan of misrepresenting SystemTap on slide 93? If so, could you be specific as to what that misrepresentation is? Brendan has a ton of experience earnestly using SystemTap (or rather, trying to use it), and it's hard to see how sharing that experience makes either Brendan or his employer "look stupid." For more details on Brendan's explorations of SystemTap:

  http://dtrace.org/blogs/brendan/2011/10/15/using-systemtap/

That's several years old at this point, but the architectural limitations highlighted by Brendan haven't changed.

Yes, I am accusing the author of deliberately misrepresenting facts. I'm not sure what's vague, let me recap:

1. 'might' implies the future. As you can tell from the post you're replying to, SystemTap is a present thing. I think I actually added most of the tapsets shown on the Wikipedia page around 2009, and I was using them then.

2. Also, as the author partially acknowledges in the next slide, Ubuntu is the dumbest place to illustrate a typical SystemTap experience. I suspect that's deliberate rather than ignorance, as Sun tried that a few years back and I don't think anyone with any interest in the matter is ignorant enough not to remember the shitstorm.

As I've written elsewhere: if your customers mostly run Ubuntu, then test dtrace on Ubuntu too. It'd give you just as accurate an impression of what the dtrace 'experience' is as installing SystemTap on Ubuntu.

Edit: actually, it looks like Joyent wrote the shitstorm post a few years ago, you've still got it up, you still don't see anything wrong, and you just linked to it.

Christ, read the HN comments from 2011: https://news.ycombinator.com/item?id=3118416

Edit 2: you're Bryan Cantrill. You didn't seem to care much about performance a while ago: http://cryptnet.net/mirrors/texts/kissedagirl.html

Edit 3: more Cantrill classiness http://www.quora.com/Node-js/Why-did-Ben-Noordhuis-decide-to...

Sorry to anyone who thinks I've gotten personal, but this guy is a well known... unsavoury character.

Sorry, which facts are you accusing him of misrepresenting? Brendan is pretty upfront with what SystemTap is and isn't; which part of his in-depth blog entry on SystemTap[1] do you disagree with? And I get that you're upset that he used Ubuntu, but he's also upfront about why (namely, that's what our customers use); that seems to me to be more of a fact that you don't like rather than a misrepresentation.

I'm also a little confused about your first edit; which "shitstorm" post are you referring to? As far as I can tell, his SystemTap blog entry did not attract a shitstorm -- unless you count you taking issue with it under two different account names on HN[2].

Finally, as for your second edit: given that you've already praised my work, I'll happily accept you assailing me personally. (Or rather, I certainly prefer that to having you praise me personally, but condemn my work...)

[1] http://dtrace.org/blogs/brendan/2011/10/15/using-systemtap/

[2] https://news.ycombinator.com/item?id=3118416, posting as both "nailer" and "mikemaccana"

I have stated which parts are incorrect, and why, very clearly, twice. Stop asking.

Yep, I have two HN accounts from clickpass/openid being shut down; I emailed YC quite a while ago to sort it out. Hopefully they'll get around to it at some point. The accounts posted in different threads - if I'm astroturfing, I'm not doing it very well. Two other accounts also point out holes in the post.

We're both aware you're well known as a fairly unethical individual, you already know what's false, and you're not interested in correcting anything. I think I've engaged with you enough.

Um, wow. I'm "well known as a fairly unethical individual"?! I mean, I know I'm being trolled here, but there's a level of accusation that I simply can't leave unchallenged. Like any human, I'm certainly loaded with faults; I can be brash and opinionated and intimidating and abrasive -- but I also consider myself an intensely ethical person who feels a profound sense of responsibility to my family, my team, my company, my community and my craft. So at the risk of feeding the trolls, can you be specific about why you think me to be unethical?

Edit: Okay, looking at the extensive edits above, it's clear that claiming I'm "fairly unethical" was a reference -- at least in part -- to the pronoun fracas. You may disagree with my position on that (though it's a position I stand by, for whatever it's worth), but I would refer you to cooler discussion on it[1] in hopes that you might at least better understand my perspective.

[1] https://gist.github.com/trevnorris/7729322

Yeah, you are unethical. If you had been judged by Sun in the way that you judged Ben Noordhuis then you wouldn't have a job.

I don't think you ever apologized for your "ever kissed a girl" comment? Now you are some sort of bastion of moral integrity that threatens to fire someone you don't employ?

Wow, I wish I could flag posts and not only threads. Hell, I wish I could flag users. You have no place here.

/edit: when I posted this, the post above was full of insults.

Sorry, this guy really gets on my nerves, I hope you can understand why. I've edited my post to use better language.

You can flag posts, just click on the link that says "link".

Demonstrating on Ubuntu was indeed deliberate -- it's because most of my customers, and most of the performance issues I'm debugging, are on Ubuntu.

For the sake of consistency, then, I suggest you install dtrace on Ubuntu.

I thought that was sort of his point, that you can't? I didn't read it as a DTrace vs. Systemtap discussion, but as an observability-on-SmartOS versus observability-on-Ubuntu discussion. If Ubuntu shipped with DTrace (or it could be easily installed), there'd be no real observability difference.

It's a good point that this is a SmartOS vs. Ubuntu rather than SmartOS vs. all-Linuxes comparison, though.

Running dtrace on Ubuntu is possible: much like SystemTap, it's unsupported and involves lots of messing around.

If someone wrote an article called 'Using dtrace' and wrote about the poor experience they had on Ubuntu it'd be just as (in)valid.

I did, and I've used it to solve issues on Ubuntu, reproduced in the lab. It's not very full-featured yet, and it can panic and freeze the kernel. That's why I said "Not safe for production use yet".

That post is from 1996. Although it's not great to dismiss a technical post with a single line of snark, is it really relevant to question someone's character based on a post from 18 years ago?

When did he apologize for that comment?

Is it appropriate to judge someone's character because he didn't use gender-neutral language? Is it even appropriate to threaten to fire someone you don't employ? That's sort of like trying to prevent the guy from having a job - ever!

> Ubuntu is the dumbest place to illustrate a typical Systemtap experience

I'm not sure what to make of this. Aren't Systemtap, dtrace etc. supposed to be generic _Linux_ kernel and userspace tools that are not distribution-specific? If so, then Brendan's opinion is fair.

If not, then sure, some clarification is needed (i.e., substitute Ubuntu for Linux). And I'll add it to my ever-growing list of why production sites shouldn't be running Ubuntu. :)

> Stop it, this makes Joyent look stupid, you're better than this.

How does it make Joyent look stupid? DTrace is miles ahead of SystemTap, especially when it comes to installation: DTrace is generally part of the default installation wherever it's used, which SystemTap is not, and I don't have to do a monkey dance if I upgrade the kernel.

On a system with dtrace, like OS X, tracing tools like dtruss are basically just wrappers around dtrace. It's very natural to the platform.

Systemtap was out of the box on RHEL 5 update 2. No monkey dance involved.

And yet, I've seen it crash systems as late as RHEL6u2. It certainly made me very cautious about using it on production systems, which was sort of the whole point.

> And p100. Re: 'several million dollar E10K' performance? Red Hat were replacing them with Xeons by 2003 (with Sun's old sales staff) and crushing them. Though that was SPARC's fault, not Solaris's.

Possibly I missed the golden era of Sun field support, or perhaps it was more a regional thing, but I consistently get vastly better support from Red Hat than I ever had from Sun, even when running M-class Sun kit.

And let us not speak of Oracle.

I played with systemtap around 2011/2012, with RHEL 5 and versions of Fedora at the time, and I could easily get it to kernel panic. I went on to their IRC channel to report the problems and got some fixes, but in the end it still wasn't stable and I gave up due to lack of time.

I really wanted to get it working. One of my colleagues (who was from Sun) actually left and went to work for Joyent, presumably at least partly because of the frustration with the lack of a DTrace-like tool.