by jsolson 1431 days ago
I think I can count the number of kernel changes I've submitted on one hand, but I work on core virtualization that involves a lot of pretending to be hardware and (these days) a lot of poking directly at hardware registers.

I would say James Mickens sums things up nicely in "The Night Watch[0]." For example, you mention debugging with logs and metrics -- this snippet came to mind:

     “Yeah, that sounds bad. Have you checked the log files for
     errors?” I said, “Indeed, I would do that if I hadn’t broken every
     component that a logging system needs to log data. I have a
     network file system, and I have broken the network, and I have
     broken the file system, and my machines crash when I make
     eye contact with them. I HAVE NO TOOLS BECAUSE I’VE
     DESTROYED MY TOOLS WITH MY TOOLS. My only logging
     option is to hire monks to transcribe the subjective experience
     of watching my machines die as I weep tears of blood.”
Mind you, I absolutely _love_ working on low-level stuff, and I wouldn't trade the time I get to spend actually doing that for anything. That said, the complexity of modern operating systems, CPU architectures, interconnects, and peripherals creates opportunities for frustration and confusion that honor no bounds of reasonability or decency.

[0]: https://www.usenix.org/system/files/1311_05-08_mickens.pdf


As an aside: James Mickens is a treasure. If anybody reading this hasn't read any of his articles or watched any of his talks before, stop now and do it. You won't be sorry.

https://mickens.seas.harvard.edu/wisdom-james-mickens

I did as you suggested. I was not sorry. Thank you!
He is also an incredible professor.
Reading this man's homepage, he comes across as an insufferable egotist. I couldn't find any trace of didactic value among the self-aggrandizing rhetoric that couldn't be conveyed with infinitely greater humility and persuasion, as well as concision.

Edit: reference source: decades of being quite intolerable myself. Takes one to know one. And, in my experience, it takes incredible good fortune and more decades to repent.

The page is supposed to read as a joke. That's his schtick. I think his writing is pretty funny.
Haha, I love this humor, it's the kind I write in a frenzy of enthusiasm and inspiration often fueled by a sudden realization of the crazy state some things are in.

I have also gotten a lot of flak for it. But I see it as a way to convey that new feeling of wonder and awe a child has upon entering a new world. Challenge the assumptions, make up some nice words, pretend very hard that your academic title is an actual thing that gives you some blessed superpowers (as is the way that people often treat it!). I love it. It's truth wrapped in humor to soften the blow to the ones that think themselves overly important. The "shocked ones" are the ones that I try to avoid in life anyway; nice discussions begin at the edge of your comfort zone.

Dude. How can you read "I’ve been a legendary hacker for 98% of my life, but there was a brief period when I did not possess the sum totality of human knowledge." and not realize this is self-parody?
I think even with the edit that this comment is also satire.
The page is extremely obviously full of satire.
Is that you, Mickens?
Having interacted with Mickens quite a bit, he is far from intolerable.
> I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS

This might be my all-time favorite quote that I never get to use in relevant situations because nobody is around who would get the reference. I think of it almost every time I hear the word "tool".

> I think I can count the number of kernel changes I've submitted on one hand

I know I can; I don't even need any of my fingers.

> poking directly at hardware registers

Such luxury! I just spent a couple weeks getting FreeBSD booting in the Firecracker VM and most of my debugging was performed by inserting hlt instructions into the FreeBSD kernel and looking at whether virtual CPU halted or hit a triple fault.

I take it the Firecracker folks haven't built out support for KVM_GUESTDBG_SINGLESTEP yet :)
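For anyone wondering how far a one-bit signal gets you: here is a minimal sketch of the bisection that hlt-style debugging implies, not the workflow actually used above. It assumes a hypothetical `boot_halts_at(i)` callback that stands for "insert a hlt at candidate point i, boot, and report whether the vCPU halted (True) or triple-faulted first (False)", and it assumes every point before the crash is reached and none after it is.

```python
from typing import Callable


def first_unreachable(num_checkpoints: int,
                      boot_halts_at: Callable[[int], bool]) -> int:
    """Return the index of the first checkpoint that execution never reaches.

    boot_halts_at(i) is a hypothetical stand-in for one boot with a hlt
    inserted at checkpoint i: True means the vCPU halted (checkpoint i was
    reached), False means it triple-faulted before getting there.
    Returns num_checkpoints if every checkpoint is reached.
    """
    lo, hi = 0, num_checkpoints
    while lo < hi:
        mid = (lo + hi) // 2
        if boot_halts_at(mid):
            # Reached the hlt, so the crash is somewhere after mid.
            lo = mid + 1
        else:
            # Faulted before the hlt, so the crash is at or before mid.
            hi = mid
    return lo
```

Each call costs one rebuild and one boot, so narrowing N candidate points takes about log2(N) boots rather than N.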
> whether virtual CPU halted or hit a triple fault

Ah, I see the gentleman works directly in binary.

> I HAVE NO TOOLS BECAUSE I'VE DESTROYED MY TOOLS WITH MY TOOLS

Only comparable experience I can think of is "breaking" the terminal via bash_profile and then not being able to fix bash_profile via the terminal. Or locking yourself out of a web server via security configuration update.

Raise your hand if you broke the SSHD server config and then restarted it!

That "uh-oh!" moment.

Definitely didn't do that last week, no way. Thankfully, the server I didn't push the wrong credentials to before restarting wasn't the new instance of the test server.
I've definitely never done that via Ansible on a bunch of machines at once, nope nope nope.
I’m an enormous idiot who’s done that recently. If not for the fact I had another window open, I don’t know what I would have done.
chroot from an OS whose terminal is not broken and a shell other than bash
zsh was the one I broke, so bash would have been preferable.
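A hedged sketch of one guard against the restart-with-a-broken-config problem: validate first, restart only if validation passes. The helper name `restart_if_valid` and the injectable `run` parameter are made up for illustration. `sshd -t` is OpenSSH's test mode; the service name in the usage comment is an assumption and varies by distro.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def restart_if_valid(validate_cmd: Sequence[str],
                     restart_cmd: Sequence[str],
                     run: Callable[[Sequence[str]], int] = _run) -> bool:
    """Run validate_cmd; run restart_cmd only if validation exits 0.

    Returns True only if both commands succeed. The restart is never
    attempted when validation fails.
    """
    if run(validate_cmd) != 0:
        return False
    return run(restart_cmd) == 0


# Example usage (needs root; the service name "sshd" is an assumption):
# restart_if_valid(["sshd", "-t"], ["systemctl", "restart", "sshd"])
```

This only catches config files that sshd itself rejects. It will not notice that you pushed the wrong credentials, so keeping a second session open, as mentioned above, is still the real safety net.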
This is breathtakingly brilliant, and I am grateful to now have the wisdomful humor of James Mickens in my life. Thanks!
So what you do in that case is add a monochrome adapter to the PC and write to the screen memory. Or send a [greatly abbreviated] trace out a serial port.
In at least one instance of breaking my tools with my tools, the machines in question were in a data-center. Ordinarily one might be able to grab debugging output over an NC-SI link, but that only works if the breakage doesn't also unexpectedly crash your NIC firmware...
If you can’t touch it, you might not be able to debug it. Kudos though for testing^H^H^H^H^H^H^Hdebugging in production.
Case in point. Had a client in the late ’80s with an X.25 connection between their office and a mainframe across town. It was accumulating hundreds of CRC errors per minute. Couldn’t fix it remotely, so I flew out there. Had a scope hooked up on both sides.

We loaded the connection in varying patterns for about an hour. ZERO CRC errors. Scratching our heads, we decided to punt. Then suddenly the CRC error count started climbing on my scope. I shouted excitedly into the speakerphone, “What did you all just do?” They’re like, “Nothing but unplug the scope from the patch panel.” “Please plug it back in.” The CRC error count stopped climbing.

Diagnosis: faulty patch panel.

When your job is to build infrastructure, "in a data-center" does not necessarily mean it's a production server :)
Thank you for posting that. Epic read, amazing sense of humor, and I feel like I owe James a beer or three after reading it.