| Agreed. It's sad how many SRE candidates position themselves as senior and turn out to have nothing more than surface-level knowledge of AWS managed services. My approach is fairly similar to the article's: 1. I have them walk me through a production Dockerfile and explain what is going on (it is not a very complex Dockerfile). 2. I have them troubleshoot a broken web server in the most basic scenario (a public web server that we run: EC2 with apache2, public IP, no load balancers, no CDN, just a VM serving a static HTML page on xyz.actualdomain.com). Things that are broken: 1. NXDOMAIN (I make sure to unset the DNS before the interview) 2. Security group isn't allowing 80/443 3. NACL isn't allowing egress 4. iptables isn't allowing 80/443 5. Apache2 is stopped 6. Apache2 is bound to 127.0.0.1:80/443 7. Self-signed TLS. I don't require specific incantations; I know I can't write the iptables insert without looking up an example. But I do expect candidates to know where and what to look for to troubleshoot the entire chain. They are allowed to run any command they want (I'm running it on a screen share). If they get stuck on a portion, they get some hints, and finally they are given the answer to get past it. My goal is not a gotcha; my goal is to see how they attack the problem and whether they are at least familiar with how things can break and have guesses at fixes. Fully a third of interviewees get stuck on NXDOMAIN, which is shocking to me when interviewing people who, on paper, have over a decade of deep Linux and cloud experience. To me, the scenario I present is basic troubleshooting and something that should be a breeze for most candidates. |
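For anyone who wants to picture the chain in the quoted scenario, here is a rough sketch of walking it from the outside in: DNS, then TCP, then TLS. The function name `diagnose` and its parameters are made up for illustration, and the hints in the messages are only guesses (a firewall that rejects instead of drops also shows up as "refused", for instance). It is not what the commenter runs, just one way to see which layer fails first.

```python
import socket
import ssl


def diagnose(host, port=443, tls=True, timeout=3.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Return 'ok' or a short string naming the first layer that fails.

    `resolve` and `connect` are injectable (hypothetical hooks) so the
    logic can be exercised without touching the network.
    """
    # Layer 1: DNS. An NXDOMAIN surfaces here as socket.gaierror.
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: {exc}"
    addr = infos[0][4][0]

    # Layer 2: TCP. Timeouts often mean packets are dropped on the way
    # (security group, NACL, iptables); refusals often mean nothing is
    # listening on that address (service stopped or bound to loopback).
    try:
        sock = connect((addr, port), timeout=timeout)
    except ConnectionRefusedError:
        return "tcp: refused (service stopped, or bound to 127.0.0.1?)"
    except (socket.timeout, TimeoutError):
        return "tcp: timeout (security group, NACL, or iptables dropping?)"
    except OSError as exc:
        return f"tcp: {exc}"

    # Layer 3: TLS. A self-signed certificate fails default verification.
    with sock:
        if not tls:
            return "ok"
        ctx = ssl.create_default_context()
        try:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
        except ssl.SSLCertVerificationError as exc:
            return f"tls: certificate rejected ({exc.verify_message})"
        except (ssl.SSLError, OSError) as exc:
            return f"tls: {exc}"
```

Calling `diagnose("xyz.actualdomain.com")` against the broken box would stop at the DNS layer first, which is exactly where the NXDOMAIN problem sits.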
Maybe it's different for SREs, but as a fullstack web developer, unless you've got a greenfield project, usually someone else sorted out DNS a long time ago. And unless you're working on something that changes DNS a ton, nobody has touched DNS in a long time.
Additionally, I've seen NXDOMAIN way more often as a local machine configuration problem, rather than as a production environment DNS problem.
So if I'm going in to debug a server and then see an NXDOMAIN, I could see myself getting stuck wondering just what else in the world is broken. If I were doing the test on my own hardware, I might panic that my machine is in a bad state. If I were doing the test on hardware the interviewer provided, I might start wondering if this is some kind of trick where I have to debug both a broken client and a broken server.
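One way out of that doubt is a control lookup: resolve a name that should always exist alongside the one that fails. A minimal sketch, assuming `example.com` is an acceptable control name and using a made-up helper `classify_nxdomain` (the `resolve` parameter is only there so the logic can be checked without a network):

```python
import socket


def classify_nxdomain(target, control="example.com",
                      resolve=socket.gethostbyname):
    """Guess whether a failed lookup is the target's problem or the client's."""

    def resolves(name):
        # gethostbyname raises socket.gaierror (an OSError) on failure.
        try:
            resolve(name)
            return True
        except OSError:
            return False

    if resolves(target):
        return "target resolves"
    if resolves(control):
        # Other names work, so the local resolver is probably fine and
        # the target's record is what is missing.
        return "target record missing; local resolver looks fine"
    return "control also fails; suspect the local machine or network"
```

If the control resolves and the target doesn't, I'd feel a lot more comfortable saying "the record is missing" out loud instead of quietly debugging my own laptop.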
Then again, maybe if I went back to those people who couldn't add two numbers in JS, they'd have a great explanation too :)