by hyperrail 1441 days ago
Here are my thoughts based on working on Windows in-box code (user mode only, though) across 2 different Windows component teams, totaling about 5 years with a break in the middle. (Split into multiple comments for readability. Some parts removed because other people said it better.)

Most importantly, MAKE SURE YOU KNOW WHAT THE JOB IS! Microsoft people don't try to be dishonest, but there can be misunderstandings between you and your future coworkers about your role, and if you take a job that turns out to be different from what you expected, you will be unhappy.

If you haven't already, you should ask the hiring manager more about what the team does. Try to get enough specifics that you might not know everything the manager refers to, but can easily Google what you don't know: "For example, in Windows 10 version 2004, we shipped the API and implementation for the Windows hypervisor feature that lets third-party VM host software like VirtualBox force their VM guests' virtualized RAM to be paged into the host machine's physical RAM all the time." (Not an actual feature, at least as far as I know.) At this level of detail, you'll be able to judge whether the work is really what you think it is.

Talk to your other interviewers to learn more about the work and the team, if they gave you their contact info or otherwise seemed inclined to hear from you. 3 out of 4 of them are likely going to be your peers, and the 4th is either the hiring manager, another mid-level to senior leader, or a team architect - all will be at least close enough to your team that they won't give you vague generalities.

An advantage of working at Microsoft that only other huge tech companies can match is that you'll get the chance to interact with many different people, some of whom will inspire outright hero worship among you and your direct coworkers. Those interactions could be in email discussions (having to send endless emails to random people or unarchived mailing lists to get things done or find things out is the curse of Microsoft life), or in API review meetings, or just water cooler talk.

Getting the chance to work with people like that was one of the highlights of my Microsoft career. Some of them are famous or semi-famous outside Microsoft, like David Cutler (mentioned repeatedly in this comment thread); others are not known outside MSFT at all but arguably should be; and still others are respected among a small geeky community (I'm thinking here of 2 Linux kernel subsystem maintainers who joined MSFT after making their names in Linux, and continue Linux work today). If making those connections is something you'd want to do as well, I'd definitely see that as a big plus of an MSFT job.

Even if you do work on Windows in-box code only, it might not be so easy. Here's another issue you raised with your old job:

> 2. Debugging exclusively via metrics and logs, since I can't just attach a debugger to a running server.

I find debugging Windows issues fun, but it might not be for you. On rare occasions, you might be lucky even to have telemetry and logs, for customer issues that can't be reliably reproduced. "We noticed that 30% of our module's crashes in the last Windows Insider Preview Dev build had this new stack, so we used Watson Portal to request more crash dumps from devices that ran into that particular crash, and a week later, we've accumulated this set of dump cabs...."

On the other hand, if you get an automated email saying "test case XYZ broke due to a crash in your module", you'll probably get a live kernel debugger remote - an email link or copy-paste command to open a kernel debugger on the dead VM, preserved for your debugging. But of course, bugs aren't necessarily caught that early, and even if they are, finding things via kernel debugging is a needle-in-a-haystack problem because you're debugging the entire computer, not just one process.
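The "30% of our module's crashes had this new stack" observation above boils down to bucketing crash reports by stack signature and comparing builds. Here is a rough Python sketch of that idea. It is not how Watson works internally; the report format, the `frames` field, and every function name are made up for illustration.

```python
from collections import Counter

def stack_signature(frames, depth=3):
    """Bucket key: the top `depth` frames joined together (hypothetical scheme)."""
    return " <- ".join(frames[:depth])

def bucket_shares(crashes, depth=3):
    """Map each stack signature to its fraction of all crash reports."""
    counts = Counter(stack_signature(c["frames"], depth) for c in crashes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: n / total for sig, n in counts.items()}

def new_buckets(previous, current, threshold=0.25, depth=3):
    """Signatures unseen in the previous build whose share is now >= threshold."""
    prev = bucket_shares(previous, depth)
    cur = bucket_shares(current, depth)
    return {sig: share for sig, share in cur.items()
            if sig not in prev and share >= threshold}
```

Anything `new_buckets` returns is a candidate for "go request more dumps for this one", assuming you have per-build crash lists to feed it.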

To be fair, Watson post-mortem debugging is something that pretty much every team that ships a product running on users' hardware has to deal with.
I think what I meant to say is that sometimes even telemetry and logging are unavailable to you. In my example, you might ask for more dumps for your Windows component or Microsoft first-party app on Watson Portal / Get More Data, but even after a few days of waiting you don't get any more, or only one more, because the issue either occurs in the wild too rarely to show up in telemetry, or repros often but doesn't reliably produce a useful dump.

As an aside, if you are a third-party Windows application developer, I highly recommend you register your apps with the Windows Desktop Application Program: https://docs.microsoft.com/windows/win32/appxpkg/windows-des... - this will give you access to the same app reliability telemetry, including user-mode crash dumps (.DMP / .CAB files), that Microsoft has. This is only the latest of several iterations of this data access over the years, but it and its predecessors are still badly underused in my non-Microsoft experience.

Yup, this all sounds very familiar to me - and I never worked on anything Windows.

The "repros often but doesn't reliably produce a useful dump" is particularly frustrating. Like, you're seeing all those crashes, and every one of them is likely to be some poor user who is at best annoyed, and at worst just lost some data. And you have no clue as to what the bug is or how to fix it to help them.

One reason I specifically suggest confirming what your job will be is that the teams that work on Windows in-box kernel-mode components aren't just Windows-specific teams anymore. They are part of the Microsoft Azure Edge + Platform division.

That name is misleading, but only partly so: shipping the Windows desktop and Windows Server products is a major part of that division's mission, but so is building Microsoft's internal-use Linux distribution CBL-Mariner, all of Microsoft's embedded and Internet of Things software products (Azure ThreadX RTOS, Windows IoT, etc.), and various Microsoft-internal software and hardware products.

It's very possible that the team you'd be joining would be working on an Azure product or Windows kernel-mode code for an Azure product, which means all 5 of your issues could be a concern, especially:

> 4. The insane amount of work required to stand up even the smallest microservice: infrastructure provisioning, certificates, security reviews, GDPR compliance, etc.

> 5. Anything I build will end up paging some poor soul at 3am some day when something is down or under heavy traffic.

(Point 5 is even worse in Azure because you will get paged yourself if you're on-call. You can't just assume that the operations or site reliability engineering teams will take care of problems without pulling in the original engineers, especially when the product is new and buggy :)

It's also possible that you could be working in a department that focuses on writing bug fixes for released versions of Windows, instead of writing features and fixes for the next version. The bug-fix department will often be called something like "servicing" or "sustained engineering" (I believe it's currently called Windows Servicing and Delivery, or WSD).