| Haven't worked on the NT kernel, but did implement a device driver for it once.
Spent the first 4 years of my career doing Linux kernel development. In general, go for it. The level of understanding you obtain by working on the kernel about how things -really- work will make you a better engineer, even if it turns out not to be for you. Things I would caution you about, based on the downsides of your current job:
1. Kernels make k8s look simple. Not just simple, child's play.
2. Debugging experience in kernel-land can vary widely depending on what layer of the stack you are working on and the nature of the bugs. Highly concurrent pieces of any kernel are a nightmare to debug because such bugs are normally race conditions, and timings are incredibly sensitive when you get this close to the metal.
3. Whilst you won't have to worry about this initially, as you will probably need a few years' experience at this level before you design new kernel subsystems, I definitely wouldn't consider architecture at this level less complicated than distributed systems. This is because computers these days -are- distributed systems. NUMA essentially means you have all the same problems. You do have much more convenient tools for solving them though (at the cost of performance), like HW coherency, etc.
4. Ok yeah, you shouldn't need to worry about this one. You won't be standing up new build systems or anything.
5. Well.. this is the rough part of kernel bugs. When you fk something up you potentially fk over everyone, usually in a very subtle, hard to diagnose and even harder to work around way.
So yeah, go do it, but don't do it because you think you will be getting away from those things, because you aren't really.
Do it because you think it will be enriching/fun/whatever you want more of in your life. |
1. Yeah, I have no doubt about that. My peeve with k8s is not its inherent complexity, but just how much worse it made our day-to-day work. When we ran things on managed services, everything was a lot more understandable and straightforward. k8s is piles and piles of hard-to-discover YAML files, and I (and other folks on the team) feel that it made our infra a lot harder to understand and change.
2. This was my most unclear point. What turns me off about my current experience debugging distributed systems is having to trudge through millions of log messages coming in constantly. I just can't say I like the ELK stack, despite its popularity.
3. Fair point, though I'm aware of the distributed nature of things even within a single host. I do expect it to be different though. Will I still have to account for "the response was lost, you don't know what happened" situations? My understanding is that, being close to the metal, I can at least trust that the wires are all still there and working, and if they aren't, well, I can say the machine (or something within it) is broken and needs to be replaced. Do you have to do things like sagas or deal with eventual consistency at the kernel level? Hardware guarantees seem a lot stronger than what you get in networked systems.
4. Yay!
5. The difference here is that even a distributed system that's built correctly will still page someone one day. There'll be more traffic than it was built for, or some network dependency will be down (like, not the things you accounted for, but something fundamental like a DNS server or whatever), etc. My expectation (and past experience) is that in code that's intended to run on a single host, I'll either get it right and it'll just work as intended, or there'll be bugs that I have to fix, and once I do things will just work. What I find extremely frustrating about services is that it seems that no matter how much effort you put into quality, there will still be times when it just goes down and people have to put out a fire.
Definitely going for it after all the responses here!