Ask HN: How are distributed systems as a field of study compared to ML/AI?
9 points by distsysdude 2167 days ago
I currently work on database development at Oracle. I'd like to learn about something other than DB internals, and I've recently become fascinated with distributed databases like CockroachDB, AWS Aurora, Azure Cosmos DB, TiDB, etc.

So I was thinking I'd dive deep, learn more about distributed systems, and possibly try to switch to one of the companies mentioned above.

I would like HN's opinion on the scope of distributed systems (and distributed databases) as a career choice, compared to currently hot areas like machine learning/AI.

A bit of background about me: I did my undergrad in a non-CS field, but I've had an interest in computer science since high school, so the switch to a full-fledged CS job was not that difficult. I'm self-motivated and willing to spend copious amounts of time learning something that would help my career. But I'm confused as to what to study.

Should I stick with my current interests, or follow the industry trend and try to learn ML/AI?

Thanks!

3 comments

I’m biased, as my work is primarily focused on large-scale distributed physics simulations and on incorporating machine learning into them. As a result, I treat ML very much as a means to an end.

Of course, with the caveat that your situation is unique to you and I can’t give any definitive answers, I would think long and hard before jumping on the ML hype train. In my experience, it doesn’t pay to follow the trend; you’ve either gotta be first or you gotta be unique. That’s not to say ML work is restricted to a select few that you aren’t part of, but a few others and I are wary that the ML hype train (at least as far as deep learning is concerned) might be passing. The days of AI labs paying million-dollar bonuses are nearly gone, unless (and someone can correct me if I’m wrong) you’ve got an alternate skill set they’re looking for. That doesn’t mean there aren’t plenty of people and businesses who need CRUD-type ML setups; with your experience in databases, I imagine that could be a unique angle to attack it from. Whether it’s a good idea to pivot into a career using ML really depends on your specific situation and the opportunities therein; for more solid advice I would ask a trusted colleague or mentor rather than people online, even if they are from HN.

As for my PERSONAL opinion: I can’t speak to what is normally done in other parts of distributed systems, since scientific tools are usually bespoke and don’t use the same approaches as commercial products. However, thinking about it from an outsider’s POV, it seems to me that focusing more on distributed systems would be a winning combination. I don’t think computers will advance enough in the next 30 years for the need for distributed data and compute management skills to go away; hell, with IoT you might be looking at a boom down that career path. From my perspective it’s all upside if you focus on expanding your skillset in these areas: if ML continues to thrive, there’ll definitely be a need for distributed systems to run these models on, and if an AI winter hits, you’ll have a solid set of core skills to fall back on that I don’t imagine will go out of favor anytime soon. Those are just my two cents though; of course, YMMV.

Thank you for taking time to answer this!

I never knew that distributed physics simulations could be a career field. I always thought such problems would be handled by scaling up vertically or just throwing a supercomputer at them.

If you don't mind, could you elaborate a bit on the type of work you do and the scope of the problems you solve every day?

Thanks again!

I can elaborate a bit. Most of the large-scale problems are actually as straightforward as “just throw a supercomputer at it”. However, just like when mathematicians say “that’s an implementation detail”, it turns out that actually throwing a supercomputer at the problem is much more difficult in practice than setting up some shell scripts to run, especially where scientific computing is concerned. For one, there’s usually no concept of “microservices” or “containerized” applications, at least not in my experience. Most modern distributed computing practices are thrown right out the window in scientific computing, since the scientists directly program their distribution schemes via MPI and the like. That’s partly because academic projects don’t have lots of money and need to use every dollar efficiently, and partly because most of the time off-the-shelf distribution schemes really aren’t suitable for the science. For example, you might have one layer where node interactions occur according to some mathematical and physical criteria instead of “load” or some other abstract flag; that’s harder to code for, and it’s much better to have a domain scientist who knows the physics decide how to decompose the problem than a computer scientist who has no idea of the physics adopt a scheme that ignores the problem entirely. That’s why I said most scientific tools are “bespoke”.
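To make the "let the physics decide the decomposition" point a bit more concrete, here is a minimal, hypothetical sketch (plain Python, no MPI) of one common pattern, 1D spatial domain decomposition: particles are assigned to ranks by where they are in space rather than by load, and each rank also gets copies of nearby foreign particles within the interaction cutoff (often called ghost or halo particles) so it can compute interactions locally. The function name `decompose`, the equal-width slabs, and the non-periodic box are assumptions for illustration; a real simulation would exchange halos with MPI calls rather than build them in a single process.

```python
from typing import Dict, List, Tuple


def decompose(positions: List[float], n_ranks: int, box: float,
              cutoff: float) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Hypothetical 1D spatial decomposition of particles in [0, box).

    Returns (owned, ghosts): for each rank, the indices of the particles it
    owns, and the indices of particles owned by other ranks that lie within
    `cutoff` of its slab (its halo). Non-periodic boundaries are assumed.
    """
    width = box / n_ranks  # equal-width slabs: split by space, not by load
    owned: Dict[int, List[int]] = {r: [] for r in range(n_ranks)}
    ghosts: Dict[int, List[int]] = {r: [] for r in range(n_ranks)}

    # Each particle is owned by the slab that contains it.
    owner_of = [min(int(x // width), n_ranks - 1) for x in positions]
    for i, r in enumerate(owner_of):
        owned[r].append(i)

    # A foreign particle is a ghost for rank r if it is close enough to
    # interact with something inside r's slab.
    for r in range(n_ranks):
        lo, hi = r * width, (r + 1) * width
        for i, x in enumerate(positions):
            if owner_of[i] != r and lo - cutoff <= x < hi + cutoff:
                ghosts[r].append(i)
    return owned, ghosts


if __name__ == "__main__":
    owned, ghosts = decompose([0.5, 2.9, 4.5, 5.0, 9.8],
                              n_ranks=2, box=10.0, cutoff=1.0)
    print(owned)   # which particles each rank integrates
    print(ghosts)  # which neighbours each rank needs copies of
```

The cutoff here plays the role of the "mathematical and physical criteria": it comes from the interaction range in the physics, which is exactly the kind of decision a domain scientist is better placed to make.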

The result is that most of the distributed systems people are moved into a supporting role, where their job is, for example, to develop tooling and libraries that allow better communication between nodes. I’ve also heard of some compsci people being integrated directly into these scientific teams to develop specialized APIs and such in-house, but that’s a bit rarer imo. These are just some examples of how science and compsci intersect; here’s one group I know of: https://www.ornl.gov/group/dcs

They're basically everywhere, there's a lot to learn, it's a valuable skill for a potential employer, and you're interested. Seems pretty obvious that you should go deeper with distributed systems. Also, your lack of formal education won't hold you back as much as it would with ML. The idea that you should go do math for the next couple of years just because the field seems hot doesn't seem well thought out.

Thank you for taking time to answer this!

I honestly thought that my lack of formal education would be a hurdle when learning about or working on distributed systems. Don't you agree?

Also, I've always wondered how ML models are deployed in production at scale. Building a model with Python libraries seems fine, but how do they distribute and deploy it in production?

Huh? Seems to me that you should be interested in and learning about all manner of computer-related things, including distributed systems, machine learning, and artificial intelligence. Knowledge and techniques useful in one context are generally useful in other contexts. The ability to see structure and similarities across systems and applications is invaluable.

I'm not quite sure about that. For example, I don't think any major database vendor has considered using ML algorithms inside a DB. Google did try replacing B-Tree indexes with neural networks, but that was only a research project, and it wouldn't have been possible to scale it up to meet production demands.

But I think I understand where you're coming from and I should definitely try to widen my knowledge base.
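For anyone wondering what "replacing a B-Tree index with a model" means mechanically, here is a minimal sketch under heavy simplifying assumptions: a single least-squares line stands in for the neural network used in that research, it predicts a key's position in a sorted array, and the model's worst-case error over the stored keys bounds a final binary search. The class name `LinearLearnedIndex` is hypothetical; this is not Google's implementation, it assumes a non-empty, sorted, read-only key list, and it supports no inserts.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical read-only learned index over a sorted, non-empty list of keys.

    A least-squares line maps key -> predicted position; the largest
    prediction error over the stored keys bounds the final binary search.
    """

    def __init__(self, keys: List[float]) -> None:
        self.keys = keys  # assumed sorted ascending and non-empty
        n = len(keys)
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Every stored key lies within max_err positions of its prediction.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> Optional[int]:
        """Return the first position of `key`, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the search window so that 0 <= lo <= hi <= n.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    idx = LinearLearnedIndex([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
    print(idx.max_err, idx.lookup(13), idx.lookup(4))
```

The trade-off this illustrates is the one behind the production concerns: lookups are cheap when the key distribution is easy to model, but inserts or a shifting distribution mean retraining the model and recomputing its error bound.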