Hacker News
by crossroadsguy 5 days ago
What hopes/paths does someone with a mere CS bachelor's (not deep into stats/maths) who is a mid-level dev (native mobile only; 10-15 years exp.) have of not only understanding it (maybe not fully) but possibly getting into this as a career? I'm not expecting to churn out models and AI systems in the first weeks/months, just entry/employment into this field.

(If I can be honest, and I am not being disparaging about anything lest it might seem so, I am looking at it from a career breakthrough/move perspective rather than an intellectual pursuit.)

3 comments

I think you need to ask what you actually want to do with the AI.

If you want to be a researcher and come out with the next breakthrough, get ready to go back to school and learn some math.

If you just need to learn how to use it well and build things with it, then you probably just need a high-level understanding.

Same as programming. I’d bet most programmers have no idea about the physics that makes computers work.

> I think you need to ask what you actually want to do with the AI.

What about improving the efficiency of token consumption, etc., basically opportunities for improving cost/performance?

I keep thinking there has to be a better way to share context with models than dumping entire gigantic skill files of raw text or otherwise into them - I'm betting there's a bunch of low-hanging fruit there.

There may be some low-hanging fruit, but it's not available to people without a deep understanding of how the math works. Well-paid people already spend a lot of time thinking about this.
I am not sure the math is actually that complicated/important. The math around neural networks is calculus (chain rule etc.), and for model comparison/validation one needs statistics. The math required to understand transformers, for example, is quite accessible.
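To make the "quite accessible" point concrete, here is a minimal sketch of scaled dot-product attention, the core operation in a transformer, in plain Python. The names `softmax` and `attention` and the list-of-lists layout are assumptions made up for illustration, not any library's API; real implementations use batched tensor ops and add masking, multiple heads and learned projections.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query: score it against every key with a dot product,
    scale by sqrt(d_k), softmax the scores, then take the weighted
    sum of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Illustrative input: one query, two identical keys, two value vectors.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 2.0]],
)
```

Because the two keys are identical, both get the same weight, so the output is just the average of the two value vectors. That is about all the math there is in this step: dot products, a square root and an exponential.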
You missed the third and most important reason to learn: fun.

Which sums up HN these days.

I’m also a mere mortal, and after putting a few years into it, I’d say people make it much more complicated than it actually is. I failed most of my math courses for lack of interest, but found passion later with the aforementioned SLAM stuff. I have no doubt you or any other programmer could learn this stuff, especially since you can ask ChatGPT clarifying questions.

I have no idea about careers at this point. I’m still doing fancy IT work as my day job, and I look away from the future with dread. I also haven’t been looking for new roles on the open job market, so who knows, maybe there are multimillion pay packages for anyone who can articulate how attention works in an interview.

I have a BS in CS (and have been in the field for 25 years). I couldn't understand the transformer architecture until I built a few myself. Here are the books I worked through. I now feel I have a very good understanding of modern LLMs.

https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...

Has it given you enough of an understanding that you can pick up and follow research papers or did you have to do more to achieve that?
I went this route because I had difficulty visualizing the content of the Attention Is All You Need paper. After going through both books, I can now understand every part of that paper.

I'm currently working on a robotics project that uses Nvidia's GR00T N1 model, and I was able to understand the research paper. [0]

[0]: https://arxiv.org/abs/2503.14734

Thank you for the information.