by TaupeRanger 1261 days ago
There are none anymore. We now know that throwing a bunch of bits into the linear algebra meat grinder gets you endless high quality art and decent linguistic functionality. The architecture of these systems takes maybe a week to deeply understand, or maybe a month for a beginner. That's really it. Everything else is obsolete or no longer applicable unless you're interested in theoretical research on alternatives to the current paradigm.
5 comments

You are plain exaggerating. You can't do all of them in a few weeks.

    Algorithms:
    Lin Reg -> Log Reg -> NN -> CNN + RNN -> GANs + Transformers -> ViT -> Multimodal AI + LLMs + Diffusion + Auto Encoders

    SVM, PCA, kNN, k-means clustering, etc.

    LightGBM, XGBoost, CatBoost, etc.

    Optimization and optimizers.

    Application-wise:
    Classification, Semantic Segmentation, Pose Estimation, Text Generation, Summarization, NER, Image Generation, Captioning, Sequence Generation (like music/speech), text to speech, speech to text, recommender systems, sentiment analysis, tabular data, etc.

    Frameworks:
    pandas, sklearn, PyTorch, JAX -> training, inference, data loading

    Platforms:
    AWS + GCP + Azure
    And a lot of GPU shenanigans + framework/platform specific quirks
All these will take you ~2 years, or 1.5 years at least, given that:

- you already know Python/any programming language properly

- you already know college level math (many people say you don't need it, but I haven't met a single soul in ML research/modelling without college level math)

- you know Stats 101 at the level of a good uni curriculum and are able to learn beyond it

- you know git, docker, cli, etc.

Every influencer and their mother promising to teach you Data Science in 30 days is plain lying.

Edit: I see that I left out Deep RL. Let's keep it that way for now.

Edit2: Added tree-based methods. These are very important. XGBoost often outperforms NNs on tabular data. I also once used an RF head appended to a DNN for the final prediction. Added optimizers.

> SVM, PCA, kNN, k-means clustering

Are these still relevant in the age of Deep Neural Networks?

Yes. There are all kinds of tasks where the appropriate solution is to use a DNN for much of the learning (either directly learning the correlations, or as transfer learning from some large-data self-supervised task) and then work with these methods on the results of that DNN inference. You might apply PCA to interpret the resulting vector, or to separate out specific dimensions and expose them for adjustment in some generative task; or perhaps the best way to make the final decision is a kNN on top of the DNN output, etc.
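To make the "kNN on top of the DNN output" idea concrete, here is a minimal NumPy sketch. The embeddings are an assumption: they are synthetic vectors drawn around two made-up class means, standing in for whatever a real network would produce. `knn_predict` is a hypothetical helper, not a library function.

```python
import numpy as np

def knn_predict(train_vecs, train_labels, query_vecs, k=5):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(query_vecs[:, None, :] - train_vecs[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest rows
    votes = train_labels[nearest]                # shape: (n_queries, k)
    return np.array([np.bincount(row).argmax() for row in votes])

# Stand-in "DNN embeddings": an assumption for illustration only.
rng = np.random.default_rng(0)
dim = 16
class_means = np.stack([np.full(dim, 3.0), np.full(dim, -3.0)])
train_labels = rng.integers(0, 2, size=200)
train_vecs = class_means[train_labels] + rng.normal(size=(200, dim))

query_labels = np.array([0, 1, 1, 0])
query_vecs = class_means[query_labels] + rng.normal(size=(4, dim))

preds = knn_predict(train_vecs, train_labels, query_vecs, k=5)
print(preds, query_labels)
```

With real embeddings you would probably reach for an existing kNN implementation and maybe cosine distance; the point is only that the final decision rule can be this simple once the representation is good.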
It's not in your list but decision trees still outperform DNNs on many tabular problems and can be trained faster.
Also boosting.

But yes these algs are the basis of a lot of more modern algorithms.

A deep NN won't do unsupervised clustering, for example, and NNs perform more poorly than simpler models on small datasets.

Yes.

Different problems require different solutions.

Sometimes, an NN would be overkill.

And stakeholders in many situations would like insight into why the prediction is what it is. NNs are miles behind LogReg in terms of interpretability.
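A rough illustration of why LogReg is easy to explain: each coefficient is a change in log-odds per unit of its feature, so `exp(coefficient)` reads as an odds ratio. The sketch below is an assumption-laden toy: synthetic data where only the first feature matters, and a hand-rolled gradient descent fit (`fit_logreg` is a hypothetical helper, not a library call).

```python
import numpy as np

def fit_logreg(x, y, lr=0.5, n_steps=500):
    """Plain batch gradient descent on the logistic loss, no regularization."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted probabilities
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic data (assumption): feature 0 drives the label, feature 1 is noise.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
true_logit = 2.0 * x[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

w, b = fit_logreg(x, y)
odds_ratios = np.exp(w)   # multiplicative change in odds per unit of each feature
print("coefficients:", w)
print("odds ratios: ", odds_ratios)
```

The fitted weight on the informative feature should come out clearly positive and the weight on the noise feature close to zero, which is the kind of statement a stakeholder can actually act on.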

PCA is a foundational dimensionality reduction technique, and kNN can be used in conjunction with embeddings.
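For anyone who has only ever called a library for it, PCA is small enough to sketch in a few lines via the SVD of the centered data. The data here is an assumption: 10-dimensional points generated from a 2-dimensional latent plus a little noise. `pca` is a hypothetical helper.

```python
import numpy as np

def pca(x, n_components):
    """PCA via SVD: returns projected data, principal axes, explained variance ratios."""
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]              # rows are principal axes
    explained = (s ** 2) / np.sum(s ** 2)       # fraction of variance per axis
    return centered @ components.T, components, explained[:n_components]

# Synthetic data (assumption): 2-D structure hidden in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 10))
x = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

projected, components, explained = pca(x, n_components=2)
print(projected.shape, explained)
```

Since the data is built from two latent directions, the first two components should account for nearly all of the variance.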

k-means is still great when you have prior/domain knowledge about the number of groups.

K-means is pretty poor when the clusters are not linearly separable, but it is the basis of a lot of more modern clustering techniques (kernel K-means if you have prior knowledge, spectral clustering...)
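A quick way to see the linear-separability problem is two concentric rings. The sketch below is a toy under stated assumptions: the rings are synthetic, `kmeans` is a bare-bones Lloyd's loop rather than a library implementation, and the "fix" is a hand-picked radius feature. That feature is only in the spirit of kernel K-means (cluster in a space where the groups separate), not kernel K-means itself.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Bare-bones Lloyd's algorithm; an empty cluster keeps its old centroid."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assign(points, centroids), centroids

def agreement(labels, truth):
    """Fraction of points matching the true ring, up to swapping the two labels."""
    match = np.mean(labels == truth)
    return max(match, 1.0 - match)

# Two concentric rings (assumption: noiseless, radii 1 and 5).
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
unit = np.column_stack([np.cos(angles), np.sin(angles)])
points = np.vstack([1.0 * unit, 5.0 * unit])
truth = np.repeat([0, 1], 200)

raw_labels, _ = kmeans(points, k=2)
radius = np.linalg.norm(points, axis=1, keepdims=True)   # hand-picked feature
radius_labels, _ = kmeans(radius, k=2)

print("raw coordinates:", agreement(raw_labels, truth))
print("radius feature: ", agreement(radius_labels, truth))
```

With two centroids the boundary between clusters is a straight line, so no run on the raw coordinates can peel the inner ring off the outer one; in the radius feature the two groups are trivially separable.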
A month to deeply understand?

I've been doing it since early 2019 and there are still subtleties that catch me off guard. Get back to me when you're not surprised that you can get rid of biases from many layers without harming training.
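One well-known case of the bias thing (not necessarily the only one the comment has in mind) is a layer that feeds straight into batch normalization: the mean subtraction cancels any constant per-feature offset, so the bias does nothing. A minimal NumPy check, with `batch_norm` as a simplified stand-in that leaves out the learned scale and shift:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale/shift)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))    # a batch of 32 inputs
w = rng.normal(size=(8, 4))     # linear layer weights
b = rng.normal(size=(4,))       # linear layer bias

with_bias = batch_norm(x @ w + b)
without_bias = batch_norm(x @ w)
max_diff = np.abs(with_bias - without_bias).max()
print("largest difference:", max_diff)
```

The two outputs agree up to floating-point noise, which is why frameworks let you switch the bias off for such layers.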

I broadly agree with you, but the timeline was just a little too aggressive. By about 10x. :)

This is separate from understanding how a language model or transformer works. You could read the major papers behind those ideas and read every line of code involved several times over in a month. I'd recommend it, if you're super curious.

You can figure out the bias thing after about a month (or so) of hands on practice. Do one Kaggle seriously and it'll become pretty clear, pretty quickly.

> I've been doing it since early 2019 and there are still subtleties that catch me off guard.

That's true of every non-trivial discipline. I often learn subtleties about programming languages and hobbies I've been dealing with for decades.

This is definitely a take that, on the one hand, ignores the massive amount of utility for ML that exists outside of generative images and NLP and, on the other, vastly misrepresents the time it takes to understand a model, assuming one does not already have a background in CS, linear algebra and in particular matrix calculus, probability, stats, etc.
You still need to understand some basic theory/math about probabilistic inference (along with some knowledge of linear algebra), or else you’ll get a bit overwhelmed by some of the equations and not understand what the papers are talking about. PRML by Bishop is probably more than enough to start reading ML papers comfortably though. (This would probably be too easy for a competent math major, but not all of us are trained that way from the beginning…)
I'm not sure why you're getting downvoted. I find it hard to believe that someone without a decently strong math background could make sense of a modern paper on deep learning. I have a math minor from a good school and had to brush up on some topics before papers started making sense to me.
What resources are there to understand it in a month?