by disgruntledphd2 783 days ago
This makes me sad, not because I disagree with it, but because it's basically common wisdom among practitioners in the statistics and ML communities. In my experience, the only people who think architecture/model choice makes a huge difference are n00bs and academics.

That being said, you'll definitely see differences between a linear model (like lasso) and a tree-based model (like XGBoost), but once you have a flexible enough model and a lot of data, training time and inference complexity tend to become better criteria for choosing a model.
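To make the linear-vs-tree difference concrete, here is a minimal sketch in plain Python on assumed toy data (y = x², chosen because no straight line can follow it). Ordinary least squares stands in for lasso (no L1 penalty), and a single depth-3 regression tree stands in for XGBoost (no boosting). The helper names `fit_line`, `fit_tree`, and `predict_tree` are hypothetical and do not come from any library.

```python
# Toy comparison: a straight-line fit vs a small regression tree on nonlinear data.
# Assumptions: OLS stands in for lasso; one depth-limited tree stands in for XGBoost.

def fit_line(xs, ys):
    """Closed-form 1-D ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, my - slope * mx

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(pairs, depth):
    """Greedy regression tree on one feature.
    Returns a leaf value (float) or a node (threshold, left, right)."""
    ys = [y for _, y in pairs]
    if depth == 0 or len(pairs) < 2:
        return sum(ys) / len(ys)  # leaf predicts the mean target
    pairs = sorted(pairs)
    best_cost, best_i = None, None
    for i in range(1, len(pairs)):
        # each candidate split is scored by the summed squared error of both halves
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    return (threshold,
            fit_tree(pairs[:best_i], depth - 1),
            fit_tree(pairs[best_i:], depth - 1))

def predict_tree(node, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
line_mse = mse([slope * x + intercept for x in xs], ys)

tree = fit_tree(list(zip(xs, ys)), depth=3)
tree_mse = mse([predict_tree(tree, x) for x in xs], ys)

print(f"line MSE: {line_mse:.3f}  tree MSE: {tree_mse:.3f}")
```

On this symmetric data the fitted line is essentially flat, so the tree's training error is far lower. That illustrates the gap between model families only; it says nothing about two equally flexible models given lots of data.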

5 comments

>In my experience, the only people who think architecture/model choice makes a huge difference are n00bs and academics.

There are countless competitions on Kaggle, AICrowd, and other platforms that enforce a standardized data set. Every entrant uses the same data set, and there's a huge difference between the best and worst submissions.

> ...on Kagi,....

Did you mean https://www.kaggle.com/?

Yes, thanks.
Agreed, but if you look at the winning submissions (which I've since stopped doing), a lot of them rely on very good feature engineering, which is not a model-related thing.
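A toy illustration of how feature engineering can matter more than the model: the same 1-D linear regression is fitted once on the raw input x and once on a hand-crafted feature x². The data (y = x²) and the helper names are assumptions for illustration; real competition features are far less obvious than this.

```python
# Same linear model, two different input features.
# Assumptions: toy data y = x**2; the "engineered" feature is simply x**2.

def fit_line(us, ys):
    """Closed-form 1-D ordinary least squares; returns (slope, intercept)."""
    n = len(us)
    mu, my = sum(us) / n, sum(ys) / n
    var = sum((u - mu) ** 2 for u in us)
    cov = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    slope = cov / var
    return slope, my - slope * mu

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

# 1) The model only sees the raw input x.
raw_slope, raw_intercept = fit_line(xs, ys)
raw_mse = mse([raw_slope * x + raw_intercept for x in xs], ys)

# 2) Identical model, but fed a hand-crafted feature.
engineered = [x * x for x in xs]
eng_slope, eng_intercept = fit_line(engineered, ys)
eng_mse = mse([eng_slope * u + eng_intercept for u in engineered], ys)

print(f"raw feature MSE: {raw_mse:.3f}  engineered feature MSE: {eng_mse:.3g}")
```

The model did not change between the two fits; only its input did, and that alone takes the error from large to essentially zero.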
> the only people who think architecture/model choice makes a huge difference are n00bs and academics.

Are you referring to the current state of our best existing models or the potential future of ML? I find it incredibly hard to see how an LLM could implement the best “physically allowable” approximation to Solomonoff induction.

Then again, I thought it was extremely unlikely neural networks would have the abilities they currently exhibit, so who knows.

We manage to train neural nets to approximate complicated data sets via a rather simple process: backpropagation.

It is indeed a marvel that it works nearly as well as it does.
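For anyone who hasn't seen how simple the process is, here is a minimal sketch of backpropagation: a one-hidden-layer tanh network fitted to samples of sin(x) by plain gradient descent, with the chain rule written out by hand. The network size, learning rate, target function, and helper names (`backprop`, `grad_check`, etc.) are illustrative assumptions; `grad_check` compares the hand-derived gradients with central finite differences.

```python
import copy
import math
import random

# Minimal backprop sketch: 1 input -> `hidden` tanh units -> 1 linear output.
# Assumptions: toy target sin(x), mean squared error, full-batch gradient descent.

def init_params(hidden, seed=0):
    """Small random weights; biases start at zero."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Forward pass: returns the hidden activations and the scalar output."""
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    out = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, out

def loss(p, data):
    """Mean squared error over (x, y) pairs."""
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)

def backprop(p, data):
    """Gradients of `loss` with respect to every parameter, via the chain rule."""
    hidden = len(p["w1"])
    g = {"w1": [0.0] * hidden, "b1": [0.0] * hidden, "w2": [0.0] * hidden, "b2": 0.0}
    for x, y in data:
        h, out = forward(p, x)
        d_out = 2 * (out - y) / len(data)                 # dL/d(out)
        g["b2"] += d_out
        for j in range(hidden):
            g["w2"][j] += d_out * h[j]                    # d(out)/d(w2_j) = h_j
            d_pre = d_out * p["w2"][j] * (1 - h[j] ** 2)  # back through tanh
            g["w1"][j] += d_pre * x
            g["b1"][j] += d_pre
    return g

def grad_check(p, data, eps=1e-6):
    """Largest gap between backprop and central finite-difference gradients."""
    g = backprop(p, data)
    worst = 0.0
    slots = [(k, j) for k in ("w1", "b1", "w2") for j in range(len(p[k]))]
    slots.append(("b2", None))
    for key, j in slots:
        plus, minus = copy.deepcopy(p), copy.deepcopy(p)
        if j is None:
            plus[key] += eps
            minus[key] -= eps
            analytic = g[key]
        else:
            plus[key][j] += eps
            minus[key][j] -= eps
            analytic = g[key][j]
        numeric = (loss(plus, data) - loss(minus, data)) / (2 * eps)
        worst = max(worst, abs(numeric - analytic))
    return worst

def step(p, g, lr):
    """One plain gradient-descent update, in place."""
    for key in ("w1", "b1", "w2"):
        p[key] = [w - lr * gw for w, gw in zip(p[key], g[key])]
    p["b2"] -= lr * g["b2"]

data = [(i / 5, math.sin(i / 5)) for i in range(-10, 11)]
params = init_params(hidden=3)
print("max gradient error:", grad_check(params, data))
initial_loss = loss(params, data)
for _ in range(2000):
    step(params, backprop(params, data), lr=0.05)
final_loss = loss(params, data)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Everything above is repeated application of the chain rule plus "step downhill"; deep-learning frameworks automate the `backprop` function but follow the same idea.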

Then again, evolution is even dumber (in the sense that it only makes random choices that thrive or perish, and can't even take gradients into account), yet it has still managed to produce intelligent critters.

I guess when you have enough dimensions, greedy approaches to optimisation / hill climbing can work well enough, even on challenging problems?

Especially if you are allowed to move up to meta levels. E.g. evolution doesn't build planes; it built brains that can figure out how to build planes. Perhaps the same goes for backpropagation.
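The "random choices that thrive or perish" idea can be sketched as a (1+1) evolution strategy, which is also a greedy hill climber: mutate the current solution with Gaussian noise and keep the child only if it is no worse. This is a minimal sketch on an assumed toy objective (a 20-dimensional sphere function), not a model of biological evolution; no gradient is ever computed.

```python
import random

# (1+1) evolution strategy: random mutation + selection, no gradients.
# Assumptions: toy sphere objective, fixed mutation size sigma, 20 dimensions.

def sphere(v):
    """Squared distance from the origin; the minimum is 0 at the origin."""
    return sum(x * x for x in v)

def one_plus_one_es(f, dim, steps, sigma=0.1, seed=0):
    """Returns (starting fitness, final fitness) of the surviving parent."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1) for _ in range(dim)]
    parent_fit = f(parent)
    start_fit = parent_fit
    for _ in range(steps):
        # mutation: perturb every coordinate at random
        child = [x + rng.gauss(0, sigma) for x in parent]
        child_fit = f(child)
        # selection: the child survives only if it is at least as fit
        if child_fit <= parent_fit:
            parent, parent_fit = child, child_fit
    return start_fit, parent_fit

start, end = one_plus_one_es(sphere, dim=20, steps=5000)
print(f"fitness: {start:.3f} -> {end:.3f}")
```

Because a child only replaces its parent when it is no worse, fitness never gets worse; the search is blind but still makes steady progress on this easy, smooth objective.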

> In my experience, the only people who think architecture/model choice makes a huge difference are n00bs and academics.

The most notable voice refuting this opinion on Twitter was Yi Tay (founder of Reka.ai), who definitely does not belong to either of those categories!

Tay (ex-Google Brain) founded Reka.ai two years ago, and their latest multimodal language model is close to SOTA in performance.

https://x.com/YiTayML/status/1779895037335343521

This "n00b" seems to disagree with your sentiment on the importance of architecture: https://news.ycombinator.com/item?id=40155667
Unfortunately, Google Brain researchers have not yet discovered my brilliance, but if you read my argument, it's about the data being much more important than the model. Granted, transformers are a great model, but that doesn't refute my point.

Also arguments from authority are boring.

Why does it make you sad? It seems intuitive and simple. And in reality, of course, the optimisation part is not trivial. What would be better if the "it" were more complicated?
It used to be that people would get into these fields thinking ML would need specifically human insights, deep thinking, and philosophical insights about the nature of consciousness.

You would get into natural language modelling because you had a deep love of language, because you thought you were close to figuring language out in a systematic way with just a few more years of study.

There's a certain sadness, I think, in the revelation that the robots don't need the expertise of humanity's greatest experts and masters, they just need us to click all the squares that contain a motorcycle.

This is 100% not why I am sad, see my other reply for information.

As an aside, it's wild how people put their own spin onto what I said.

Obviously I should have been clearer :shrug:.

Well, you have to forgive some for making assumptions based on your choice of username…
Fair. I'm just generally disgruntled, to be fair; the PhD was just the name of my soon-abandoned blog.
> It used to be that people would get into these fields thinking ML would need specifically human insights, deep thinking, and philosophical insights about the nature of consciousness.

What's sadder is coming into a field having pre-decided that the way you approach it "is the right way" and being unable to tolerate that different mindsets can also get results.

How do you know? We’re not there yet.
Because of the way it's presented, as if it's some vast new discovery that OpenAI have made, rather than common wisdom.

It makes me sad when people rediscover things that were already known (with massive compute, in this case).

It's very much a case of spending a year in the lab to save an hour in the library.

Possibly because of being in the business of trying to turn an IQ edge into money, not a data edge.