HN Mirror


by dartos 475 days ago

> because code and text share domain (text based) – large, generic models will always out-compete smaller, specialized ones – that's the lesson

All digital data is just 1s and 0s.

Do you think a model trained on raw bytes would perform coding tasks better than a model trained on code?

I have a strong hunch that there’s a Goldilocks zone of specificity for statistical model performance, and I don’t think “all text” falls in that zone.

Here is the article on “the bitter lesson.” [0]

It argues that general machine learning methods that leverage more compute end up better at learning a given dataset than strategies tailor-made for that dataset.

This does not imply that training on a more general dataset will yield better performance than training on a more specific one.

The lesson is about machine learning methods, not about end-model performance at a specific task.

Imagine a logistic regression model vs an expert system for determining real estate prices.

The lesson tells us that, given more and more compute, the logistic regression model will eventually perform better than the expert system.

The lesson does not imply that, given two logistic regression models, one trained on global real estate data and one trained on local data, the former would outperform the latter at pricing homes in that local market.
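A toy simulation might make the distinction more concrete. Everything below is a hypothetical setup, not real market data: prices are assumed to be linear in square footage, the local market is assumed to cost $500/sqft, and the "global" dataset pools that market with a few cheaper ones. Since prices are continuous, ordinary least squares stands in for the regression model here rather than logistic regression. The helper names (`fit_ols`, `make_market`, etc.) are made up for illustration.

```python
import random


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares (single feature, closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_market(rng, price_per_sqft, n):
    """Hypothetical market: price is linear in square footage plus Gaussian noise."""
    xs = [rng.uniform(500, 3000) for _ in range(n)]
    ys = [price_per_sqft * x + rng.gauss(0, 20_000) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is reproducible

# The local market we actually care about (assumed $500/sqft).
local_train = make_market(rng, 500, 200)
local_test = make_market(rng, 500, 200)

# "Global" data: the same local training data pooled with several cheaper markets.
global_xs, global_ys = list(local_train[0]), list(local_train[1])
for rate in (120, 150, 200, 250):
    xs, ys = make_market(rng, rate, 200)
    global_xs += xs
    global_ys += ys

local_model = fit_ols(*local_train)
global_model = fit_ols(global_xs, global_ys)

# Both models are evaluated on the same held-out local test set.
local_err = mse(local_model, *local_test)
global_err = mse(global_model, *local_test)
print(f"local model MSE on local test set:  {local_err:,.0f}")
print(f"global model MSE on local test set: {global_err:,.0f}")
```

Same method, same compute budget; the only difference is the data, and the more general dataset loses at the local task because its fitted slope is pulled toward the cheaper markets. The setup is deliberately simple: a global model that also received a location feature could close the gap, so this shows only that "more general data" doesn't automatically win, not that it always loses.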

I realize this is a fine distinction and that I may not be explaining it as well as I could if I were speaking, but it’s an important distinction nonetheless.

[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html