| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by geremiiah 265 days ago
	Their representation is simpler, just a transformer. That means you can just plug in all the theory and tools that have been developed specifically for transformers, most importantly you can scale the model easier. But more than that, I think, it shows that there was no magic to AlphaFold. The details of the architecture and training method didn't matter much. All that was needed was training a big enough model on a large enough dataset. Indeed lots of people who have experimented with AlphaFold have found it to behave similiar to LLMs, i.e. it performs well on inputs close to the training dataset and but it doesn't generalize well at all.

4 comments

johncolanduoni 265 days ago

Except their dataset is mostly the output of AlphaFold, which had to use the much smaller dataset of proteins analyzed by crystallography as input. This is really an exercise in model distillation - a worthy endeavor but it's not like they could have just taken their architecture and the dataset AlphaFold had and expect to get the same results. If that was the case, that's what they would have done because it would've been much more impressive.

link

visarga 265 days ago

> But more than that, I think, it shows that there was no magic to AlphaFold. The details of the architecture and training method didn't matter much. All that was needed was training a big enough model on a large enough dataset.

People often like to say that we just need one more algorithmic breakthrough or two for AGI. But in reality it's the dataset and the environment based learning. Almost any model would do if you collected the data. It's not in the model, it's outside where we need to work on.

link

aDyslecticCrow 265 days ago

I think the sentiment that simplicity is good, is a false conclusion. Simplicity is simply good scientific methodology.

Doing too many things at once makes methods hard to adopt and makes conclusions harder to draw. So we try to find simple methods that show measurable gain, so we can adapt it to future approaches.

Its a cycle between complexity and simplicity. When a new simple and scalable approach beats the previous state of art, that just means we discovered a new local maxima hill to climp up.

link

cma 264 days ago

They had to largely use alpha fold for the data part of the transformer scaling laws so not quite a bitter lesson, but still interesting.

link