| HN Mirror


by ACCount37 117 days ago

Better-than-random initialization is underexplored, but there is some work in that direction.

One of the main issues is: we don't know how to generate useful computational structure for LLMs - or how to transfer existing structure neatly across architectural variations.

What you describe sounds more like a "progressive growing" approach, which isn't the same, but draws from some similar ideas.

1 comment

rao-v 110 days ago

Agree re: progressive growing

In terms of substructure: in the old days of Core War, randomly scattering bits of code that did things could pay off. I’m imagining something similar for LLMs: set 10% of the weights to specific known structures, then watch which ones the model retains and uses and which get treated like random init.
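
A rough sketch of what that experiment could look like, in NumPy. Everything here is an assumption for illustration: the helper names `plant_structures` and `drift`, the choice of a small identity block as the "known structure", and the idea of using weight drift after training as a proxy for "retained vs. treated like random init". Real training code is left out.

```python
import numpy as np

def plant_structures(shape, block, fraction=0.10, scale=0.02, seed=0):
    """Random-init a weight matrix, then overwrite ~`fraction` of it with copies of `block`.

    Hypothetical helper: `block` stands in for whatever known structure you want to plant.
    Returns the weights and a boolean mask marking the planted entries.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, scale, size=shape)
    planted = np.zeros(shape, dtype=bool)
    bh, bw = block.shape
    rows, cols = shape[0] // bh, shape[1] // bw
    n_blocks = int(fraction * rows * cols)
    # Pick distinct grid cells so planted blocks never overlap.
    cells = rng.choice(rows * cols, size=n_blocks, replace=False)
    for cell in cells:
        r, c = divmod(int(cell), cols)
        r, c = r * bh, c * bw
        weights[r:r + bh, c:c + bw] = block
        planted[r:r + bh, c:c + bw] = True
    return weights, planted

def drift(before, after, mask):
    """Mean absolute change of the weights selected by `mask` (assumed retention proxy)."""
    return float(np.abs(after - before)[mask].mean())

# Plant 4x4 identity blocks into roughly 10% of a 64x64 matrix.
w0, planted = plant_structures((64, 64), np.eye(4))
print(f"planted fraction: {planted.mean():.3f}")
```

After training, comparing `drift(w0, w1, planted)` against `drift(w0, w1, ~planted)` (with `w1` the trained weights) would give a first, crude signal of whether the planted blocks were kept or overwritten. Low drift alone does not prove a structure is being used, so an ablation on the planted entries would be a sensible follow-up.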
