|
|
|
|
|
by westoncb
185 days ago
|
|
> So, they found an underlying commonality among the post-training structures in 50 LLaMA3-8B models, 177 GPT-2 models, and 8 Flan-T5 models; and, they demonstrated that the commonality could in every case be substituted for those in the original models with no loss of function; and noted that they seem to be the first to discover this. Could someone clarify what this means in practice? If there is a 'commonality' why would substituting it do anything? Like if there's some subset of weights X found in all these models, how would substituting X with X be useful? I see how this could be useful in principle (and obviously it's very interesting), but not clear on how it works in practice. Could you e.g. train new models with that weight subset initialized to this universal set? And how 'universal' is it? Just for like like models of certain sizes and architectures, or in some way more durable than that? |
|