by nothrowaways
191 days ago
> Principal component analysis of 200 GPT2, 500 Vision Transformers, 50 LLaMA-8B, and 8 Flan-T5 models reveals consistent sharp spectral decay - strong evidence that a small number of weight directions capture dominant variance despite vast differences in training data, objectives, and initialization.

Isn't it obvious?

It isn’t obvious that these parameters are universal across all models.
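
For anyone unsure what "sharp spectral decay" is measuring, here is a rough sketch on synthetic data. It is not the paper's code or data. The matrix sizes, the 5-dimensional shared subspace, the noise level, and the helper name `explained_variance_ratio` are all assumptions made up for illustration. It stacks one flattened weight vector per "model", runs PCA via SVD, and compares how much variance the top few components capture when the models share a low-dimensional subspace versus when they are pure noise.

```python
import numpy as np

def explained_variance_ratio(weights):
    """PCA spectrum of a (n_models, n_params) matrix.

    Returns the fraction of total variance carried by each principal
    component, in descending order.
    """
    # Center each parameter across models, as PCA requires.
    centered = weights - weights.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix; squared, they are
    # proportional to the variance along each principal direction.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
n_models, n_params, k = 200, 1000, 5  # hypothetical sizes

# Case 1: every "model" is a mix of k shared directions plus small noise.
basis = rng.standard_normal((k, n_params))
coeffs = rng.standard_normal((n_models, k))
shared = coeffs @ basis + 0.1 * rng.standard_normal((n_models, n_params))

# Case 2: no shared structure at all, just independent Gaussian weights.
unstructured = rng.standard_normal((n_models, n_params))

ratio_shared = explained_variance_ratio(shared)
ratio_random = explained_variance_ratio(unstructured)

# Variance captured by the top k components in each case.
top_shared = ratio_shared[:k].sum()
top_random = ratio_random[:k].sum()
print(f"top-{k} variance, shared subspace: {top_shared:.3f}")
print(f"top-{k} variance, unstructured:    {top_random:.3f}")
```

In the shared case the top few components soak up nearly all the variance, and in the unstructured case the spectrum is close to flat. A toy like this only shows that the decay is a real signal when it appears. It says nothing about whether the directions found for one model family carry over to another, which is the non-obvious part.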