by ftxbro
1025 days ago
> we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points

does this mean 'an over-parameterized transformer problem is a convex svm problem'?

But yes, that's how I would read that, and I also see no issue at all with the language in the paper. These terms are used for precision, and have meaning to those in the field. Papers are written for other experts, not laymen.