by karalala
769 days ago
Already seeing major flaws in the paper. The benchmarking in Table 1 is extremely questionable. The table basically contradicts results from multiple peer-reviewed papers, especially for RNNs, where those papers report results much closer to baseline transformers (and ran much larger experiments, btw). On page 40 they mention that all models are trained with the same lr for comparability.
> This contradicts their own scaling laws table, which uses a different lr for different models.
> And no, it is not a fair comparison to use the same lr to test all these different models.
The benchmarking results just look like they are using hyperparameters tuned for their own model, which happen not to work for the other models.