

by karalala 815 days ago

Already seeing major flaws in the paper.

The benchmarking in Table 1 is extremely questionable. The table basically contradicts the results from multiple peer-reviewed papers, especially for RNNs, where those papers report results much closer to baseline transformers (and ran much larger experiments, btw).

On page 40 they mention that all models are trained with the same lr for comparability.

> Contradicts their own scaling laws table which uses different lr for different models

> And no, it is not a fair comparison to use the same lr to test all these different models. The benchmarking results just look like they are using hyperparameters tuned for their model, which happen not to work for other models.
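A toy sketch of the learning-rate point above. Everything in it is made up for illustration: the two models, their loss curves, and the lr grid are hypothetical and have nothing to do with the numbers in the paper. It only shows how a ranking can flip depending on whether every model shares one lr or each gets its own best lr from a sweep.

```python
import math

# Hypothetical final loss as a function of learning rate (invented numbers).
# Each toy model has its own optimum; loss grows with the squared distance
# from that optimum in log10(lr) space.
def final_loss(lr, best_lr, best_loss):
    return best_loss + (math.log10(lr) - math.log10(best_lr)) ** 2

# Two made-up models: model_b is better at its own optimum,
# but its optimum sits at a lower learning rate than model_a's.
MODELS = {
    "model_a": {"best_lr": 1e-3, "best_loss": 2.50},
    "model_b": {"best_lr": 1e-4, "best_loss": 2.40},
}
LR_GRID = [1e-4, 3e-4, 1e-3, 3e-3]

def compare_shared_lr(lr):
    """Evaluate every model at the same learning rate."""
    return {name: final_loss(lr, **cfg) for name, cfg in MODELS.items()}

def compare_tuned_lr(grid):
    """Give every model its best learning rate from the same grid."""
    return {
        name: min(final_loss(lr, **cfg) for lr in grid)
        for name, cfg in MODELS.items()
    }

shared = compare_shared_lr(1e-3)   # lr that happens to suit model_a
tuned = compare_tuned_lr(LR_GRID)  # per-model sweep

print("shared lr winner:", min(shared, key=shared.get))
print("tuned lr winner: ", min(tuned, key=tuned.get))
```

With the shared lr, the model whose optimum happens to match that lr wins; with a per-model sweep, the other one does. Real comparisons are messier, but this is the failure mode being described.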

2 comments

bingbingbing777 815 days ago

You should publish a response paper and get them to retract their paper if it has major flaws.


karalala 815 days ago

It's xLSTM contradicting existing peer-reviewed papers lmao. Either xLSTM should fix their benchmarks or the existing peer-reviewed papers should be retracted.

RWKV-v6 > RWKV-v5 > RWKV-v4, not the other way round, obviously. HGRN 8 ppl worse than baseline transformers? NeurIPS 2023 spotlight paper btw.


AIsore 814 days ago

Are you saying this is obvious because people have published the exact same benchmarks which are 100% comparable in journals? If so where are they? I have seen quite a few published benchmarks that could not quite be reproduced, tbh. So, again, what makes this "obvious" to you?


logicchains 815 days ago

I thought it was common knowledge that architecture comparisons in papers aren't worth the paper they're printed on; there are so many ways to deliberately or accidentally structure things to favour one architecture over the others. Ultimately the LMSYS Chatbot Arena will be the final judge.


karalala 815 days ago

True, but they normally aren't this far off. HGRN claims to outperform transformers for a 1B-parameter model trained on the Pile. HGRN performing 8 ppl worse would suggest that it's useless.


AIsore 814 days ago

My experience: many are far off, and most of the time published tables from different papers are hard to compare. If you assert that these results are flawed, I would like to see more substance (code, reproduction, ...). And for balance, for the same reason, it is hard to verify the accuracy of these results without further insight.


logicchains 814 days ago

So many papers play tricks with the learning rate schedule: https://arxiv.org/abs/2307.06440


rrr_oh_man 815 days ago

Could you explain for a dum-dum?


karalala 815 days ago

The xLSTM results are promising but will need larger-scale experiments.

However, they completely messed up the benchmarking experiments for various RNN models, whose own papers claim performance comparable to or even better than the baseline transformer.


AIsore 815 days ago

These experiments seem pretty large already though, no? How are you so sure they messed up benchmarking? Is the code out already?
