| HN Mirror



	by bingbingbing777 815 days ago
	You should publish a response paper and get them to retract their paper if it has major flaws.


karalala 815 days ago

It's xLSTM contradicting existing peer-reviewed papers lmao. Either the xLSTM authors should fix their benchmarks or the existing peer-reviewed papers should be retracted.

RWKV-v6 > RWKV-v5 > RWKV-v4, not the other way round, obviously. HGRN 8 perplexity points worse than baseline transformers? NeurIPS 2023 spotlight paper btw.


AIsore 814 days ago

Are you saying this is obvious because people have published the exact same benchmarks which are 100% comparable in journals? If so where are they? I have seen quite a few published benchmarks that could not quite be reproduced, tbh. So, again, what makes this "obvious" to you?


logicchains 815 days ago

I thought it was common knowledge that architecture comparisons in papers aren't worth the paper they're printed on; there are so many ways to deliberately or accidentally structure things to favour one architecture over the others. Ultimately the LMSYS Chatbot Arena will be the final judge.


karalala 815 days ago

True, but they normally aren't this far off. HGRN claims that it outperforms the transformer for a 1B-parameter model trained on the Pile. HGRN performing 8 ppl worse suggests that it's useless.


AIsore 814 days ago

My experience - many are far off, and most of the time published tables from different papers are hard to compare. If you assert that these results are flawed, I would like to see more substance (code, reproduction, ...). And for balance, for the same reason, it is hard to verify the accuracy of these results without further insight.


logicchains 814 days ago

So many papers play tricks with the learning rate schedule: https://arxiv.org/abs/2307.06440
