by whimsicalism 1059 days ago
> 1) there are a lot of papers claiming to be the successor to the Transformer, and not all of them are cited; e.g., the MetaFormer is missing https://arxiv.org/abs/2111.11418. Another candidate that wasn't compared against (or at least argued why it wouldn't make sense to compare against) is the Hopfield Networks https://arxiv.org/abs/2008.02217.

Neither of those papers is applicable to NLP? And I think it's perfectly fair to focus on the alternatives (e.g., H3 and RWKV) that have been able to scale up to LLM levels and perplexity, which neither of the alternatives you mention has. Should they just cite every 'is All You Need' paper?


No, but having a small section in the paper would be reasonable, one that illustrates why the most pertinent models that might seem relevant at first sight (like the ones I cited) are actually not applicable.

The onus is on the authors to place their research in context and provide compelling arguments - not on the reader to guess why their model was compared against model A, but not model B.

What do I mean by "pertinent"? Of course it is not necessary to cite every "All You Need" paper.

But:

(A) I'd argue it would be necessary to cite those "All You Need" papers that have either gathered a fair amount of citations or media attention (which is the case for both of the papers I linked to), or are meaningful descendants (in the "has been cited by" tree) of those papers. As I said, this is not really my field - but I would say there is some chance that among the hundreds of papers that have cited the papers I linked, some have been scaled up to LLM levels and use basically the same MetaFormer/Hopfield architecture.

(B) If the above isn't the case and none of those models have been scaled to LLM levels - that's fine too. But then please tell the readers that you did due diligence and found that there actually is this gap in the literature (of course, feel free to close it yourself and then be the first one to train one of those models to such scales; that's the reward for doing a solid literature review - and who knows, maybe you stumble upon an even better model that will get you many citations).

(C) If you cannot perform a comprehensive literature search, but the models you compare against cover 90% of the models that are out there (in production or research), and you can back that claim up - then, of course, you're safe too and I'd be very happy to be able to congratulate you that you really did manage to achieve a breakthrough.

(D) Even if none of these things apply and it just costs too much in terms of computational power to train many other potentially competing models, or it is too cumbersome to carry out a comprehensive literature review, that's also fine. You can then simply constrain your paper more and consider a more precisely defined slice of models, for which you can then actually do a thorough literature review and comparison. Then you'd also need to adapt your title though, so that it reflects the more precise scope. And please don't take this negatively: as a reader I'd much rather have a model that is proven with the highest scientific standard to be state-of-the-art on a narrower scope than a broader claim with only a moderate amount of evidence backing it up.

So, PLEASE, don't leave us guessing!

If you have a new candidate model that you claim is the "successor" (a strong word) but compare it to just 6 other models, and importantly, you don't let the reader know which of the options (A) to (C) applies, then you have to go with (D).

Machine learning is already too full of papers whose titles are overly broad. Somehow, other scientific disciplines have much more sober title formulations, yet ML insists on colorful titles that usually are not particularly informative (and yes, "XYZ is All You Need" I consider to be an example of such a bad title).

Critiques of omission are the easiest ones to level and are informationally asymmetric in that the person leveling them will typically have more information on the topic being omitted than the person being critiqued. Hopfield networks & MetaFormers are largely irrelevant in the domain of potential transformer replacements for LLMs. Unless they have been shown to be relevant in any way, I don't see why the paper need cite them (and thus build some recursive case that future papers must cite these papers).

I'll just say I'm glad that I'm not submitting papers to be reviewed any more :)

> Unless they have been shown to be relevant in any way, I don't see why the paper need cite them.

Fair. Your argument then falls precisely into category (C) of the four mutually exclusive options I outlined above.

But you'd then need to argue why the 6 models you compared against form a comprehensive sample to test against, and not just some arbitrary set of recent models that happen to be dominated by the newly proposed model. (And maybe that is indeed the case; then it should be easy enough to update the arXiv draft by incorporating a section where you argue along those lines.)