by shahinrostami 1804 days ago
We highlighted something similar in the multi-objective optimisation literature [1]. Unfortunately, it looks like comparing benchmark scores between papers can be unreliable.

- _Algorithm A_ implemented by _Researcher A_ performs differently from _Algorithm A_ implemented by _Researcher B_.

- _Algorithm A_ outperforms _Algorithm B_ in _Researcher A's_ study.

- _Algorithm B_ outperforms _Algorithm A_ in _Researcher B's_ study.

That's a simple case... and it can come down to many different factors which are often omitted in the publication. It can drive PhD students mad as they try to reproduce results and understand why theirs don't match!

[1] https://link.springer.com/article/10.1007/s42979-020-00265-1

1 comment

This really sucks when papers don't come with code that can reproduce the benchmark results. I wish there were a filter for "reproducible" in search results.