Hacker News new | ask | show | jobs
by cauch 528 days ago
Rolling eyes.

> Especially if the author shares neither the data or the code

What are you talking about. In this example, why do you invent they are not sharing the data? That's the whole point.

> A study that is replicated using the same data is "avoiding the replication crisis"

BULLSHIT. You can build confidence by redoing the experience with the same data, but it is just ONE PART and it is NOT ENOUGH. If there is a statistical fluctuation in the data, both studies will conclude something false.

I have of course reproduced a lot of algorithm myself, without having the code. It's not complicated, the paper explains what you need to do (and please, if your problem is that the paper does not explain, then the problem is not about sharing the code, it's about paper badly explaining).

And again, my argument is "nobody share data" (did you know that some study also shares code? Did you know that I have occasionally shared code? Because, as I've said before, it can be useful), but that "some don't share data and yet are still doing very good, both on performance, on fraud detection or on replication".

For the rest, you are just saying "my anecdotal observations are better than yours".

But meanwhile, even Terence Tao does not say what you pretend he says, so I'm sure you believe people agree with you, but it does not mean they do.

1 comments

>Rolling eyes.

Please review and adhere to the HN guidelines before replying again.

>why do you invent they are not sharing the data?

Because you advocated that very point. You: "some data is better not to share too" The point in sharing is that I want to interrogate your data/code to see if it's biased or misrepresented or prone to error if it doesn't seem to work for the specialized problem I am trying to apply it to. When you don't share it and your problem doesn't replicate, I'm left wondering "Is it because they have something unique in their dataset that doesn't generalize to my problem?"

>BULLSHIT.

Please review and adhere to the HN guidelines before replying again.

>It's not complicated

You can make this general claim about all papers based on your individual experience? I've already explained why your personal experience is probably not generalizable across all domains.

>you are just saying "my anecdotal observations are better than yours".

No, I'm saying the systematically studied, published, and replicated studies trump your anecdotal claims. I've given you some example authors, if you have an issue with their methods, delineate the problems explicitly rather than sharing weak anecdotes.

> Because you advocated that very point. You: "some data is better not to share too"

SOME data. SOME. You've concluded, incorrectly, that I was pretending that sharing data is not useful all the time, which is not at all what I've said.

> You can make this general claim about all papers based on your individual experience?

What? Do you even understand basic logic? I'm saying that I've observed SOME paper where sharing the code did not help. I'm not saying sharing the code never help (I've said that already). I'm just saying that people usually don't understand the real cause of the problem, and invent that sharing the code will help, while in fact doing other things (for example being more precise in the explanation) will solve the problem without having to pay for the unblinding that sharing the code generate.

Sure, one reason I say that is because of my experience, even if my observations are not at all limited to one field as I've exchanged on the subject with many scientists. But another reason is that when I discuss the subject, the people who overestimate the gain of sharing the code really have difficulties to understand the disadvantages in sharing the code.

Yourself, you seems to not understand what we need for a good replication. Replication is supposed to independently demonstrate, so we build up the confidence in the conclusions. Rerunning with the same data or the same code is not enough, because it does not prove that the conclusions will remain valid if we try with other data or other implementation. When you understand that, only then you understand that sharing the code has a price to pay.

By the way, it will also explain why CERN is doing something that, according to you, has absolutely no reason to exist except for cheating. Of course, if it was the case, intellectually honest scientists would all ask CERN to cancel these policies. They don't, because there are real reasons why scientists may prefer in some case to forbid sharing code and data (not just "I don't do it myself because I'm lazy", but "I don't do it because it's a specific rule, they explicitly say it's a bad thing to do it").

And, sure, maybe it is not everywhere. But it does not matter. It's a counter-example that demonstrates that your hypothesis does not work. If your hypothesis was true, what CERN does would not be possible, it would be obviously a bad move and would be attacked.

> I've given you some example authors, if you have an issue with their methods, delineate the problems explicitly rather than sharing weak anecdotes.

These studies do not conclude that sharing the code is a good solution. None of these studies are in contradiction with what I say.

Of course, from someone who think that saying "some data is better not to share too" and conclude that it means "data is better to never be shared", or that did not understood the point of Tao, I'm sure you are convinced they say that. They just don't.