by brownbat 3768 days ago
The job market right now in the US forces grad students to attempt innovative research with remarkable conclusions. As a consequence, everyone's trying to prove more outlandish things while few are bothering with replications.

If the field were sane, you would train all the apprentices on replication studies. Once they demonstrated dispassionate expertise with the tools, only then would they be allowed to try to use those tools to test their own ideas, where they will have a strong emotional preference for how the study will come out.

If universities hired grad students based on their replication work, not on their eye-popping original research, we'd have better science and better scientists.

6 comments

That is nearly the opposite of what the PhD was designed to do. Modern academic training comes out of the church and the old guilds of middle Europe and is still in use today in many fields (chefs and plumbers to name a few).

The Bachelor's degree is loosely similar to an apprentice's role. The young boy (they were almost exclusively male) worked in a shop or with a priest for some time. He learned the trade, the tools, and gained some experience from 'level 0'. When you are done with the apprenticeship, you are 'cleared' to work in other shops and are known to not be a total moron or break tools or burn down shops.

The master's degree is just that. You are considered a master of the craft (like plumbing or printing) or the discipline (like The Book of Mark or Crusader History). As such, you typically have a master's level project: something that is 'new' or shows that you know your stuff. That might be a very decorative silver bowl or a thesis.

The Doctorate means you are 'world class.' Not just a master of a field, but a paragon of it. Today, that means that you are the expert in your little niche of underwater basket weaving. There should be no one better than you. This means you MUST have produced something new, or a novel way of thinking about God or something. This has always been the idea, if not the practice.

To change that and say that the doctorate should be the bachelor's is a very big change. To suggest that PhDs should just replicate experiments is anathema to the idea of graduate education and would be a tremendous waste of time and energy. When you enter the PhD, you are assumed to already know how to do all the replication and the facts about the field. Granted, fields are exponentially larger than they were in the 1600s, but you still should know stats and biology if your PhD is in cancer biology.

I think you are totally wrong about this. What you are suggesting should be covered in undergrad and I think it largely is.

The problem is that nobody's doing replication work in undergrad, either, nor does it happen in Master's programs.

I do think it makes sense to stick with the PhD meaning you're a world class expert in some area, but if so then we need to adjust our expectations for what Master's level work means in the sciences. Right now it seems to just represent a hurdle you need to whiz past on your way to the PhD.

A good Bachelors degree will prepare a good student for replicating science, and a good Masters degree will definitely leave a motivated and skilled student with a good advisor a master of his or her specific field.

The problem is that grade inflation means the majority of students will fall short of these goalposts. I agree with your assessment that undergrad degrees represent hurdles, regardless of whether a student is planning to stay in academia or not.

My experience with getting a Masters degree was that it was really tough work that required my full dedication for two years. But I had a world-class scientist as an advisor breathing down my neck the whole time and expecting results, and my experience doesn't seem to match that of many other MScs I know. Some departments seem to be "degree factories"; it takes an unreasonable amount of effort to follow up students in the classical "apprenticeship" tradition described by GP. It would be very strange if every department at every university managed this level of dedication, with student numbers being what they are.

> a good Masters degree will definitely leave a motivated and skilled student with a good advisor a master of his or her specific field.

I'm having trouble believing this.

In the UK, most Masters degrees last 1 year, and there are several degrees considered good such as Imperial's MSc in Machine Learning, Cambridge's MPhil in Machine Learning, Speech and Language Technology, Edinburgh's MSc in Cognitive Science, and others. Is it really possible to become a "master" of machine learning in one year?

Also, at least in Computer Science, most of the Bachelors degrees considered good in the UK do not seem to focus at all on replicating science. In fact, for my final year undergraduate project, I was encouraged to find something novel, and at no point did my supervisor hint at focusing on replicability.

Is it perhaps more common in the US?

An MSc is not an MSc. It can commonly vary from 1-3 years, depending on institution. I was thinking of the two-year variety.

Of course, you can define the term "master" to mean pretty much whatever you like. But I'd say that two years of additional, focused study when you are already proficient in your field should be more than enough to have a mastery of the specific skills and knowledge that is at least on a high national level. I'm from Norway, so the US picture is unknown to me.

It's really dependent on the department, but yeah, the MS is now the HS degree for a lot of places, especially in Engineering. Good luck trying to get hired with only a BS. Credential creep is real. More here: https://en.wikipedia.org/wiki/Credentialism_and_educational_...
In CS, a Bachelors is enough for most (if not all) jobs.
Yeah, the only job I've found where that's not true is something like 'head researcher' or other similar thing (like "data scientist").
Everyone's doing replication work in undergrad; it just happens to be replication of the "highlights" or most important results. Farming out the replication of every result to undergrads would, one, be a practical/logistical nightmare, and two, lead to less reliability in what a degree means. One student could spend their time working on an important result, gaining tons of insight and expertise, while another could be stuck replicating a task that turned out to be worthless bullshit and have very little to show for it.
> Farming out the replication of every result to undergrads would, one, be a practical/logistical nightmare, and two, lead to less reliability in what a degree means.

Actually I think farming out replication to undergrads would be an excellent approach. Your final year undergrad project should be to choose an under-replicated study and repeat it, publishing your findings. Each individual study might be less reliable if done by an undergrad than done by a seasoned researcher, but if each study is repeated by say 5 undergrads and 2+ of them fail to replicate the results, that would be enough to indicate that the study warrants further attention.
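
To put rough numbers on the "5 undergrads, 2+ failures" rule, here is a minimal sketch. It assumes each replication attempt fails independently with the same probability, which real student projects would not strictly satisfy, and the function name `prob_flagged` and the failure rates below are made up for illustration.

```python
from math import comb


def prob_flagged(p_fail: float, n: int = 5, threshold: int = 2) -> float:
    """Probability that at least `threshold` of `n` independent
    replication attempts fail, given a per-attempt failure rate."""
    # Sum the binomial probabilities of seeing fewer than `threshold` failures.
    p_below = sum(
        comb(n, k) * p_fail**k * (1 - p_fail) ** (n - k)
        for k in range(threshold)
    )
    return 1 - p_below


# Hypothetical failure rates, not taken from any real study:
# a sound result that undergrads botch 10% of the time, versus
# a shaky result that fails to replicate 60% of the time.
sound = prob_flagged(0.10)
shaky = prob_flagged(0.60)
print(f"sound study flagged: {sound:.3f}")
print(f"shaky study flagged: {shaky:.3f}")
```

Under those assumed rates the rule flags the shaky study far more often than the sound one, which is the property the proposal needs; the exact cutoffs would have to be tuned against real failure rates.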

> One student could spend their time working on an important result, gaining tons of insight and expertise, while another could be stuck replicating a task that turned out to be worthless bullshit and have very little to show for it.

The whole point of science is that we don't know what will turn out to be an important result and what will turn out to be worthless bullshit. No study is worth a damn unless it's been replicated but everyone is too busy trying to land-grab the next little piece of unexplored territory to actually validate anything that comes before.

If nothing else, we need to regain the perception that a negative result is just as important as a positive result - to paraphrase Edison, discovering 100 things that don't work is just as important as discovering one thing that does.

Mainly because no one is willing to fund it. (Don't look at me, I haven't funded any of it either!)
In an ideal world you might have a point.

However, PhD students on average are very far from world class. And in just about every case they are simply looking at a problem so unimportant that nobody considered it before, and most likely nobody will ever look at it again.

It's almost always a waste of time for both the student and everyone else involved.

PS: There are plenty of counterexamples where PhD research happened to be valuable, but that's a tiny minority of cases.

You could require both reproduction & novel work for the PhD. Or you could require reproduction for the MS, as a stepping-stone to novel work.

Today the MS & PhD are both supposed to prepare you to do novel research & science. Since reproducibility is a core part of research & science, it would make sense for you to reproduce another study as part of your learning process.

History isn't a useful guide when our current system is broken; it only reiterates what we've been doing wrong all along. If you really think replication is a waste of time, though, then you won't understand why journals are filled with junk science.

We used to think placebo controls and double blinds were a waste of time. One great thing about the history of science your version excludes is that the way we do science is subject to review too, and we continually throw out what doesn't work in favor of methods that do.

Academia has always forced grad students to attempt innovative research. The core requirement of a PhD dissertation is "novel findings", i.e., something completely new.

Apprentices work on papers and research for their professors. No one is incentivized to do replication work: either you replicate it successfully (great) or you find flaws. In the latter case, you'll probably be a nit-picker anyway—also not really additive. On the off-chance that you refute a high-profile study (e.g., some of the outright instances of fraud), you might get some recognition, but now your name is associated with something negative (e.g., fraud: "X is not true") vs. positive ("X is true").

Finally, this is part of the role of review panels in journals: to pass the burden of proof.

A lot of the time those same students are doing the bulk of the legwork for those findings then attributed to a professor, who is under greater pressure to advance (not confirm) science, who may or may not have even participated heavily in the grunt work of that research. The professor then punishes the work under his/her name, with no credit to the work of the students.
> A lot of the time those same students are doing the bulk of the legwork for those findings then attributed to a professor, who is under greater pressure to advance (not confirm) science, who may or may not have even participated heavily in the grunt work of that research. The professor then punishes the work under his/her name, with no credit to the work of the students.

It's important to note that though this seems to happen in all fields, it is far less common in some. I rarely hear of such things in astronomy; it does happen, but more often I hear about faculty explicitly working to ensure they can protect projects for their students so their students can get the credit deserved for doing the project. I hear of professors appropriating student research in biology and chemistry more frequently, however.

Professors taking and publishing student research is certainly a problem, is unethical, and should be stopped. But the way in which it is usually discussed ("in science") implies that it's a systemic issue across all science, which (from my experience) isn't true. This topic of credit and attribution for research deserves more nuanced discussion and fewer blanket statements.

Edit: Spelling, wording clarification.

I think it does vary a lot by discipline. In my area of mathematics I don't recall ever hearing of it. For whatever it is worth I even published papers without my supervisor being on them at all, if he hadn't been involved in that work.
How does attribution and second/third authorship work if you "just" bounced ideas off your supervisor or fellow students in verbal discussions? I'm trying to gauge how strict attribution rules are in what I would consider a gray area.
In mathematics, multiple authors are always listed alphabetically by family name; i.e. there is no "first" or "second" author.

In the situation you describe, the paper would probably be singly authored, and the author would write something like "Thanks to my advisor _____ and to my colleagues ____ for many helpful discussions" in an acknowledgments section.

If the contributions were more serious, then possibly the author would invite the others to co-author with him/her, and the others would then either accept (and then help write the paper) or politely decline and say "just mention me in the acknowledgments".

At least in my experience, co-authorship carries responsibilities: to help with the writing, the figures, references, dealing with editors and with submission to journals, speaking about the research at seminars/conferences, etc.

In life sciences, the last author is almost always the person whose grants paid for most of the research. Usually, they also helped supervise the research, but the grant aspect is more important.

For example: Mike Snyder, a brilliant biologist, 'supervises' 36 postdocs, 13 research assistants, 11 research scientists, 9 visiting scientists, and 8 graduate students (http://snyderlab.stanford.edu/members3.html - thanks to Lior Pachter for noticing it).

In 2014, he had 42 published papers. How much scientific input do you think he had on each one?

>In mathematics, multiple authors are always listed alphabetically by family name; i.e. there is no "first" or "second" author.

Is there any research showing this standard to be fair? People pay more attention to the first item of a list than to the middle ones, which makes me think that one's position on such a list could carry a small benefit. That matters less if there are fewer multi-author papers overall, as the rest of your comment suggests, but it is still a factor as long as multiple names on a paper do happen.

I was thinking the same thing. I think math lends itself to this because the work is so esoteric that often the advisor doesn't even completely understand the work, much less can claim credit for it.
I think the size of labs is more important. Bio and Chem labs -- especially the best ones -- are often freaking huge. There's no way the PI actually has time to meaningfully contribute to every paper when they are bringing in enough money to support dozens of full-time positions.

In math, the group sizes are way smaller. A PI might only have 1 or 2 students.

I wonder if it correlates to how easily new discoveries can be monetized.

Astronomy is too far to reach, mathematics is too abstract to patent, etc.

Publishes. I assume that was a typo?
No doubt a typo. Punishment is the consequence of not publishing. :)
If the field were sane we would not expect breakthrough results or any results all of the time. We would happily say to scientists, thanks for your continued efforts... and allow them the money and time to do real science. But this is not how the crazy Capitalist world works. You have sociopaths who run these institutions always barking orders and demanding results. So the poor employee gives you what you ask, and you say good boy, and give them a treat.
That is a bold claim and conclusion. What data are you basing it on?
If the work you are doing isn't reproducible, then it won't be cited as the building blocks for future work. The system cleans itself out as a side effect. Therefore I prefer students to work on cutting edge research.
The point is that nobody knows if it's reproducible, because nobody's attempting to reproduce it.
Yes that may be true. Instead of trying to reproduce it, they attempt to build upon it. The projects which successfully build on previous projects become original research articles and the cycle continues.
No. That completely misrepresents the problem.

tl;dr: A flawed technique can be published, and it can be hard or impossible to detect the flaw just by using the technique.

For example the paper at http://www.jstor.org/stable/222500 describes a method of using the stationary bootstrap to eliminate data snooping bias in studies of "technical analysis" in finance.

It was published in the Journal of Finance in 1999; by the time I worked with it in 2010 it had been cited over 500 times.

The proof in the paper is inscrutable. I could find no one who could explain or verify the proofs at my institution.

I attempted to reproduce the results in the paper, which is where the problems started. The authors did not give enough information to do this in all cases, but in some cases I was able to reconstruct the algorithms.

They did not perform as described. Of the five algorithms I could reproduce (from memory) one of them worked roughly as described and two others were not completely hopeless.

Looking more closely, I realised that the original authors completely disregarded an important factor in the implementation of the techniques described by the algorithms: transaction costs. Allowing for transaction costs (a difficult but not impossible task), the effects noticed by the authors disappeared completely.
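
As a toy illustration of how costs can erase an apparent edge (not a reconstruction of the paper's rules or my data), here is a minimal sketch. The per-trade returns and the flat cost per trade are made-up numbers, and `net_mean_return` is a hypothetical helper.

```python
def net_mean_return(gross_returns, cost_per_trade):
    """Average per-trade return after subtracting a flat cost from each trade."""
    return sum(r - cost_per_trade for r in gross_returns) / len(gross_returns)


# Made-up per-trade gross returns for a hypothetical trading rule.
gross = [0.004, -0.002, 0.003, 0.001, -0.001]

# Assumed flat round-trip cost of 0.2% per trade; real costs vary by
# market, period, and trade size, and are much harder to estimate.
print(net_mean_return(gross, 0.0))    # small positive edge before costs
print(net_mean_return(gross, 0.002))  # the edge turns negative after costs
```

The point is only that an average edge smaller than the cost of trading is not an exploitable edge, so a study that reports gross returns alone can look far better than it is.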

Looking even closer, I examined the assumptions behind "White's Reality Check", which the paper relied on: it uses the Stationary Bootstrap by Politis and Romano from 1994. But financial returns are not stationary. Not at all.
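
For readers who have not met it, the resampling step of a stationary bootstrap can be sketched roughly as below. This is a simplified illustration of the general idea (random block starts, geometrically distributed block lengths, wrap-around indexing), not the code from the paper or from my thesis; the function name and parameters are hypothetical.

```python
import random


def stationary_bootstrap(series, mean_block_len, rng=None):
    """Draw one resample of `series` with the stationary bootstrap.

    Blocks start at uniformly random positions and have geometrically
    distributed lengths with mean `mean_block_len`. Indices wrap around
    the end of the series, so the resample has the same length as the input.
    """
    if mean_block_len < 1:
        raise ValueError("mean_block_len must be at least 1")
    rng = rng or random.Random()
    n = len(series)
    p_new_block = 1.0 / mean_block_len  # chance of starting a new block each step
    idx = rng.randrange(n)
    resample = []
    for _ in range(n):
        resample.append(series[idx])
        if rng.random() < p_new_block:
            idx = rng.randrange(n)  # jump to a fresh random start
        else:
            idx = (idx + 1) % n     # continue the current block, wrapping around
    return resample


# Usage with a made-up return series and a fixed seed for repeatability.
returns = [0.01, -0.02, 0.005, 0.0, 0.015, -0.01, 0.02, -0.005]
print(stationary_bootstrap(returns, mean_block_len=3, rng=random.Random(42)))
```

Every position in the series is equally likely to start a block, so the procedure treats early and late observations as interchangeable. That is reasonable only if the series behaves the same way throughout, which is exactly the stationarity assumption in question.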

So dodgy logic, misuse of statistics, irreproducible experiments, ignoring important aspects of the data and, I suspect, wishful thinking add up to a paper that is comprehensively false. It has been cited hundreds of times and used many times to verify other results.

Taking a huge risk, here is where my research is: https://ourarchive.otago.ac.nz/bitstream/handle/10523/4346/T... Section 5.12.1 "Bootstrapping and Other False Discoveries"

Which proof exactly are you talking about? I briefly looked at the paper (I've seen it before, but it's been quite a while...), but it seems that they pretty much use a previously known approach, and the only proof in it is simply "replicated for convenience of the reader". Also, this would certainly be neither the first nor the last paper that ignores transaction costs, and their omission does not really invalidate the argument (even if you cannot profitably trade on an anomaly, why is it there in the first place?), so I don't think you can accuse them of bullshit just based on that.

Non-stationarity is a problem though. Still, you need a bit more to call complete bullshit on this imo -- e.g. say something like "after switching to a different bootstrap method that works in the presence of stochastic volatility the result suddenly disappears". Perhaps that is what you do in your paper :) I should take a look.

All that being said, not many people believe in technical indicators working in equities these days, or anywhere really (FX was a bit of a holdout -- not sure if it still is?), so perhaps science has kinda sorted itself out in this case :) I've seen far worse cases of snooping and non-replicability though, and some are still going strong.

Non-stationarity is enough to call bullshit. Really! How can a stationary bootstrap be used on data (financial returns) that are so prone to non-stationarity?

Irreproducibility is also enough to call it very bad, and it should not have been published in that form. They talked a lot about their algorithms, without properly describing them.

Ignoring transaction costs is also enough to call bullshit. It is a mistake that should only be made by rank amateurs, and it is the most common mistake made by amateurs in the Technical Analysis field IMO.

It is a very, very bad paper, but because it gives a technique that can be used to show that TA is possible, it is much beloved by researchers in the field.

My own conclusion is that (generally) TA is not possible to do profitably at these time scales.

> Yes that may be true. Instead of trying to reproduce it, they attempt to build upon it. The projects which successfully build on previous projects become original research articles and the cycle continues.

Whether this "works" or not depends on how the previous projects enter into it. If the results of previous projects are used as assumptions to justify the methods, data, etc. of the subsequent project, there is no check, and we risk the research becoming a house of cards which could collapse due to the faulty, untested assumptions that were used.

If the subsequent projects are performed in such a way that they also test the previous results/assumption, this can be avoided. I can't tell from your wording which you are suggesting, though it seems to lean towards the former.