	by thrmsforbfast 2785 days ago
	Open source doesn't ensure quality code. Ideally the code would be part of the peer review process, but code review is really expensive, so who knows how that would play out.

6 comments

ChrisFoster 2785 days ago

True, but it does provide at least some measure of reproducibility. Quality of implementation and reproducibility are orthogonal and both very valuable in their own right.

shkkmo 2785 days ago

> Open source doesn't ensure quality code.

Yes, but closed source helps ensure that low quality code is hidden from sight. It also means that people who distrust or doubt the conclusions have no chance to identify any bug(s) and disprove the results or conclusions.

fifnir 2785 days ago

It's simple:

We stop publishing in papers, and instead adopt smaller chunks of our work as the core publishing units.

Each figure should be an individually published entity which contains the entire computational pipeline.

Figures are our observations on which we apply logic/philosophy/whatyouwannacallit. Publishing them alongside their relevant code makes the process transparent, reproducible and individually reviewable, as it should be.

We can then "publish" comments, observations, conclusions etc on those Figures as a separate thing. Now the logic of the conclusions can be reviewed separately from the statistics and code of the figure.
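A minimal sketch of what such a per-figure unit might record, assuming the pipeline is a single plotting script plus some input files. The names `build_figure_manifest`, `figure_id`, and the manifest layout are hypothetical, chosen only for illustration; nothing here is an existing publishing standard.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_figure_manifest(figure_id, script, inputs):
    """Describe one figure: the script that draws it and the data it reads.

    The layout is a hypothetical example, not an established format.
    """
    input_paths = sorted(str(p) for p in inputs)
    return {
        "figure_id": figure_id,
        "script": {"path": str(script), "sha256": sha256_of(Path(script))},
        "inputs": [
            {"path": p, "sha256": sha256_of(Path(p))} for p in input_paths
        ],
        # Record the interpreter so a reviewer knows what the figure ran on.
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }


def write_manifest(manifest, destination):
    """Save the manifest as JSON next to the figure."""
    Path(destination).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A reviewer could recompute the hashes to confirm that the code and data they were given are the ones that produced the figure, separately from judging the conclusions drawn from it.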

chiefalchemist 2785 days ago

A comparable solution would be for all involved to value all research, not just the ground breaking, earth shattering type.

As it is, research that yields a "failure" is buried. That means wheels are being reinvented and re-failed. That means there's no opportunity to compare similar "failures", be inspired, and come up with the magic that others overlooked.

Unfortunately, I would imagine that even if you can get researchers to agree to this, the lawyers are going to have a shit fit. Imagine Google using an IBM "failure" for something truly innovative.

tokai 2785 days ago

What you are proposing sounds a lot like the concept of the least publishable unit.

https://en.wikipedia.org/wiki/Least_publishable_unit

jpeloquin 2784 days ago

> Each figure should be an individually published entity which contains the entire computational pipeline.

I agree in principle. But, for the experimental sciences, we need better publication infrastructure to make this practically possible.

For example, consider a figure that compares, between several groups, the mechanical strain of tensile test specimens for a given load. Strain is measured from digital image correlation of video of the test. Some pain points:

1. A few hundred GB of test video underlie the figure. Where should the author put this so that it remains publicly accessible for the useful lifetime of the paper? How long should it remain accessible, anyway? The scientific record is ostensibly permanent, but relying on authors to personally maintain cloud hosting accounts for data distribution will seldom provide more than a couple of years of data availability.

2. Open data hosts that aim for permanent archival of scientific data do exist (e.g., the Open Science Framework), but their infrastructure is a poor match with reproducible practices. I haven't found an open data host that both accepts uploads via git + git annex or git + git LFS and has permissive repository size limits. Often the provided file upload tool can't even handle folders, requiring all files to be uploaded individually. Publishing open data usually requires reorganizing it according to the data host's worldview or publishing only a subset of the data, either of which breaks the existing computational analysis pipeline.

3. Proprietary software was used in the analysis pipeline. The particular version of the software that was used is no longer sold. It's unclear how someone without the software license would reproduce the analysis.

Finally, there's the issue of computational literacy of scientists. In most cases, the "computational pipeline" is a grad student clicking through a GUI a couple hundred times, and occasionally copying the results into an MS Office document for publication. No version control. Generally, an interactive analysis session cannot be stored and reproduced later. How do we change this? Can we make version control (including of large binary files) user-friendly enough that non-programmers will use it? And make it easy to update Word / PowerPoint documents from the data analysis pipeline instead of relying on copy & paste?
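As a rough illustration of the idea behind versioning large binaries, here is a sketch in which the big file lives in a separate store keyed by its hash and only a small pointer record is tracked alongside the analysis. The functions `stash_large_file` and `verify_pointer` and the pointer layout are hypothetical; this is not the actual format used by git annex or git LFS.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stash_large_file(source: Path, store: Path) -> dict:
    """Copy `source` into `store` under its hash; return a small pointer record.

    The pointer (a hypothetical layout) is what would be version controlled.
    """
    digest = file_digest(source)
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():
        shutil.copy2(source, target)
    return {"name": source.name, "sha256": digest, "size": source.stat().st_size}


def verify_pointer(pointer: dict, store: Path) -> bool:
    """Check that the stored file exists and still matches the recorded hash."""
    target = store / pointer["sha256"]
    return target.exists() and file_digest(target) == pointer["sha256"]
```

This does nothing for the hosting question in point 1; it only shows why the repository itself can stay small while the video remains verifiable.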

If any of these pain points are in fact solved and my information is out of date, I would be thrilled to hear it.

fifnir 2784 days ago

1 and 2: I like IPFS for this, check it out

3: analysis that uses proprietary software is marked appropriately as second class

> computational literacy of scientists

Welp...

no_identd 2785 days ago

I have two words for you: Ted. Nelson.

j88439h84 2785 days ago

Can you expand on this?

lvh 2785 days ago

I can’t speak for GP, but Nelson invented hypermedia/hyperlinks and had a vision for the future that included documents including other documents. All of that seems pretty compatible.

agumonkey 2784 days ago

similar to reproducible builds or nix

research just jumped onto jupyter notebooks, so it's halfway there; someone needs to help with the remaining step

d0mine 2785 days ago

The WWW was created to publish information at CERN, but we can use it in other contexts too ;) http://info.cern.ch/Proposal.html

lvh 2785 days ago

Of course it won’t ensure anything, but currently being completely unable to reproduce results, even as the author but just a year from now, is par for the course.

darpa_escapee 2785 days ago

It's not about code quality, it's about transparency and ease of reproduction.

BurningFrog 2785 days ago

Code review is cheap. I do it for fun. But it doesn't prove anything.

Science should prove things...

m_mueller 2785 days ago

science can never prove anything as a matter of principle. it can only disprove all the alternatives. math and logic can prove, but only within the model it has built up, which has been shown to contain unprovable axioms that one must simply accept.

BurningFrog 2785 days ago

Yeah, I'm aware of the strict theory.

Fomite 2785 days ago

Little of what I do, even with the most rigorous methods available and the best practices from both software development and computational science, proves anything.

BurningFrog 2785 days ago

I know. And I think it's a problem for science...

Logical proofs will never happen for software development, but surely standards for scientific programming can be tightened up a few levels!

I think I heard of some reform proposals from the Reproducibility Crisis reformers.

Fomite 2785 days ago

I mean more that there are whole aspects of science that aren't provable without being able to actually obtain counterfactuals, and that means time machines.
