by artnep 3905 days ago
Yes, but I also worry about a bunch of theorists writing substandard code that is unreadable and unmaintainable.

And creating "fragile" models because they don't have the tools to reproduce their own experiments. How many authors of academic papers in ML could reproduce the exact same results a year later? I would guess around 10%.
This pisses me off so much. I'm not a mathematician, but I like to think I'm a pretty good programmer. I feel like I could pick up a mathematical concept described in a computer science paper more easily if I could actually see the damn code and run it myself. But most of the papers I've read don't mention where to find the referenced source code. When they do, the code is either horribly written and only runs on the author's machine, or it requires specialized software that only a university could afford.
From my interactions with researchers in ML, most of them are actually pretty good programmers. There just isn't an incentive to make your code clean:

1. There isn't much correlation between the quantity, or even the quality, of the papers you publish and the quality of your code. Writing cleaner code is not going to help you get that postdoc or faculty position.

2. Doing research is full of stops and starts and branches that fail and approaches that get thrown out. It's a waste of time to write clean code since you know it'll most likely be thrown out. When you do get an approach that works, you publish your paper and move on.

> most of them are actually pretty good programmers.

What is 'good'? In software development, 'good' usually means writing clear, maintainable, test-covered code. In most scientific research it means something completely different. I think most people on HN go by the former definition, and by that standard most researchers (especially in physics, but also in CS / ML) are not 'good' and are often actually bad, because reaching that level usually takes quite a few years of experience in a corporate setting. But their code works and implements the concepts in their papers, so they are 'good' in that respect. It is closer to rapid prototyping: you build a POC to show the idea works, and only after that would you properly rewrite it.

Good = they are capable of "writing clear, maintainable, test covered code" if they wanted to.
Maybe you can expect more citations if other researchers can examine your code.