| HN Mirror

	by hharrison 4645 days ago
	This is so true. I'm in a Ph.D. program and everyone around me is wasting so much time by reinventing the wheel every time they need to code something. So I spend my time making libraries to help them out, but then I get scolded because that's time that's not going directly toward getting publications. And few people use my code because they don't trust software as up to the scientific standard unless (a) they spent thousands of dollars on it, a la MATLAB, or (b) they wrote it themselves and, e.g., take a mean by manually iterating over an array, "just to make sure" the mean is calculated correctly. Ugh. It doesn't matter how many tests I can point them to. I can't wait to get out of here and work somewhere where coding is appreciated, where I can actually get paid, and where I have some choice as to which state I live in.

6 comments

xtracto 4645 days ago

I hated the 'publish at all costs' attitude I felt while pursuing my PhD and within a post-doc project. IMHO that leads to the huge amount of trash articles and conferences that is now plaguing academia.

bluekeybox 4645 days ago

Plus it tends to reward established networks of "friends" who assign each other as coauthors on papers rather than individuals doing the hard part of the work.

asabjorn 4645 days ago

I am assuming that you are doing computer science; in the current environment, focusing on the conceptual contribution and doing the minimal amount of engineering is solid advice.

I started in physics, and there someone could make a great career corroborating or disproving conceptual contributions. This is not a track in CS and is practically career suicide.

From experience, most CS research cannot be trusted to be correct, and enabling people to build a career on replicating or corroborating studies would, in my opinion, be of great value. Even the research that is correct is often not fully implemented, so you not only have to implement their approach, you also have to discover how to realize it. That work is not publishable in CS, and it is a non-trivial amount of extremely risky work.

hharrison 4645 days ago

Nope, Psychology with a focus on complex systems, statistical physics, dynamical systems, that sort of thing. Everything from time series analyses that require hundreds of thousands of data points to plain old factorial ANOVAs.

Psychology is probably one of the worst sciences for the attitude described in the article. Being in the most "mathy" corner of the field doesn't really help.

fluidcruft 4645 days ago

Oh, man. Don't tell them about Kahan summation--they'll freak out and go rewrite everything.
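
For anyone who hasn't met it, here is a minimal sketch of Kahan (compensated) summation in Python. The function name `kahan_sum` and the test data are made up for illustration; `math.fsum` from the standard library is used only as a reference value.

```python
import math

def kahan_sum(values):
    """Sum floats while carrying a correction term for lost low-order bits."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total

# Illustrative data: one large term followed by many tiny ones.
data = [1.0] + [1e-16] * 100_000

naive = 0.0
for x in data:
    naive += x  # each 1e-16 is below half an ulp of 1.0, so it is rounded away

reference = math.fsum(data)  # accurately rounded sum from the standard library
print(naive, kahan_sum(data), reference)
```

The plain loop never moves off 1.0 because each tiny term is rounded away, while the compensated version stays very close to the reference.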

ananduri 4644 days ago

I think I'm one of the "them"--now that I know about it, it seems like a pretty important thing to know. I can't help but wonder, how many other gems like this are out there that the "them" don't know about?

genwin 4645 days ago

Just think, when you get paid your taxes will fund those people re-inventing wheels.

hharrison 4645 days ago

Hey, I have no problems with my taxes going toward science. The way it's done is far from perfect but the answer isn't to take away funding.

genwin 4645 days ago

Far from perfect, okay. Blatant misuse of funds, not okay.

roflc0ptic 4644 days ago

It's not really a blatant misuse of funds, though. My roommate is an intensely bright dude finishing up his math PhD studying interactions between complex systems. He writes all his code in C, and he recompiles it every time he wants to change a variable (e.g. the input file, or the number of iterations).

He's been doing it this way for years because that's what he was taught. That's the level of software engineering acumen you'll get in academia. But it "works". I've offered to help him modify the code so it will accept command line arguments, and we're going to sit down and do that so he can run several instances in parallel and utilize all of those fancypants cores on the computer I loaned him, but... he didn't know you could do that. No one told him! How would he know where to start looking that up? How reasonable is it to expect him to grok all that, when he's deep in math-land?

So it was blatant to me, software developer of four years, that something was pretty wrong, but for him: he's about to finish his PhD. He's been published a couple of times. They're not running horribly inept software development, they're running mathematics the best way they know how.
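
To make the command-line-argument idea concrete, here is a rough sketch of the pattern. The roommate's code is C, where the same thing is done through `argc`/`argv`; this version uses Python's `argparse` just to keep it short. The argument names `input_file` and `--iterations`, the default of 1000, and the file name `run1.dat` are all hypothetical.

```python
import argparse

def build_parser():
    """Describe the settings that used to be hard-coded in the source."""
    parser = argparse.ArgumentParser(description="Run one simulation instance.")
    parser.add_argument("input_file", help="path to the input data (hypothetical)")
    parser.add_argument("--iterations", type=int, default=1000,
                        help="number of iterations (the default is an assumption)")
    return parser

def main(argv=None):
    """Parse the settings; the real computation would start from here."""
    args = build_parser().parse_args(argv)
    return args.input_file, args.iterations

# An explicit argument list stands in for what the shell would pass.
print(main(["run1.dat", "--iterations", "500"]))
```

Called with no list, `parse_args` reads the real command line instead, so several instances with different inputs can be started from a shell at the same time without recompiling anything.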

temujin 4645 days ago

Yeah, libraries aren't a good way to start because there's not enough interest in using them.

There are opportunities to build standalone tools which blow away their predecessors by multiple orders of magnitude, though; after getting enough researchers to use one such tool, you might attract sustained curiosity from a few people wondering "how the hell did s/he do that?!" and organically grow a small library with a real user base. That's one of my own long term goals, anyway.

hharrison 4645 days ago

Well I'm self-taught so I have to start somewhere. I'm not sure I could put together a stand-alone tool and still complete my Ph.D. program. Anyways I've found most stand-alone tools just aren't flexible enough and I don't feel like making something I wouldn't use myself.

temujin 4645 days ago

Fair enough, and definitely agree with not making something you wouldn't use. (The "most [existing] stand-alone tools aren't flexible enough" problem is, however, one of the reasons why there's so much room to do better...)

hharrison 4645 days ago

True that! Okay, you've convinced me to make it a long-term goal.

michaelt 4645 days ago

How would one take a mean of n elements without visiting all n elements? Won't the memory bandwidth and big-O complexity always be the same? Genuinely curious.

CamperBob2 4645 days ago

The language used in MATLAB and Octave is designed for vector processing to an extent most developers haven't seen before. MATLAB doesn't mean "Math Laboratory", it means "Matrix Laboratory." Operations on row and column vectors are first-class language elements. You almost never have to manually iterate over an array to compute its statistics -- you'd just say M = mean(A [,dim]) where A is a standalone vector or a column vector of a matrix. In that example, M itself is a vector, if A was a matrix.

MATLAB syntax is ugly but the underlying principles are pretty cool. Well-written code scales automatically on newer hardware, or at least it has the potential to. That's not true in languages where higher-order vectors are built from discrete scalars.

joe_the_user 4645 days ago

The good stuff of Matlab must be balanced by its perverse, pathological and obscene qualities.

The most vile aspect of Matlab is the faith every researcher has that producing something in Matlab is enough, when the reality is that code coming from Matlab will never escape, will never be as useful as napkin-style pseudocode for the creation of any larger system.

hharrison 4645 days ago

In MATLAB, R, or numpy, it's the difference between `mean(n)` and manually looping. It's not an issue of algorithmic efficiency, it's an issue of lost productivity: because they don't even write a function to reuse (all they understand is scripting), they recode the loop every single time they have to sum or take the mean of something.

gaius 4644 days ago

Well, it is, because NumPy and friends do all the heavy lifting in hand-tuned C. dis() your Python function for taking a mean and see the difference; it's huge.
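
A quick sketch of that check, using only the standard library. `manual_mean` and `bytecode_listing` are names invented here, and the exact instructions printed differ between Python versions.

```python
import dis
import io

def manual_mean(values):
    """Hand-written mean: every element passes through the interpreter loop."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def bytecode_listing(func):
    """Return the text that dis.dis() would normally print for func."""
    buf = io.StringIO()
    dis.dis(func, file=buf)
    return buf.getvalue()

listing = bytecode_listing(manual_mean)
print(listing)
```

Every pass through the `for` loop executes several of those bytecode instructions per element, which is the overhead a compiled library routine avoids.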

ScottBurson 4645 days ago

The point is not computational time; the point is that one could simply call an existing library function rather than hand-coding the loop oneself and risking making an error (a fencepost error, for example).
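
As a small illustration of that kind of mistake (the buggy function and the sample values are invented for this sketch), compare a hand-coded loop containing a fencepost error with the standard library's `statistics.fmean`:

```python
from statistics import fmean

def buggy_mean(values):
    """Hand-coded mean with a deliberate fencepost error."""
    total = 0.0
    for i in range(1, len(values)):  # bug: starts at 1, so index 0 is skipped
        total += values[i]
    return total / len(values)

samples = [2.0, 4.0, 6.0, 8.0]
print(buggy_mean(samples))  # 4.5, silently wrong
print(fmean(samples))       # 5.0
```

Nothing crashes in the buggy version, which is why this sort of error can survive being recoded loop after loop.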

ivoflipse 4645 days ago

I can understand that you'd want to manually check what's happening. For example, say you're taking the mean over the rows of a 2D array using numpy's mean function and aren't really sure whether axis=0 or axis=1 refers to the rows.

But you'd only have to figure it out once and then learn to trust numpy, instead of rolling your own version every time.
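
One way to do that one-time check is a tiny table small enough to verify by eye. The sketch below is a pure-Python stand-in, not numpy itself: `mean_along` is a hypothetical helper written to follow numpy's documented convention, where axis=0 averages down the rows (one mean per column) and axis=1 averages across each row.

```python
from statistics import fmean

def mean_along(table, axis):
    """Mean of a 2D list of rows, following numpy's axis convention."""
    if axis == 0:
        # Collapse the rows: one mean per column.
        return [fmean(col) for col in zip(*table)]
    if axis == 1:
        # Collapse the columns: one mean per row.
        return [fmean(row) for row in table]
    raise ValueError("axis must be 0 or 1 for a 2D table")

table = [[1.0, 2.0, 3.0],
         [5.0, 6.0, 7.0]]

print(mean_along(table, 0))  # [3.0, 4.0, 5.0]
print(mean_along(table, 1))  # [2.0, 6.0]
```

Running the same 2x3 table through `numpy.mean` with each axis value should give the same shapes: three numbers for axis=0 and two for axis=1.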

darkarmani 4645 days ago

You missed these key words: "manually iterating"

So looping in a high-level language rather than using vectorized functions.

aidos 4645 days ago

It's probably more in reference to the layer the work is completed in. I haven't used matlab in years, but you can probably sum an array by iterating, or you can call a faster, more efficient library. You get much greater gains when doing this in higher dimensions. If you can do your operations at a matrix level, you get an order-of-magnitude improvement in speed in most languages.

binarysolo 4645 days ago

I think the concern is over the manual component of it, especially if that set of n is big by human standards. (Say, doublechecking a few hundred entries of some column entry by calculator.)
