Hacker News
by adamsmith143 1302 days ago
On the flip side you used to have statisticians writing code that is frankly unusable in a production environment. You would weep at the R code I've seen and had to turn into something that actually produces business value.
4 comments

There is a bit of a joke that a data scientist is someone who can do better stats than the average SWE and can write better code than the average statistician. Both of those are relatively low bars to clear though.
The way I heard the joke was "a data scientist is someone who's not good enough at math to be a statistician, and not good enough at programming to be a software engineer."

Maybe a little harsh...

That's much better. Consider that stolen.
Harsh, but funnier than how I phrased it.
This is exactly my point. Let subject matter experts in their respective disciplines handle what they know and communicate through the lingua franca of R. Most data scientists/statisticians probably shouldn't be writing production code, and I think that's OK. It's a failing of management to think that coding is coding and not understand the value of true engineering ability.
My first job basically consisted of taking code in FORTRAN and translating it into C++ with robust testing and engineering, and then frontending that code into a ton of spreadsheet packages. So you had quants doing quant work, software engineers doing software engineering, and analysts and traders being analysts and traders, instead of having quants fail at all three, which is more or less what data science is.
Yeah but in the end it’s just code. And even better, just R.

The business value comes from the stats guy.

When the R/stats guy quits and you have to figure out which of his 7 notebooks to run in which order, which local files need to be in which local directories to run correctly, which versions of each package are now broken, and which code you need to rewrite to fix it, you start to realize the value he produced was clicking a lot of buttons in the right order, and that overall this doesn't scale at all.
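
To make that concrete, here is a minimal sketch of writing the "right order" down instead of keeping it in one person's head. It is in Python rather than R purely for illustration, and everything in it is an assumption of mine: the `Step` and `check_pipeline` names, the file names, and the idea of a flat dict of pinned versions are all hypothetical, not anything from a real project. It only checks that each step's inputs either exist on disk or are produced by an earlier step, and that installed package versions match the pins.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from importlib import metadata
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Step:
    """One notebook or script, with the local files it reads and writes."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pipeline(
    steps: list[Step],
    pins: dict[str, str],
    root: Path,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> list[str]:
    """List everything that would stop the steps from running in this order."""
    problems = []

    # Package versions: compare what is installed against what was pinned.
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            problems.append(f"{package}: want {wanted}, found {found}")

    # Files and ordering: an input must already be on disk or be the
    # output of a step that runs earlier in the list.
    produced = set()
    for step in steps:
        for name in step.inputs:
            if name not in produced and not (root / name).exists():
                problems.append(f"{step.name}: missing input {name}")
        produced.update(step.outputs)

    return problems
```

Calling `check_pipeline(steps, pins, Path("."))` returns an empty list when nothing is missing. It doesn't run anything or fix anything, it just turns "which buttons, in which order" into something the next person can read.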
Yeah, but I meant that because the business value is in the stats, and there is such low quality of stats in the field to begin with, it’s borked no matter what.

There’s no point in fixing it. You can just pretend like you did. But if the stat work is quality, then it’s worth the effort to optimize.

That sounds more like a Jupyter notebook/Python problem than an R problem.

But otherwise, yes, I see the problem.

The hours I have spent debugging package problems in R would disagree.
I know that pain. That’s why I’m saying avoid it if you can do so.