Hacker News
by kreilly 5247 days ago
This is pretty spot on with what I've seen up close. Specifically:

1) "Throw [y]our normal engineering practices out of the window." - We treat Data Science very much like the "R" in R&D. We point them generally towards a problem and give them time and latitude to solve it. Trying to fit that into our normal scrum process is impossible.

2) "Data scientists are going to end up building things that need to be translated into production code." - Our hand-off between Data Science and Engineering can be pretty messy. Getting stuff into production efficiently is an ongoing challenge.

3) "Trying to explain some of the hard math that's going on to the entire company isn't a productive use of time." - This point is pretty self-explanatory. I get in pretty deep on a regular basis, and it gets over my head quickly. It can be very hard for the average account manager or marketing person to keep up.


There is a big gap between the disciplines of Data Science and engineering, which makes it difficult to translate code from one to the other.

When you are trying to discover the answer to a statistical problem, it doesn't make much difference whether a task takes 30 minutes or 10 hours. This is especially true if the task can be parallelized, since you can just boot up some EC2 nodes and run it. If a task fails one out of five times, it's not a problem, because you can just run it again.
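A minimal local sketch of that research-style workflow, assuming a thread pool as a stand-in for a cluster of EC2 nodes: independent tasks are fanned out in parallel, and any failure is simply retried. The helper names and the `flaky_square` task are hypothetical, chosen only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_retries(task: Callable[[T], R], arg: T, max_attempts: int = 5) -> R:
    """Research-style handling of a flaky task: if it fails, just run it again."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task(arg)
        except Exception as exc:  # any failure is treated as "try again"
            last_error = exc
    raise RuntimeError(f"task failed {max_attempts} times") from last_error


def parallel_map_with_retries(
    task: Callable[[T], R], args: Iterable[T], workers: int = 4, max_attempts: int = 5
) -> List[R]:
    """Fan independent tasks out to a worker pool (local stand-in for EC2 nodes)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order
        return list(pool.map(lambda a: run_with_retries(task, a, max_attempts), args))


# Hypothetical flaky task: even inputs fail on their first attempt only.
attempts: dict = {}
attempts_lock = threading.Lock()


def flaky_square(x: int) -> int:
    with attempts_lock:
        attempts[x] = attempts.get(x, 0) + 1
        n = attempts[x]
    if n == 1 and x % 2 == 0:
        raise RuntimeError("transient failure")
    return x * x


if __name__ == "__main__":
    print(parallel_map_with_retries(flaky_square, range(6)))  # [0, 1, 4, 9, 16, 25]
```

Catching every exception and retrying blindly is fine when a human is watching the results, but it hides real bugs, which is part of why this code doesn't transfer directly to production.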

However, getting this sort of code into production is another challenge, in terms of building the system, dealing with scale, handling edge cases, etc.
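To make the contrast concrete, here is a hedged sketch of what the same retry logic tends to look like once it is hardened: only failures marked as transient are retried, attempts are bounded, delays back off exponentially, and edge cases such as an empty batch are rejected explicitly. `TransientError`, `mean_score`, and `call_with_backoff` are hypothetical names, not part of any particular pipeline framework.

```python
import logging
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, lost workers)."""


def mean_score(rows: Sequence[float]) -> float:
    """Hypothetical pipeline step with its edge case handled explicitly."""
    if not rows:
        # Research code might let this surface as ZeroDivisionError;
        # here it is rejected with a clear message instead.
        raise ValueError("empty batch")
    return sum(rows) / len(rows)


def call_with_backoff(
    fn: Callable[..., T],
    *args,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry only transient failures, a bounded number of times, with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to monitoring
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed; retrying in %.1fs", attempt, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Any other exception (for example the `ValueError` from an empty batch) propagates immediately instead of being retried. The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.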

Peter Norvig discusses this in one of his talks. At Google, they typically start programs in Google Research and migrate them over to the development teams. One exception is Google Translate, which they kept in the research division, bringing in engineers to help take it into production.

Yeah, my experience is that big data + agile planning = fail.

In the typical agile process, calendar time is ignored, and people pretend that you can just manage punch-clock time. The result is that you end up with two days left in the sprint and a four-day job that needs to run.
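The mismatch can be shown with a little arithmetic. This sketch assumes a task is described by two numbers, hands-on effort hours and unattended wall-clock days, and an 8-hour working day; the `Task` type, the dates, and the figures are all hypothetical. It also simplifies by counting calendar days rather than working days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    name: str
    effort_hours: float   # hands-on time: what a punch-clock estimate tracks
    wall_clock_days: int  # calendar time the job must run, mostly unattended


def fits_by_effort(task: Task, today: date, sprint_end: date, hours_per_day: float = 8.0) -> bool:
    """Punch-clock view: only the engineer's hours count."""
    remaining_days = (sprint_end - today).days
    return task.effort_hours <= remaining_days * hours_per_day


def fits_by_calendar(task: Task, today: date, sprint_end: date) -> bool:
    """Calendar view: the job must also finish running before the sprint ends."""
    remaining_days = (sprint_end - today).days
    return task.wall_clock_days <= remaining_days


job = Task("retrain model", effort_hours=2.0, wall_clock_days=4)  # hypothetical numbers
today, sprint_end = date(2024, 3, 4), date(2024, 3, 6)  # two days left
print(fits_by_effort(job, today, sprint_end))    # True
print(fits_by_calendar(job, today, sprint_end))  # False
```

By effort the job looks trivial, but by calendar it cannot finish in time. Tracking both numbers is one way to surface the problem at planning time rather than at the end of the sprint.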

How is a two-day delay in delivery a "fail"? There is no incompatibility with agile planning. You don't go on vacation every time a job runs; you work on another project.