by kreilly
5247 days ago
This is pretty spot on with what I've seen up close. Specifically:

1) "Throw [y]our normal engineering practices out of the window." - We treat Data Science very much like the "R" in R&D. We point them generally towards a problem and give them time and latitude to solve it. Trying to fit that into our normal scrum process is impossible.

2) "Data scientists are going to end up building things that need to be translated into production code." - Our hand-off between Data Science and Engineering can be pretty messy. Getting stuff into production efficiently is an ongoing challenge.

3) "Trying to explain some of the hard math that's going on to the entire company isn't a productive use of time." - This point is pretty self-explanatory. I get in pretty deep on a regular basis and it gets over my head quickly. It can be very hard for the average account manager or marketing person to keep up.
|
When you are trying to discover the answer to a statistical problem, it doesn't make much difference whether a task takes 30 minutes or 10 hours. This is especially true if the task can be parallelized: you can just boot up some EC2 nodes and run it. If a task fails one time out of five, it's not a problem because you can just run it again.
However, getting this sort of code into production is another challenge: building the system, dealing with scale, handling edge cases, etc.
Peter Norvig discusses this in one of his talks. At Google, they typically start programs in Google Research and migrate them over to the development teams. One exception is Google Translate, which they kept in the research division, bringing in engineers to help get it into production.