by aaronjg 5253 days ago
There is a big gap between the disciplines of data science and engineering, which makes it difficult to translate code from one to the other.

When you are trying to discover the answer to a statistical problem, it doesn't make much difference whether a task takes 30 minutes or 10 hours. This is especially true if the task can be parallelized, because then you can just boot up some EC2 nodes and run it. If a task fails one out of five times, it's not a problem because you can just run it again.

However, getting this sort of code into production is a different challenge: you have to build the surrounding system, deal with scale, handle edge cases, and so on.

Peter Norvig discusses this in one of his talks. At Google, they typically start programs in Google Research and then migrate them over to the development teams. One exception is Google Translate, which they kept in the research division, bringing in engineers to help take it into production.