Making a math/science person versatile in CS is somewhat easier, but even that can be tricky. Many of them are bored by file formats, architecture, etc., and simply don't have the engineering mindset.
Very hard. You run into all types of candidates who just aren't there yet: people working on research that's irrelevant to real-world applications, people who have done data analysis/BI work and brand themselves as "data scientists," those who have the pedigree but cannot process and explore real-world data, those who have good analytical chops but not the distributed or advanced modeling experience, etc.
I've witnessed it first-hand, and it's tough to find the right person.
If it is that hard, the bar is probably set too high. Most of the skills are learned on the job, after all. Most smart PhDs who can program well and have sound knowledge of statistics can learn to do this stuff.
Given enough time, anyone smart enough to finish a PhD can acquire a set of skills. :)
But it's more than just solid statistics. We're talking about having enough mathematical fluency to develop models rigorously (not just "oh, we'll minimize MSE!!"), test those models, then implement those models--possibly using a distributed algorithm.
From what I hear, these skills take years to develop. Choosing to groom the wrong person is an extremely costly mistake, so making the choice is difficult.
All mathematics consists of rigorous models. But choosing and tweaking a model is more of an art. Most data scientists apply existing models to new data; they do not develop new ones.
I am sure it takes much less than "years" for any smart PhD in applied mathematics to learn most data analysis tricks. It is not theoretical physics, after all.
Most data scientists apply existing models to new data; they do not develop new ones.
I meant "develop" in the software sense. Data scientists use off-the-shelf libraries during initial research, but those libraries usually lack some feature they need before they can go into production (typically, support for concurrency).
I am sure it takes much less than "years" ... to learn most data analysis tricks.
I used to be cynical about "data science," too. After four months of working on a data science team, though, I'm a believer.
A data scientist is really a "full-stack data developer." He or she needs the ability to work with advanced models, use them to analyze large amounts of data, and modify those models to work concurrently or in a distributed system if desired (and it's often desired). It's more than just "analysis tricks."
See Zed Shaw's seminal article "Programmers Need To Learn Statistics Or I Will Kill Them All".
http://www.zedshaw.com/essays/programmer_stats.html