Hacker News new | ask | show | jobs
by zopf 3451 days ago
First off: isn't this from June of 2016?

Second: I don't get it. The primary example they use to illustrate the need has almost nothing to do with model building or selection, and everything to do with selecting and painstakingly cleaning data. This mirrors my experience with data science so far.

"A recent exercise conducted by researchers from New York University illustrated the problem. The goal was to model traffic flows as a function of time, weather and location for each block in downtown Manhattan, and then use that model to conduct “what-if” simulations of various ride-sharing scenarios and project the likely effects of those ride-sharing variants on congestion. The team managed to make the model, but it required about 30 person-months of NYU data scientists’ time and more than 60 person-months of preparatory effort to explore, clean and regularize several urban data sets, including statistics about local crime, schools, subway systems, parks, noise, taxis, and restaurants."

So - the meta part isn't such a big deal. But if DARPA has found a way to properly automate the painstaking process of selecting, cleaning, validating, and normalizing data, well THEN we'll really have something to be impressed about.