Hacker News
by yyin 3698 days ago
Questions:

What language are they using for cleaning up the data for import?

Is there a market for a middleman (i.e., something as a service) that cleans up enterprise data, using customized solutions if necessary, for import into different databases?

If this type of company already exists who are some examples?

This is very unsexy work which I happen to enjoy. It's the phase that routinely comprises 80% or more of any so-called "Big Data" project.

2 comments

There is a company called CloudFactory that offers a distributed task platform for data science. Data wrangling manpower on demand.
> If this type of company already exists who are some examples?

Palantir.

Data wrangling -- Trifacta, Tamr, etc. productized this so you don't need a consulting army.
Who are the clients and how much do they pay for these services? What's the typical size of the dataset? Are speed benchmarks shared with the public?