by NoOneNew
2038 days ago
>The problem is that real world, physical business data is taken from what one can get, not what one would want.
Yeah, sorry, but part of your job in data sci is to collect the right data. Data doesn't magically exist, and we are not stuck with what's out there. A data sci job is to figure this stuff out. Tech has a weird culture of not doing their job. Kind of like the Zip Recruiter ads: "Working as a hiring manager, hiring new people is the worst part of my job." Bitch, that IS your job. If you don't do that, what's the point in keeping you around? Beekeepers collect honey. Yeah, it's not exactly easy if you're not careful, but they don't bitch about it because they knew what they signed up for. Data sci/analysis is about collecting and analyzing data in ways that aren't straightforward. Because if it were easy and didn't require any effort, why would they be needed?
Some companies aren't experienced at building infra to collect data and don't know how to do it. Or their environment is too complex or expensive to sample data from. The data scientist's job in such cases is to do their best with what exists, show success and make a business case for investing resources into data collection infrastructure.
In other cases, when the required sensors don't exist and the information is critical to decision making, you can either buy the data or work with an engineering group or external vendor to integrate and build out the sensors needed. Need foot traffic data? You can buy from a data marketplace like https://datarade.ai, where various vendors (like SafeGraph -- which was recently used in a COVID-19 study published in Nature) aggregate foot traffic data from cell phones. There are also datasets that can be used as inferential proxies (so-called "alternative data") for the actual data one needs.
Need to collect in-store data? I was at the NRF conference (the world's largest retail tech conference) in NYC back in January and there were a boatload of vendors hawking different types of retail analytics sensors.
In certain small-scale operations, you can even engage field operations and get the in-store retail staff to help collect data and upload it manually. (You'll need a good relationship with the field supervisor, of course.)
Sometimes the data does exist but is inaccessible, say in the ERP or in some proprietary format -- then you have to negotiate with certain business groups or with OEM vendors in order to get the data out.
It all boils down to whether the data has value that exceeds (by a margin) the cost of collecting it. If the answer is yes, there's often a way to do it (albeit sometimes imperfectly).
Is it part of the data scientist's job description to create or participate in creating data collection infrastructure? I guess this depends on the company, but for many companies the answer is yes.