|
Like others have said, the technical infrastructure is usually a manifestation of the people processes of the corporation. I think it's valuable to kinda ignore the technical stuff initially and first understand the requirements of your customers. It's possible that the current system, as weird as it is, satisfies your customers' requirements best. Unlikely, but possible. But given that they hired you, chances are they know the current system isn't great and they didn't have the domain knowledge to fix it. I'd guess you & your company are aligned at a high level that change is needed. It's just a matter of aligning your ideas with the short- and long-term goals of the company, usually with a convincing story explaining how your technical changes drive business value.
For example, you mention that data exploration is hard, and I'm inferring this is a problem because you have multiple consultants independently scouring your datasets. If so, you could tell your customers that you can reduce consultant onboarding time from 5 days to 2 days (made that up) if the company invested in aggregated datasets or a centralized data warehouse. If you can translate this to a dollar figure (say, using the consultant hourly rate), that's even better.
As for what part you tackle first, I'd suggest finding a problem that everyone knows about but that is straightforward for you to solve. The goal is to show immediate value and gain the trust of the people around you. You don't solve the systemic problem immediately, but the trust you gain is currency you can spend months from now to really invest in the system. Because the truth is, higher-ups rarely value invisible things like data quality or maintainability, but they respond very positively to shiny new graphs and numbers.
FWIW I don't know if it's just me, but I feel like the bulk of data science is the ugly pipeline and architectural decisions you're facing now.
I read about people doing interesting modeling & machine learning work, but I keep wondering how much work went into getting the data into a modeling-ready state. I haven't worked at a company where the % of data team effort going to pipelines is less than, say, 80%. |