Hacker News
by karishmakunder 2126 days ago
Yeah, that! Do you use any tool to centralise the data that you use across your Data Science teams? Like a central repository of sorts, so that everyone can be given access to it and start building and training models from there?

Yes. We ingest all that data into a central Hadoop cluster, where the data science team can access all of it in a uniform way. This solves the physical access problem.

Unfortunately, data quality (DQ) and the meaning of the data are harder to solve. They essentially require caretaking of the datasets by the data owners; this cannot be done by a centralized unit. My organization is currently undergoing a transition in which each data owner will be responsible for maintaining the metadata of their dataset and for measuring its data quality, but implementing this across the whole org is a journey that will take a long time.
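
To make the "owner maintains metadata and measures data quality" idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not what my org actually runs: the `DatasetMetadata` fields, the `completeness` metric, and the sample rows are all hypothetical, and completeness is just one of many possible DQ measures.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    # Hypothetical fields: what a data owner might document about a dataset.
    name: str
    owner: str
    column_descriptions: dict = field(default_factory=dict)


def completeness(rows, column):
    """Return the fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def quality_report(meta, rows):
    """Compute completeness for every column the owner has documented."""
    return {col: completeness(rows, col) for col in meta.column_descriptions}


# Made-up example data, for illustration only.
meta = DatasetMetadata(
    name="customers",
    owner="crm-team",
    column_descriptions={
        "id": "Internal customer identifier",
        "email": "Primary contact email, may be missing",
    },
)
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3},
    {"id": 4, "email": "d@example.com"},
]
report = quality_report(meta, rows)
```

The point of tying the report to `column_descriptions` is that only documented columns get measured, so the metadata and the DQ numbers stay the responsibility of the same owner.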