I see the motivation and skill of your group, and I hope you can retain that and build a useful contribution. I also see the extraordinary effort required to get to this point, and many projects fail to get this far, so congratulations. Something is obviously going well. However, I am unmoved by your list of key wins (details below). If you indeed built something useful, is there a different way to deliver your message about the functionality that you enable? Here are my reactions:
1) Shorten data discovery phase. In my experience, analysts and data scientists are always very familiar with what relevant data exists, or else they can find the right people to acquire what data they need. Often, kick-off meetings for new projects cover with stakeholders which data is useful.
2) Have transparency on how and by whom the data is used. For publicly available data, this is not something that a company usually cares about. Internal and proprietary data management is already a very mature space, and every company with such data already has processes in place to manage data access. I grant this is often a mess, but I also don't see any global solution on the horizon.
3) Foster data culture by continuous compliance and data quality monitoring. Data quality monitoring is extremely complex. I have seen many claims over many years of tools that solve this problem broadly, but I have yet to see any solution that matches the claims.
4) Accelerate data insights. This is a very bold claim for a new project, especially given the many (5+) decades of work and experience developing tools and techniques for data insights.
5) Know the sources of your dashboards and ad hoc reports. All dashboards I am aware of surface this sort of information.
6) Deprecate outdated objects responsibly by assessing and mitigating the risks. This is a good idea, but it is challenging in practice, as illustrated by several prominent examples [1,2,3].
Finally, unrelated to the above, your project's name (ODD) is very similar to the name used by the Outlier Detection DataSets (ODDS) project [4].
Good luck.
[1] http://www.lenna.org/editor.html
[2] https://scikit-learn.org/stable/modules/generated/sklearn.da...
[3] https://deepai.org/dataset/fb15k and https://paperswithcode.com/dataset/fb15k-237
[4] http://odds.cs.stonybrook.edu
Let me respond to some of your reactions from my perspective as a data engineer. Please feel free to add your opinion on them.
> Shorten data discovery phase. In my experience, analysts and data scientists are always very familiar with what relevant data exists, or else they can find the right people to acquire what data they need. Often, kick-off meetings for new projects cover with stakeholders which data is useful.
You're right, but in my experience that's not always the case. Sometimes finding the key person or team responsible for a dataset can be challenging. I agree with you about the kick-off meeting, but it's not always a silver bullet. Data becomes outdated or deprecated all the time, and we are trying to solve the problem of notifying everyone who may be affected as quickly and easily as possible.
> Know the sources of your dashboards and ad hoc reports. All dashboards I am aware of surface this sort of information
Again, you are right. All dashboard services and BI tools can show you which data they pull from which data source. But in my experience it's sometimes useful to look at the origin of the data a dashboard uses, not just its immediate source. This is where end-to-end lineage comes in handy. I also find it useful to have the metadata of all my dashboards, from all of my company's BI tools, in one place.
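To make "end-to-end lineage" a bit more concrete, here is a minimal sketch of the idea, not a description of how any particular tool implements it. The entity names and the `UPSTREAM` mapping are made up for illustration, and I'm assuming lineage is available as a simple "reads from" graph.

```python
from collections import deque

# Hypothetical lineage graph: each entity maps to the entities it reads from.
UPSTREAM = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
    "orders_raw": [],
    "crm_export": [],
}


def origins(entity, upstream):
    """Return the root sources (entities with no upstream) that feed `entity`.

    If `entity` itself has no upstream, it is returned as its own origin.
    """
    seen = {entity}
    queue = deque([entity])
    roots = set()
    while queue:
        node = queue.popleft()
        parents = upstream.get(node, [])
        if not parents:
            roots.add(node)  # nothing feeds this node, so it is an origin
        for parent in parents:
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return roots


print(sorted(origins("sales_dashboard", UPSTREAM)))
# prints ['crm_export', 'orders_raw']
```

A BI tool would typically show only `sales_mart` as the dashboard's source; walking the graph is what surfaces `orders_raw` and `crm_export`.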
> Deprecate outdated objects responsibly by assessing and mitigating the risks. This is a good idea, however, it is challenging
Couldn't agree more. We are working to improve not only how we approach this problem but also the solution itself, if that makes sense. We are basically trying to find the right approach and offer it to everyone else. I know that's ambitious and a bold statement, but I hope we are getting there.
Overall, thank you for your input!
@germanosin, would you like to add something I may have missed?