Hacker News
by singhrk 2614 days ago
I am currently in the same boat. A few things that we have tried or figured out:

(a) Deprecation - Get your butcher hat on. Start looking at the existing offerings and see how many of them are actually used, and by whom. Start deprecating (or at least archiving) the offerings that nobody uses (or for which you cannot find a user).
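
One rough way to find deprecation candidates, as a minimal sketch: assuming you can export an access log of (dataset, user, timestamp) records from your query engine or storage audit logs, list every catalogued dataset that nobody has touched inside a cutoff window. The record shape, the 90-day default, and the names `Access`, `find_unused` and `users_by_dataset` are all hypothetical, not taken from any particular tool.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Set


class Access(NamedTuple):
    """One row of a (hypothetical) exported access log."""
    dataset: str
    user: str
    when: datetime


def find_unused(catalog: Iterable[str], accesses: Iterable[Access],
                now: datetime, window_days: int = 90) -> List[str]:
    """Return catalogued datasets with no access in the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    recently_used = {a.dataset for a in accesses if a.when >= cutoff}
    return sorted(set(catalog) - recently_used)


def users_by_dataset(accesses: Iterable[Access]) -> Dict[str, Set[str]]:
    """Map each dataset to the set of users seen in the log ("used by whom")."""
    users: Dict[str, Set[str]] = {}
    for a in accesses:
        users.setdefault(a.dataset, set()).add(a.user)
    return users
```

Datasets returned by `find_unused` are only candidates: an empty log entry can also mean the logging is incomplete, so archiving first and deleting later is the safer order.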

(b) Simplification - Try to find the infrastructure components (compute engines, storage frameworks) that serve the same use case, and see if you can converge on one. For example, you can converge from HDFS and S3 to just S3, and similarly from Hive and Spark to just Spark. Don't bring yet another infrastructure component into the mix, otherwise someone new at your place will make another HN post in the future :)

(c) Documentation - Start building a place to document all the offerings that you have. It could be a wiki-style solution or, if it works for you, something as simple as Google Docs. Or it could be a tool like Superset/Redash that at least brings everything into one place.

(d) Governance - Get some power users into the system and take their help in (i) identifying important datasets, (ii) adding information about existing datasets, and (iii) reviewing new code, datasets, and production deployments.

(e) Start checking transform code, table DDL, and all other metadata into a git repository. Over time this will build up some documentation automatically and help take care of duplicate logic.
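
Once the SQL lives in one repository, spotting duplicate logic can start out very crude. A minimal sketch, assuming the transform files have already been read into a dict of path to SQL text: normalize each query (drop comments, collapse whitespace, lowercase) and group files whose normalized text hashes the same. The names `normalize_sql` and `find_duplicates` are hypothetical, and the normalization is deliberately naive: it lowercases string literals too and would mangle a `--` inside one, so treat hits as leads to review rather than proof.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalize_sql(sql: str) -> str:
    """Crude normalization so formatting differences do not hide duplicates."""
    sql = re.sub(r"--[^\n]*", " ", sql)                    # drop line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)  # drop block comments
    sql = re.sub(r"\s+", " ", sql).strip()                 # collapse whitespace
    return sql.rstrip(";").strip().lower()


def find_duplicates(files: Dict[str, str]) -> List[List[str]]:
    """Group file paths whose normalized SQL is identical."""
    groups = defaultdict(list)
    for path, sql in files.items():
        digest = hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()
        groups[digest].append(path)
    # Keep only groups with more than one file, in a stable order.
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)
```

This only catches exact copies that differ in formatting. Queries that compute the same thing in different ways still need a human reviewer, which is where the power users from (d) come in.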