Hacker News
by closeparen 1280 days ago
One Hadoop installation for the whole company. When you provision a messaging topic or storage instance in production, it's automatically replicated to a corresponding table in the "raw" namespace in the warehouse. Teams can check in Airflow jobs that build modeled/derived tables downstream of those raw tables, as desired. Modeled tables go in team namespaces, and each team sets the ACLs on its own tables. Any tables you have access to, you can select from and join in the same Hive or Presto query. It works well. It's kind of mystifying to hear about places that run many separate data warehouses or have to federate data between different parts of the company. Centralization is a big advantage here.
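To make the namespace and ACL layout concrete, here is a rough Python sketch of the idea. It is a toy model, not the actual tooling: the `Warehouse` class, its method names, the team and table names, and the rule that the owning team can read its own raw table are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy model of one shared warehouse (hypothetical, for illustration only)."""

    # Fully qualified table name ("namespace.table") -> principals allowed to read it.
    acls: dict[str, set[str]] = field(default_factory=dict)

    def replicate_raw(self, source_name: str, owner: str) -> str:
        """Register the "raw" mirror of a production topic or storage instance.

        Assumption: the owning team can read its own raw table.
        """
        table = f"raw.{source_name}"
        self.acls.setdefault(table, set()).add(owner)
        return table

    def create_modeled(self, team: str, name: str, readers: set[str]) -> str:
        """Register a modeled table in the team's namespace with team-chosen readers."""
        table = f"{team}.{name}"
        self.acls[table] = set(readers) | {team}
        return table

    def can_query(self, principal: str, tables: list[str]) -> bool:
        """A query may select/join any set of tables the principal can read."""
        return all(principal in self.acls.get(t, set()) for t in tables)


wh = Warehouse()
raw_orders = wh.replicate_raw("orders_events", owner="payments")
daily_orders = wh.create_modeled("payments", "daily_orders", readers={"analytics"})

# The owning team can join raw and modeled tables in one query.
payments_ok = wh.can_query("payments", [raw_orders, daily_orders])
# Another team can read the modeled table it was granted, but not the raw one.
analytics_modeled_ok = wh.can_query("analytics", [daily_orders])
analytics_join_ok = wh.can_query("analytics", [raw_orders, daily_orders])
```

The point of the sketch is that access is decided per table in a single catalog, so a cross-team join is just a question of holding read access on every table in the query, with no federation layer in between.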