Depends who you ask. Traditionally speaking:

# Data lake

Data is stored en masse with no schema applied. Either unstructured or structured data can be dumped straight into the lake, or it can be transformed first and then dumped in. It turns into a data swamp when it becomes unusable due to staleness or complexity. A data lake is basically an AWS S3 bucket that business users can access and (attempt to) do reporting on.

# Data warehouse

A heavily structured schema is applied to the data used in reporting, and it usually defines the single point of truth for business purposes. It uses a star schema model (if you follow the Kimball [0] methodology) to create dimension tables, which are used to filter and aggregate the raw measurements from the central fact tables (which contain your actual measures, like £ made on 1 sale). The Kimball and Inmon [1] philosophies come with their own benefits and trade-offs. See the bottom of [2].

Edit: got the methodologies the wrong way round with initial costs; the linked article has a useful table that I didn't see.

Data warehouses have a very concrete definition and are usually implemented via Kimball's or Inmon's method. When I've worked with them they've become the bastion of business reporting (Excel users love a pivot table).

---

Just to confuse matters, there's also the data vault: https://en.m.wikipedia.org/wiki/Data_vault_modeling

0: https://en.m.wikipedia.org/wiki/Ralph_Kimball

1: https://en.m.wikipedia.org/wiki/Bill_Inmon

2: https://www.zentut.com/data-warehouse/kimball-and-inmon-data...
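To make the star schema idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_date`, `fact_sales`), the columns, and the sample rows are all made up for illustration; a real warehouse would have far more dimensions and would not live in SQLite. The point is only the shape: the fact table holds the measure (£ per sale), and the dimension tables are what you filter and group by.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript(
        """
        -- Dimension tables: descriptive attributes used to filter and group.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

        -- Central fact table: one row per sale, holding the actual measure.
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount_gbp REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "books"), (2, "games")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2023, 1), (2, 2023, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 10.0), (2, 1, 2, 15.0), (3, 2, 1, 40.0), (4, 2, 2, 5.0)],
    )


def revenue_by_category(conn, month):
    """Filter on the date dimension, aggregate the measure per product category."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount_gbp)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.month = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (month,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(revenue_by_category(conn, 1))
```

This is roughly what a pivot table does under the hood: pick a dimension attribute for the rows, pick another to filter on, and sum the measure from the fact table.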