by contravariant 1923 days ago
Most people I've met use 'datalake' to refer to a (categorised) collection of otherwise unprocessed data.

A data-warehouse is typically more structured. It doesn't just collect data; it also combines and links data from multiple sources. The usual goal is a set of tables you can use for reporting without needing to know all the intricate details of how the source data is linked.
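To make the "combines and links" part concrete, here's a rough sketch. The two source systems (a CRM and a billing system), their field names, and the `build_revenue_by_customer` function are all made up for illustration. The point is that the join logic (different key names, amounts stored in cents) lives in one build step, and whoever reads the resulting table doesn't need to know any of it.

```python
from collections import defaultdict

# Hypothetical raw source data: two systems describing the same customers
# with different field names (the "intricate details" a report user shouldn't need).
crm_customers = [
    {"cust_id": "C1", "name": "Acme", "region": "EU"},
    {"cust_id": "C2", "name": "Globex", "region": "US"},
]
billing_invoices = [
    {"customer_ref": "C1", "amount_cents": 12000},
    {"customer_ref": "C1", "amount_cents": 3000},
    {"customer_ref": "C2", "amount_cents": 5000},
]


def build_revenue_by_customer(customers, invoices):
    """Link CRM customers to billing invoices and return a flat reporting table."""
    totals = defaultdict(int)
    for inv in invoices:
        # The join key is named differently in each source system.
        totals[inv["customer_ref"]] += inv["amount_cents"]
    return [
        {
            "customer": c["name"],
            "region": c["region"],
            # Convert cents to whole currency units for reporting.
            "revenue": totals[c["cust_id"]] / 100,
        }
        for c in customers
    ]


report = build_revenue_by_customer(crm_customers, billing_invoices)
for row in report:
    print(row)
```

A real warehouse would do this in SQL or a transformation tool rather than in-memory Python, but the shape is the same: source-specific details go in, and a reporting-friendly table comes out.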

A data-warehouse can be based on a datalake. You could also build a data-warehouse without first building a datalake, but keeping the datalake part separate allows for better separation of concerns. You can also have a datalake without building a data-warehouse on top of it; it depends on what you want to use it for.
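Here's a minimal sketch of the separation-of-concerns point, under the assumption that the lake just stores raw payloads (JSON strings here) keyed by source and category. The `DataLake` class, `build_warehouse`, and the "crm"/"shop" sources are hypothetical names, not any particular product's API. Ingestion never parses or cleans anything; the warehouse is a separate step that reads from the lake and does the linking.

```python
import json
from collections import defaultdict


class DataLake:
    """Stores raw payloads exactly as received, grouped by (source, category)."""

    def __init__(self):
        self._store = defaultdict(list)

    def ingest(self, source, category, raw_payload):
        # No parsing or cleaning here: the lake only collects and categorises.
        self._store[(source, category)].append(raw_payload)

    def read(self, source, category):
        return list(self._store[(source, category)])


def build_warehouse(lake):
    """Separate step: parse raw lake data and link it into a reporting table."""
    customers = {}
    for raw in lake.read("crm", "customers"):
        rec = json.loads(raw)
        customers[rec["cust_id"]] = rec["name"]

    orders_per_customer = defaultdict(int)
    for raw in lake.read("shop", "orders"):
        rec = json.loads(raw)
        orders_per_customer[rec["customer"]] += 1

    # Number of orders per customer name; customers with no orders get 0.
    return {name: orders_per_customer[cid] for cid, name in customers.items()}


lake = DataLake()
lake.ingest("crm", "customers", '{"cust_id": "C1", "name": "Acme"}')
lake.ingest("crm", "customers", '{"cust_id": "C2", "name": "Globex"}')
lake.ingest("shop", "orders", '{"order": 1, "customer": "C1"}')
lake.ingest("shop", "orders", '{"order": 2, "customer": "C1"}')

print(build_warehouse(lake))
```

Because ingestion knows nothing about the warehouse, you can add new sources to the lake, rework `build_warehouse`, or skip it entirely and query `lake.read(...)` directly, which is the "datalake without a data-warehouse" case.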