Hacker News new | ask | show | jobs
by CyberDildonics 1164 days ago
What is the different architecture and what is the different use case? Databases are already general tools. You put data in and query it. At what point does it become a "data warehouse" ?
2 comments

People argue over the definition of a data warehouse, but usually a data warehouse

(1) optimized for bulk inserts and bulk read operations, not updates or deletes, nor for single row updates

(2) stores data from multiple sources

(3) often a "columnstore" columnar table format is used. This format usually compresses columns in a format that is trivial to decompress (so if a column only contains a handful of similar flags, the database will run-length-encode them to optimize a full column scan), and additionally if you limit the number of columns accessed you can substantially reduce the cost of the query

(4) the data warehouse can easily shard or partition data

If you want to technical file format examples, I suggest looking up how the Parquet file format compresses data, or how Microsoft SQL Server columnstore tables work under-the-hood.

they're both general terms, but serve different purposes. Data Warehouses tend to focus on read-specific operations, more of an emphasis on historical and/or analytical problems than transactional ones. Subsequently they are likely to optimize data ingress, distributed queries and downplay referential integrity and consistency. The different workloads promote different compute models, but you're right that the line is blurry. Similarly you could ask "Excel has multiple tables (sheets) that you can join and query; is it a database"?