by glogla
1608 days ago
I skimmed the article and it seems interesting. On the data side, they have ~7.5 billion total records and they add 55 million new ones a day. On the web side, they have ~1 million daily unique users and 100k concurrent users at peak ("concurrent" means "in one minute", it seems). I'm no expert on the web part, but I'm kind of curious why they went with the design they did for the data part. The design and the chosen technologies make me think they treated it more like a normal web app, not like a dashboard. I would expect an OLAP database, not a sharded Postgres, and the data model feels very OLTP to me as well. Or maybe that is because it's mostly time series and not a traditional data model? I'll have to go through the article in more detail.
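For a rough sense of scale, here is a back-of-envelope sketch of what those figures imply. The record and user counts are the ones quoted above; the one-query-per-user-per-minute figure is purely my assumption for illustration, not something the article states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the comment above.
TOTAL_RECORDS = 7_500_000_000
NEW_RECORDS_PER_DAY = 55_000_000
PEAK_USERS_PER_MINUTE = 100_000


def ingest_rate_per_second(records_per_day: int) -> float:
    """Average insert rate if ingestion were spread evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


def days_of_history(total_records: int, records_per_day: int) -> float:
    """How many days of data the total represents at the current daily rate."""
    return total_records / records_per_day


def peak_queries_per_second(users_per_minute: int, queries_per_user: float = 1.0) -> float:
    """Peak query rate. queries_per_user is a hypothetical assumption."""
    return users_per_minute * queries_per_user / 60


if __name__ == "__main__":
    print(f"{ingest_rate_per_second(NEW_RECORDS_PER_DAY):.0f} inserts/s on average")
    print(f"{days_of_history(TOTAL_RECORDS, NEW_RECORDS_PER_DAY):.0f} days of data at the current rate")
    print(f"{peak_queries_per_second(PEAK_USERS_PER_MINUTE):.0f} queries/s at peak (assuming 1 query per user per minute)")
```

So the write side averages only a few hundred inserts per second, while the read side could plausibly be in the thousands of queries per second at peak, depending on how many queries each page load really issues.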
OLTP stores are relatively bad at aggregating across a lot of data.
Analytics dashboards with many users, a lot of ever-changing data, and many different views exist in a gray area between OLAP and OLTP, often referred to as real-time analytics or operational analytics. The queries are usually somewhat lighter, less ad hoc, and more index-friendly than in OLAP, but there can be hundreds or thousands of them per second with different filters and aggregations.
There are some specialized real-time analytics databases like Druid. Citus (used in the article) allows you to run such workloads at scale on PostgreSQL.