HN Mirror | Hacker News


	by mrbonner 817 days ago
	My cluster clocks in at 230 TB in Aurora. It is approaching the hard limit of 250 TB that AWS supports.

mrbonner 817 days ago

No, we do not store logs or IoT data. The data are all business-related metrics. I didn't choose Aurora; I inherited it from another team. We have four read replicas to scale out read access. Our internal team owns ingestion (inserts) into the writer node. All other, external access is read-only.

I think the reason Aurora was picked is to support arbitrary aggregation, filtering, and low-latency reads (p90 < 3000 ms). We could not pick a distributed DB based on Presto, Athena, or Redshift, mainly because of the latency requirements.
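To make the "p90 < 3000 ms" requirement concrete: p90 is the latency that 90% of queries come in at or under. The sketch below checks a batch of query latencies against that budget using the nearest-rank percentile method. The latency samples are hypothetical, and nearest-rank is just one of several percentile definitions; a monitoring system may interpolate differently.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile of samples_ms using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_p90_budget(samples_ms, budget_ms=3000):
    """True if p90 latency is strictly below the budget (the thread's p90 < 3000 ms)."""
    return percentile_nearest_rank(samples_ms, 90) < budget_ms


if __name__ == "__main__":
    # Hypothetical query latencies in milliseconds, not measured data.
    latencies = [120, 340, 560, 800, 950, 1200, 1500, 2100, 2800, 4200]
    print(percentile_nearest_rank(latencies, 90))  # 2800
    print(meets_p90_budget(latencies))  # True: one slow 4200 ms query doesn't break p90
```

A single slow query does not violate a p90 target, which is why percentile budgets are common for interactive read paths like the one described here.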

The other contender I'm considering is Elasticsearch, but I think using it in this case would be like fitting a square peg into a round hole.

LunaSea 817 days ago

Out of curiosity, what type of application could generate this quantity of data?

Is it IoT / remote sensing related?

manquer 817 days ago

If you are thinking of a well-architected application storing normalized (BCNF, or at least 3NF) structured data, 250 TB would be hard to reach unless the app has 100 million+ users or grew super fast.

Time-series data (like the IoT you mentioned), binary blobs, logs, or any other data that shouldn't really be in SQL storage can grow to any size, so that case wouldn't be all that interesting.

I can't speak for OP, but from managing data for apps with a few million users, what I have observed is that most SQL stores reach the single-digit TB range and then get broken down into smaller DBs. Either the teams have grown and want their own microservice or DB, or infra wants DBs that are easier to handle in a variety of ways, including backup/recovery: for larger DBs it is extremely difficult to get reasonable RTO/RPO numbers.

SJC_Hacker 817 days ago

If you want to store video data as BLOBs in a DB, you can get there easily.

Maybe not the best idea; I guess a file system would be better for that, with the DB used just for metadata.

But OTOH all the data is in one place, so you just migrate the DB. Less to worry about.

I just looked it up: all of English Wikipedia (including images) is barely even 100 GB... crazy world we live in.

manquer 817 days ago

You wouldn't say "less to worry about" when you have to take a full backup or demonstrate recovery from backup within a set recovery time.
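A back-of-envelope sketch of why recovery time gets hard at this scale: if a restore were a plain linear copy, its duration would be size divided by sustained throughput. The 1 GB/s throughput and 4-hour RTO below are assumptions for illustration only; managed restores (for example, snapshot-based ones) do not necessarily behave like a linear copy.

```python
def restore_hours(dataset_tb, throughput_gb_per_s):
    """Hours to copy dataset_tb terabytes at a sustained throughput.

    Naive linear-copy model using decimal units (1 TB = 1000 GB).
    """
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = dataset_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


def fits_rto(dataset_tb, throughput_gb_per_s, rto_hours):
    """True if the naive restore estimate is within the recovery time objective."""
    return restore_hours(dataset_tb, throughput_gb_per_s) <= rto_hours


if __name__ == "__main__":
    # 230 TB is OP's cluster size; 1 GB/s and a 4-hour RTO are assumptions.
    print(round(restore_hours(230, 1.0), 1))  # 63.9
    print(fits_rto(230, 1.0, 4))  # False
```

Even at a sustained 1 GB/s, copying 230 TB takes well over two days, while a 1 TB database fits comfortably in the same assumed window.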

The idea that one data store is easier is a myth; it just offloads complexity from developers to infra teams, who are now provisioning premium NVMe storage instead of cold object stores for binary data.

Binary data is not indexed or aggregated in a SQL store, so there is no value in keeping it in one place except developer experience, at the cost of the infra team's experience.

wavemode 817 days ago

It's super easy to generate arbitrary amounts of data if you start using Postgres as a log of any sort.

I worked for a company that had only a few thousand active customers yet dozens of terabytes of data, for exactly this reason.
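Some rough arithmetic shows how an append-only log table gets there. All inputs below are hypothetical: 3,000 customers, 10,000 logged events per customer per day, about 1 KB per row including index and storage overhead, kept for three years with no retention policy.

```python
def log_table_growth_tb(customers, events_per_customer_per_day, bytes_per_row, days):
    """Approximate growth in TB (decimal) of an append-only log table.

    Assumes a constant event rate and that no rows are ever deleted.
    """
    total_rows = customers * events_per_customer_per_day * days
    return total_rows * bytes_per_row / 1e12


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the thread.
    tb = log_table_growth_tb(customers=3000, events_per_customer_per_day=10_000,
                             bytes_per_row=1000, days=3 * 365)
    print(f"{tb:.2f} TB")  # 32.85 TB
```

The size is driven by event volume times retention rather than by customer count, which is why a small customer base can still produce tens of terabytes.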