| HN Mirror



	by swasheck 596 days ago
	we use partitioned parquet files in s3. we use a csv in the bucket root to track the files. i’m sure there’s a better way but for now the 2tb of data are stored cheaply and we get fast reads by only reading the partitions we need to read.
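	A minimal sketch of the manifest idea described above, for readers who want something concrete. Everything here is an assumption for illustration: the manifest columns (`key`, `year`, `month`), the object keys, and the helper name `select_keys` are hypothetical and not taken from the poster's setup. It only shows how a CSV index can narrow a read down to the partitions of interest; the selected keys would then be handed to whatever Parquet reader is in use, which is not shown.

```python
import csv
import io

# Hypothetical manifest: one row per Parquet file plus its partition values.
# In the setup described above this would be the CSV stored in the bucket root.
MANIFEST_CSV = """\
key,year,month
data/year=2023/month=11/part-0000.parquet,2023,11
data/year=2023/month=12/part-0000.parquet,2023,12
data/year=2024/month=01/part-0000.parquet,2024,1
data/year=2024/month=01/part-0001.parquet,2024,1
"""


def select_keys(manifest_text, **wanted):
    """Return the object keys whose partition columns match every wanted value."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    return [
        row["key"]
        for row in reader
        if all(int(row[col]) == val for col, val in wanted.items())
    ]


# Only these files would need to be fetched and read.
keys = select_keys(MANIFEST_CSV, year=2024, month=1)
for key in keys:
    print(key)
```

	The point of the design is that the filter runs against a small index file, so files outside the requested partitions are never listed or opened.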


Incipient 595 days ago

I'm curious how much simpler it would be to build, manage, and run, and how the cost would compare, if you simply ran a database on a large Vultr/DO instance and paid for 2 TB of storage?

I feel like you'd get away with the whole thing for around $500/mo depending on how much compute was needed?


ramraj07 595 days ago

You just need to try it once to see the issue. Merely loading this amount of data onto a Postgres db will be hell.


swasheck 595 days ago

well that's not the infrastructure we have. we are primarily an aws shop so we use the resources available to us in the context of our infrastructure decisions. it would be a hard sell to buy something outside of that ecosystem.


Incipient 593 days ago

I understand that's the infrastructure you have. But that's more describing vendor lock-in haha.

Most of my work is with clients that don't have any set infrastructure yet, so was curious if anyone had any anecdotes.
