Hacker News
by wackget 1589 days ago
As a web developer who has never used anything except locally-hosted databases, can someone explain what kind of system actually produces billions or trillions of files which each need to be individually stored in a low-latency environment?

And couldn't that data be stored in an actual database?

4 comments

Things like mobile/website analytics events: User A clicked this menu item, User B viewed this image, etc. All of it gets streamed into S3 as chunks of smallish files.

It's cheaper to store them in S3 than in a DB, and you can use tools like Athena or Redshift Spectrum to query them.
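
To make the "chunks of smallish files" part concrete, here's a rough sketch of how events might be batched into objects before upload. The key layout (`events/dt=.../part-N.json.gz`), the chunk size, and the event fields are all made up for illustration, not anything S3 or Athena requires; the actual upload (e.g. boto3's `put_object`) is left out so this runs on its own.

```python
import gzip
import json
from datetime import datetime, timezone


def chunk_events(events, max_per_chunk=1000):
    """Yield (key, body) pairs: one gzipped newline-delimited JSON object per chunk."""
    for start in range(0, len(events), max_per_chunk):
        chunk = events[start:start + max_per_chunk]
        # Hypothetical key layout: partition by the UTC date of the chunk's first event,
        # so a query engine could skip whole days it doesn't need.
        ts = datetime.fromtimestamp(chunk[0]["ts"], tz=timezone.utc)
        key = f"events/dt={ts:%Y-%m-%d}/part-{start // max_per_chunk:05d}.json.gz"
        lines = "\n".join(json.dumps(e) for e in chunk)
        yield key, gzip.compress(lines.encode("utf-8"))


# Fake events, purely for demonstration.
events = [
    {"ts": 1600000000 + i, "user": f"user-{i % 3}", "action": "click_menu_item"}
    for i in range(2500)
]
chunks = list(chunk_events(events))
# Each (key, body) pair would become one object in the bucket.
```

Nothing here is indexed or parsed at write time, which is a big part of why it's cheap: the query tool does that work only when someone actually asks a question.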

Wow. What makes it cheaper than using a DB? Is it just because the DB will create some additional metadata about each stored row or something?
S3 is often essentially a database in these scenarios. You store files in a columnar data format in S3, and various analytical systems can query them, with S3 as a massive backing store.

    And couldn't that data be stored in an actual database? 
This is the "it's turtles all the way down" concept. A database is just going to store data in the file system, plus some extra overhead. Putting data in a database saves you nothing unless you actually need the extra functionality a database provides.

That overhead doesn't mean much if you have 10 users and 1 GB of data. But it adds up in very large systems.

An image service.
Yeah that use-case I get. Binary files which would be difficult/impractical to index in a database.

However it feels like something at that scale will only ever realistically be dealt with by enterprise-level software, and I'd hazard a guess that most developers - even those reading HN - are not working on enterprise-level systems.

So I'm wondering what "regular devs" are using cloud buckets for at such a scale over regular DBs.

My company gets sensor data from millions of devices and records it. It happens all day, all around the world. It adds up. If you don't delete that data, it becomes petabytes. Thank god GDPR et al. exist so we have a good excuse to "need to delete this data, boss".
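
For a feel of how quickly "it adds up", here's a back-of-envelope calculation. Every number in it (5 million devices, one 200-byte reading per minute) is a hypothetical picked for illustration, not a real figure from any company.

```python
def bytes_per_year(devices, readings_per_day, bytes_per_reading):
    """Raw payload bytes accumulated in a 365-day year, ignoring compression and metadata."""
    return devices * readings_per_day * bytes_per_reading * 365


# Hypothetical inputs: 5 million devices, one reading per minute, 200 bytes each.
total = bytes_per_year(devices=5_000_000, readings_per_day=24 * 60, bytes_per_reading=200)
petabytes = total / 1e15
print(f"{petabytes:.2f} PB per year")
```

Under those assumptions that's roughly half a petabyte a year, so a couple of years without deleting anything gets you into petabyte territory.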