Hacker News
by derefr 2914 days ago
> What is it storage storing if not data or files?

Objects. Cloud Storage is the S3 competitor.

> Why are files not data?

“Data” as in rows in a database. Like Dynamo.

Everything on a computer is data. The thing you’ve got to understand is that the terms we use, “objects”, “files”, “data” — these don’t refer to types of data, but rather to access paradigms for data. The semantics of their storage, indexing, mutability, etc.

An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.
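
To make that concrete, here is a rough in-memory sketch of object semantics in Python. It is not how S3 or Cloud Storage actually work; the class name `ObjectStore`, the `max_versions` cutoff, and the negative-index version lookup are all made up for illustration.

```python
class ObjectStore:
    """Toy object store: whole-blob get/put by key, with a bounded version history."""

    def __init__(self, max_versions=3):
        # Hypothetical GC cutoff: how many versions to retain per key (must be >= 1).
        self.max_versions = max_versions
        self._versions = {}  # key -> list of blobs, oldest first

    def put(self, key, blob):
        """Overwrite the whole object; earlier blobs stay around as old versions."""
        history = self._versions.setdefault(key, [])
        history.append(bytes(blob))
        # "Automatic GC": drop everything older than the newest max_versions blobs.
        del history[:-self.max_versions]

    def get(self, key, version=-1):
        """Return an entire blob: -1 is the latest, -2 the one before it, and so on."""
        return self._versions[key][version]


store = ObjectStore(max_versions=2)
store.put("report.txt", b"draft 1")
store.put("report.txt", b"draft 2")
store.put("report.txt", b"final")

latest = store.get("report.txt")
previous = store.get("report.txt", version=-2)
```

Note there is no way to change three bytes in the middle of `report.txt`: you replace the whole blob or nothing.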

“Data” is a structured tuple that a database knows how to index and how to sort by column. You insert rows, update columns of rows by a key, or delete rows by a key.
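
As a small illustration of row semantics, here is a sketch using Python's built-in `sqlite3` module as a stand-in for a real datastore. The `users` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured tuples: the database knows the columns and can index them.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX users_by_age ON users (age)")

# Insert rows.
conn.executemany(
    "INSERT INTO users (id, name, age) VALUES (?, ?, ?)",
    [(1, "ada", 36), (2, "bob", 25), (3, "cy", 41)],
)

# Update one column of one row, addressed by key.
conn.execute("UPDATE users SET age = ? WHERE id = ?", (26, 2))

# Delete a row by key.
conn.execute("DELETE FROM users WHERE id = ?", (3,))

# Sort by a column.
names_by_age = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY age")]
```

The unit of access here is a column of a row, not a whole blob and not a byte offset.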

“Files” are seekable streams where you can index anywhere into a file by position and then read(2) or write(2) data at that position, and where other clients can see those updates as soon as you sync(2), without needing to close(2) the file first.
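
A minimal sketch of file semantics on a local filesystem, using Python's thin wrappers over the system calls. The file name and contents are made up, and it flushes with `os.fsync` on the one descriptor rather than calling sync(2) for the whole system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"hello world")

    # Two independent descriptors, standing in for two clients of the same file.
    writer = os.open(path, os.O_RDWR)
    reader = os.open(path, os.O_RDONLY)
    try:
        # Seek to byte 6 and overwrite five bytes in place.
        os.lseek(writer, 6, os.SEEK_SET)
        os.write(writer, b"WORLD")
        os.fsync(writer)  # flush this descriptor's data; the writer stays open

        # The reader seeks to the same position and sees the update.
        os.lseek(reader, 6, os.SEEK_SET)
        seen = os.read(reader, 5)

        # The rest of the file is untouched.
        os.lseek(reader, 0, os.SEEK_SET)
        whole = os.read(reader, 64)
    finally:
        os.close(reader)
        os.close(writer)
```

The writer never closes or re-uploads the file; it changes a byte range, and the other descriptor picks that change up.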

Any of these could be used to implement the others (S3 is implemented in terms of Dynamo rows holding chunks of object data, for example). But each set of access semantics has use cases for which it is a good match, and others for which it is an impedance mismatch.

Thanks for the explanation, the file/object/data difference makes sense.

> An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.

And yet they refer to the objects inside as "Files" and support seeking:

https://cloud.google.com/appengine/docs/standard/python/goog...

https://stackoverflow.com/questions/14248333/google-cloud-st...

I know this is just bikeshedding about names and terms but it feels confused.

I think some of the confusion in the list is because of the mix of generic and product naming.

Data can be stored in "datastore". But it can also be stored in "spanner" or "bigtable", which are not parts of "datastore", or in "SQL", which is a language. Objects can be stored in the object store called "storage", which itself sits within an entire category also called "storage". So there's "Storage", which is a group of all these kinds of stores, and "Storage", which is a very specific type of store.