| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by IanCal 2914 days ago

I mean all of the following in the best possible way:

Perhaps also worth looking at the screenshot in the blogpost.

You have in there:

---

Datastore

Storage

Filestore

---

So, datastore is not storage, nor is filestore. What is it storage storing if not data or files? Why are files not data? I have no idea what should go where.

> We know folks need to create, read and write large files with low latency. We also know that film studios and production shops are always looking to render movies and create CGI images faster and more efficiently. So alongside our LA region launch,

So I couldn't create, read or write before without low latency? I thought this was already a feature of your other products

> we’re pleased to enable these creative projects by bringing file storage capabilities to GCP for the first time with Cloud Filestore.

For the first time? I couldn't store files before?

I'm not trying to be an arse, but I really don't get from this what the key difference is from everything else you offer.

1 comments

derefr 2913 days ago

> What is it storage storing if not data or files?

Objects. Cloud Storage is the S3 competitor.

> Why are files not data?

“Data” as in rows in a database. Like Dynamo.

Everything on a computer is data. The thing you’ve got to understand is that the terms we use, “objects”, “files”, “data” — these don’t refer to types of data, but rather to access paradigms for data. The semantics of their storage, indexing, mutability, etc.

An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.

“Data” is a structured tuple that a database knows how to index into, and sort by the columns of. You insert rows, update columns of rows by a key, or delete rows by a key.

“Files” are seekable streams where you can index anywhere into a file by position and then read(2) or write(2) data at that position, and where other clients can see those updates as soon as you sync(2), without needing to close(2) the file first.

All could be used to implement the other (S3 is implemented in terms of Dynamo rows holding chunks of object data, for example.) But each access semantics has use-cases for which it is an impedance match or mismatch.

IanCal 2913 days ago

Thanks for the explanation, the file/object/data difference makes sense.

> An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.

And yet they refer to the objects inside as "Files" and support seeking

https://cloud.google.com/appengine/docs/standard/python/goog...

https://stackoverflow.com/questions/14248333/google-cloud-st...

I know this is just bikeshedding about names and terms but it feels confused.

I think some of the confusion in the list is because of the mix of generic and product naming.

Data can be stored in datastore. But also in "spanner" or "bigtable", which are not parts of "datastore", or in "SQL" which is a language. Object can be stored in the object store called "storage" which is also within an entire category itself called "storage". So there's "Storage" which is a group of all these kinds of stores, and "Storage" which is a very specific type of store.