by cle 1590 days ago
This is mostly correct, with the additional feature that S3 can efficiently list objects by "key prefix" which helps preserve the illusion.

Followup question: Is there something special about the PRE notations in the example output below? I can list objects by any textual prefix, but I can't tell if the PRE (what we think of as folders) is more efficient than just the substring prefix.

Full bucket listing, then two text-prefix listings, then an (empty) folder listing:

  sokoloff@ Downloads % aws s3 ls s3://foo-asdf            
                             PRE bar-folder/
                             PRE baz-folder/
  2022-02-17 09:25:38          0 bar-file-1.txt
  2022-02-17 09:25:42          0 bar-file-2.txt
  2022-02-17 09:25:57          0 baz-file-1.txt
  2022-02-17 09:25:49          0 baz-file-2.txt
  sokoloff@ Downloads % aws s3 ls s3://foo-asdf/ba
                             PRE bar-folder/
                             PRE baz-folder/
  2022-02-17 09:25:38          0 bar-file-1.txt
  2022-02-17 09:25:42          0 bar-file-2.txt
  2022-02-17 09:25:57          0 baz-file-1.txt
  2022-02-17 09:25:49          0 baz-file-2.txt
  sokoloff@ Downloads % aws s3 ls s3://foo-asdf/bar
                             PRE bar-folder/
  2022-02-17 09:25:38          0 bar-file-1.txt
  2022-02-17 09:25:42          0 bar-file-2.txt
  sokoloff@ Downloads % aws s3 ls s3://foo-asdf/bar-folder
                             PRE bar-folder/
I don't understand the answer to that question either. Other AWS docs say you can choose whatever you want for a delimiter; there's nothing special about `/`. So how does that apply to what they say about performance and "prefixes"?

Here is some AWS documentation on it:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...

> For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
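The arithmetic in that quote is plain multiplication. A minimal sketch, assuming load is spread evenly across prefixes and treating the quoted figures as per-prefix baselines (the constant and function names are made up for illustration):

```python
# Per-prefix baseline request rates quoted in the AWS documentation above.
WRITES_PER_SECOND_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READS_PER_SECOND_PER_PREFIX = 5_500   # GET/HEAD


def aggregate_rate(prefix_count: int, per_prefix_rate: int) -> int:
    """Baseline aggregate rate, assuming requests are spread evenly over prefixes."""
    if prefix_count < 0:
        raise ValueError("prefix_count must be non-negative")
    return prefix_count * per_prefix_rate


# 10 prefixes used to parallelize reads, as in the quoted example.
print(aggregate_rate(10, READS_PER_SECOND_PER_PREFIX))
```

This says nothing about what counts as a "prefix" for these limits, which is the open question here.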

Related to your question, even if we just stick to `/` because it seems safer, does that mean that "foo/bar/baz/1/" and "foo/bar/baz/2/" count as two prefixes for the purposes of these request rate limits? Or does the "prefix" stop at the first "/", so that objects with these key paths are both in the same prefix, "foo/"?

Note there was (according to the docs) a change a couple of years ago that I think some people haven't caught on to:

> For example, previously Amazon S3 performance guidelines recommended randomizing prefix naming with hashed characters to optimize performance for frequent data retrievals. You no longer have to randomize prefix naming for performance, and can use sequential date-based naming for your prefixes.

Umm... that output seems confusing.

When a delimiter is given, the ListObjects API omits every object whose key contains the delimiter after the requested prefix. Instead, it rolls each such key up to the substring ending at that first delimiter and returns it in the CommonPrefixes element, which `aws s3 ls` prints as PRE lines. (So with a delimiter of '/', it basically hides objects in "subfolders", but lists any subfolders that match your partial text in CommonPrefixes.)
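To make the roll-up rule concrete, here is a small in-memory simulation. It is only a sketch of the documented behaviour, not S3's implementation: `list_objects` is a made-up function, and the two keys under `bar-folder/` and `baz-folder/` are hypothetical, since the output above doesn't show what is inside those prefixes.

```python
from typing import Iterable, List, Tuple


def list_objects(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Return (contents, common_prefixes) the way a delimited listing groups keys."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Look for the delimiter only in the part of the key after the prefix.
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            contents.append(key)  # no delimiter left: a plain object entry
        else:
            rolled_up = prefix + rest[: cut + len(delimiter)]
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)  # shown as a PRE line
    return contents, common_prefixes


KEYS = [
    "bar-file-1.txt", "bar-file-2.txt", "baz-file-1.txt", "baz-file-2.txt",
    "bar-folder/inner.txt",  # hypothetical key, not shown in the thread
    "baz-folder/inner.txt",  # hypothetical key, not shown in the thread
]

print(list_objects(KEYS, "bar"))          # two files plus the bar-folder/ prefix
print(list_objects(KEYS, "bar-folder"))   # no trailing slash: only the prefix itself
print(list_objects(KEYS, "bar-folder/"))  # trailing slash: the object inside it
```

In this model nothing is special about a "folder": a PRE entry is just whatever remains after cutting a matching key at the first delimiter past the text prefix you asked for.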

By default, `aws s3 ls` does not show any objects within a common prefix; it simply shows a PRE line for each one. The CLI does not let you specify a delimiter; it always uses '/'. To actually list all objects you need to use `--recursive`.
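The API itself does take the delimiter as a parameter, so you can experiment outside the CLI. A rough sketch, assuming boto3 and its `list_objects_v2` paginator; `iter_listing` is a made-up helper, and it takes the client as an argument so the snippet itself has no AWS dependency:

```python
from typing import Any, Iterator


def iter_listing(
    client: Any, bucket: str, prefix: str = "", delimiter: str = "/"
) -> Iterator[str]:
    """Yield 'PRE <prefix>' for each common prefix and the key for each object."""
    paginator = client.get_paginator("list_objects_v2")
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    if delimiter:
        # Leaving Delimiter out returns every matching key with no roll-up,
        # which is roughly what --recursive gives you.
        kwargs["Delimiter"] = delimiter
    for page in paginator.paginate(**kwargs):
        for common_prefix in page.get("CommonPrefixes", []):
            yield "PRE " + common_prefix["Prefix"]
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (needs boto3 and credentials; bucket name taken from the example above):
#   import boto3
#   for line in iter_listing(boto3.client("s3"), "foo-asdf", "bar"):
#       print(line)
```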

The output there suggests that the bucket really did have object names that began with `bar-folder/`, and that the last command did not list them because you did not include the trailing slash. Without the trailing slash, it just listed the objects and common prefixes matching the string you specified after the last delimiter in your URL. Since only that one common prefix matched, only it was printed.