by zphds 3603 days ago
Going through the SDK docs, why was a scheme like 'http://localhost:8000::people' chosen instead of the plain old 'http://localhost:8000/people'? Are there any benefits? If yes, curious to know what they are.

See https://github.com/attic-labs/noms/blob/master/doc/spelling....

In this case, we need to be able to address either a database or a dataset. The presence of a :: makes it unambiguous.

But isn't `<database>/<dataset>` more or less similar to `<database>::<dataset>`? The only difference is the choice of a delimiter to disambiguate between a database and a dataset. For me, the first scheme is much more familiar.
Say we did just do <database>/<dataset>. What does the path "http://demo.noms.io/cli-tour/sf-fire-inspections/raw" refer to? Is the database "http://demo.noms.io" and the dataset "cli-tour/sf-fire-inspections/raw"? Is the database "http://demo.noms.io/cli-tour/sf-fire-inspections" and the dataset "raw"?

In our sample data (see https://github.com/attic-labs/noms/blob/master/doc/cli-tour.... for example) we actually have this exact path, and the database is "http://demo.noms.io/cli-tour" and the dataset is "sf-fire-inspections/raw". We need the "::".

Allowing "/" in a dataset name is very convenient (it's common in git branches). Allowing "/" in database names is essential for URLs.
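To make the ambiguity concrete, here is a small sketch that lists every way a `<database>/<dataset>` string could be cut at a `/` after the host. The helper `candidate_splits` is hypothetical, written only for illustration. With `/` as the only delimiter, nothing in the string says which cut is intended.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def candidate_splits(spec: str) -> List[Tuple[str, str]]:
    """Return every (database, dataset) pair that a '/'-only syntax would allow.

    Hypothetical helper: each '/' in the path is treated as a possible
    boundary between the database URL and a non-empty dataset name.
    """
    parts = urlsplit(spec)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    splits = []
    # Cut after 0, 1, ..., n-1 path segments so the dataset is never empty.
    for i in range(len(segments)):
        database = base + "".join("/" + s for s in segments[:i])
        dataset = "/".join(segments[i:])
        splits.append((database, dataset))
    return splits


for database, dataset in candidate_splits(
        "http://demo.noms.io/cli-tour/sf-fire-inspections/raw"):
    print(database, "->", dataset)
```

All three cuts are plausible; per the answer above, the intended one is the middle one, which `http://demo.noms.io/cli-tour::sf-fire-inspections/raw` states directly.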

You're just trading one arbitrary thing for another, IMO, but what's worse is you are now abusing the URL specification for the HTTP(S) protocol, so nobody can use existing HTTP URL libraries.

You could easily say that everything before either `?` or `;` always refers to a database, and use a query parameter or a semicolon to delineate a dataset. Or you could use resource paths:

Address a dataset:

    http://demo.noms.io/?dataset=cli-tour/sf-fire-inspections/raw
    http://demo.noms.io/;cli-tour/sf-fire-inspections/raw
    http://demo.noms.io/dataset/cli-tour/sf-fire-inspections/raw
Address a database (catalog):

    http://demo.noms.io/database/cli-tour/sf-fire-inspections
    http://demo.noms.io/catalog/cli-tour/sf-fire-inspections
Address a dataset in that database:

    http://demo.noms.io/database/cli-tour/sf-fire-inspections;raw
    http://demo.noms.io/database/cli-tour/sf-fire-inspections?dataset=raw
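A rough sketch of why the query-string variants are attractive: standard URL libraries already split them. This uses Python's `urllib.parse`; the `dataset` parameter name comes from the examples above, and `split_query_spec` is a hypothetical helper, not anything noms provides.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_query_spec(url: str) -> Tuple[str, Optional[str]]:
    """Split 'http://host/db-path?dataset=name' into (database URL, dataset).

    Hypothetical helper: assumes the dataset travels in a 'dataset' query
    parameter and that the database URL is everything except the query.
    """
    parts = urlsplit(url)
    values = parse_qs(parts.query).get("dataset")
    dataset = values[0] if values else None
    # Rebuild the database URL with the query string and fragment removed.
    database = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return database, dataset


print(split_query_spec(
    "http://demo.noms.io/database/cli-tour/sf-fire-inspections?dataset=raw"))
```

One caveat: `parse_qs` decodes `+` as a space, so dataset names containing `+`, `&` or `#` would need percent-encoding.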
Why not have the dataset name as a fragment in the URL? For instance:

    http://demo.noms.io/cli-tour#sf-fire-inspections/raw
Glancing over RFC3986 [1], fragment identifiers seem to be pretty much made for what you're trying to communicate with :: - separating a subresource (the dataset) from a primary resource (the database). Unless I'm misunderstanding something?

[1]: https://tools.ietf.org/html/rfc3986#section-3.5
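For the fragment spelling, Python's standard library already separates the two parts. A minimal sketch using `urllib.parse.urldefrag` on the hypothetical spelling suggested above:

```python
from urllib.parse import urldefrag

# Hypothetical spelling from the comment above: database URL, '#', dataset.
spec = "http://demo.noms.io/cli-tour#sf-fire-inspections/raw"

# urldefrag returns (url without fragment, fragment).
database, dataset = urldefrag(spec)
print(database)  # http://demo.noms.io/cli-tour
print(dataset)   # sf-fire-inspections/raw
```

A design trade-off worth knowing: per RFC 3986 the fragment is resolved by the client, and HTTP clients do not send it to the server, so a server would only ever see the database part of such a URL.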

There are issues with using `:` in a URL if you plan on using the URL in a way that's compatible with the extant software out there. Two cases I remember:

- The Rails community tried to use `;`, which broke Mongrel 1, whose parser was generated from the RFC. There was a huge flame war about that back in the day. The Rails core team at the time thought that Mongrel should make an exception for a reserved character. (After all was said and done, it got changed back to `/` for that particular use case.)

- When working on IPv6 support about 3 years ago, one of the things I added to an open source Ruby project was support for IPv6 literals in URLs. This was a case of using `:`. Even though this was defined in the RFC specifying the literal, I found out at that time that the Ruby standard library was written in a way that assumed you would never have `:` in the URL other than to delimit the port. I ended up having to do some workarounds for that.

That's with Ruby. I wouldn't be surprised if many other extant URL-parsing libraries broke too, at least without escaping those characters.

See: https://perishablepress.com/stop-using-unsafe-characters-in-...
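Python's standard `urllib.parse` shows a similar assumption: a bracketed IPv6 literal is handled, but for an unbracketed host everything after the first `:` in the authority is taken as the port. Splitting itself does not complain; asking for the port does. The helper `port_of` is hypothetical, for illustration only.

```python
from typing import Union
from urllib.parse import urlsplit


def port_of(url: str) -> Union[int, str, None]:
    """Return the port urllib parses from `url`, or a short error description."""
    try:
        return urlsplit(url).port
    except ValueError as exc:
        # For an unbracketed host, urllib treats everything after the first
        # ':' as the port; '8000::people' cannot be cast to an integer.
        return f"ValueError: {exc}"


print(port_of("http://[::1]:8000/people"))       # 8000 (bracketed IPv6 literal)
print(port_of("http://localhost:8000/people"))   # 8000
print(port_of("http://localhost:8000::people"))  # ValueError (wording varies by Python version)
```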

You don't NEED "::". You NEED some sort of delimiter that can clearly distinguish between database and dataset; you happened to pick '::' to satisfy that. There might be a different delimiter that works better.

The other option is to not pretend it is a URL and call it something else.

Post-script: I think this project is a great idea. I'm looking forward to seeing how it turns out.

And just to be clear on this: the `::` might not be a big deal if it comes after the `/` that ends the host part.

So:

    http://localhost:8000::dataset

may break code that tries to discern the host name. However,

    http://localhost:8000/::dataset

might not. Further, you could also reserve `_` in your scheme to refer to the default database:

    http://localhost:8000/_::dataset

But as I mentioned in my previous reply, there may be unintended consequences. If this is something you want to do (while keeping HTTP/HTTPS URL compatibility), try it on different languages and platforms to see whether your scheme breaks things. (And definitely check whether Windows libraries make this assumption; Windows file paths use `:` as a reserved character.)
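As one such check, here is a sketch of how the `host:port/<db>::<dataset>` form suggested above behaves in Python's `urllib.parse`. The helper `split_path_spec` is hypothetical; it assumes `::` appears only in the path and simply returns `_` as-is, leaving it to the caller to map that to a default database.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_path_spec(url: str) -> Tuple[Optional[str], Optional[int], str, str]:
    """Split 'http://host:port/<db>::<dataset>' into (host, port, db, dataset).

    Hypothetical helper for the scheme suggested above.
    """
    parts = urlsplit(url)
    # The authority ("localhost:8000") is untouched, so .hostname and .port work.
    db_path, sep, dataset = parts.path.lstrip("/").rpartition("::")
    if not sep:
        raise ValueError(f"no '::' delimiter in the path of {url!r}")
    return parts.hostname, parts.port, db_path, dataset


print(split_path_spec("http://localhost:8000/_::dataset"))
# ('localhost', 8000, '_', 'dataset')
```

Moving `::` into the path keeps the authority standard, so generic code that only needs the host and port keeps working.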

Why break something that's already been solved a gazillion times? Go with open standards; don't create your own.
Java breaks:

    groovy -e "new URL('http://localhost:8000::people')"
    Caught: java.net.MalformedURLException

Python breaks:

    >>> urlparse('http://localhost:8000::people')
    ParseResult(scheme='http', netloc='localhost:8000::people', path='', params='', query='', fragment='')

`::` breaks the URL for clients and is not supported by the URL specs. Use the fragment or the query.
Thanks for the help, everyone, with this most important aspect of the system ;).

To clarify, we don't think of these specs as URLs. The part before the final double colon is a URL. To parse one, you find the final double colon and take everything to its left as a URL.
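Read literally, that rule is a split on the last `::`. A minimal Python sketch under that reading; `parse_spec` is a hypothetical name and this is not the noms implementation, which may handle more cases.

```python
from typing import Tuple


def parse_spec(spec: str) -> Tuple[str, str]:
    """Split '<database URL>::<dataset>' at the final '::'.

    Sketch of the rule described above; not the noms implementation.
    """
    database, sep, dataset = spec.rpartition("::")
    if not sep:
        raise ValueError(f"expected '<database>::<dataset>', got {spec!r}")
    return database, dataset


print(parse_spec("http://demo.noms.io/cli-tour::sf-fire-inspections/raw"))
# ('http://demo.noms.io/cli-tour', 'sf-fire-inspections/raw')
print(parse_spec("http://localhost:8000::people"))
# ('http://localhost:8000', 'people')
```

Because only the final `::` counts, `http://[::1]:8000::people` still splits correctly. A bare database URL with an IPv6 host, such as `http://[::1]:8000`, would be mis-split by this sketch, so it should only be given specs that actually name a dataset.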

There's some info on the syntax here:

https://github.com/attic-labs/noms/blob/master/doc/spelling....

Though it's not presented as a formal grammar in that doc, our most important criteria for the syntax were:

  - unambiguousness
  - interacts well with the shell, since we frequently use these as part of command lines
> To clarify, we don't think of these specs as URLs.

But everyone else will because you are including the protocol, and at the end of the day, they are a uniform way of identifying a resource, so they are functionally URIs.

Otherwise, you should probably either conform to the HTTP(S) protocol spec or make up your own, e.g. noms+http://dbinstance.noms.foo::database/dataset

SQLAlchemy and most DB URIs are good examples of how to do this. For example, you can connect to a MySQL database instance and give it a default namespace/schema/database.
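For comparison, SQLAlchemy's documented URL form is `dialect+driver://username:password@host:port/database`, which keeps the database name in an ordinary path segment so generic parsers cope. A quick check with Python's standard `urllib.parse` (not SQLAlchemy's own parser); the credentials, host and database name below are made up.

```python
from urllib.parse import urlsplit

# Illustrative SQLAlchemy-style URL; every value here is made up.
url = urlsplit("mysql+pymysql://scott:tiger@localhost:3306/inventory")

print(url.scheme)            # mysql+pymysql
print(url.username)          # scott
print(url.hostname)          # localhost
print(url.port)              # 3306
print(url.path.lstrip("/"))  # inventory
```

The `+driver` suffix in the scheme follows the same pattern as the `noms+http://` suggestion above: the scheme itself signals that this is not a plain HTTP URL.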

Part of the issue here is the ambiguity between a database, a database instance/server/host, a dataset/table, a catalog/namespace/schema, and what all those words and concepts mean. There's little consensus across fields, because even if computer scientists say "Okay, this is what a dataset actually is", somebody, whether it's a biologist or a physicist, will throw up their hands in protest.

> To clarify, we don't think of these specs as URLs

That makes it a lot clearer :). Looking forward to taking noms for a spin soon.

Perhaps they were writing so much in Go that they set the '/' key to shortcut to '::'.

...but yeah, I am also curious.

What does :: mean in Go?
Maybe parent meant :=

Not sure what :: would be.

Nothing, I believe... in Go, at least.