| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by zabzonk 121 days ago
	Not clear if the author realises that all commercial SQL database engines support querying of the database's metadata using SQL. Or maybe I have misunderstood - I only skimmed the article.

3 comments

da_chicken 121 days ago

Yeah, this seemed like a very long way to say, "Our RDBMS has system catalogs," as if it's 1987.

But then, they're also doing JOINs with the USING clause, which seems like one of those things that everybody tries... until they hit one of the several reasons not to use them, and then they go back to the ON clause which is explicit and concrete and works great in all cases.

Personally, I'd like to hear more about the claims made about Snowflake IDs.

link

zabzonk 121 days ago

> doing JOINs with the USING clause

I'm ashamed to say that despite using SQL from the late 1980s, and as someone that likes reading manuals and text books, I'd never come across USING. Probably a bit late for me now to use it (or not) :-(

link

tkejser 121 days ago

I didn't really write USING in anger until around 10 years ago, and I have been around a long time too.

Not all databases support it. But once you start using it (pun) - a lot of naming conventions snap into place.

It has some funky semantics you should be aware of. Consider this:

  CREATE TABLE foo (x INT);

  CREATE TABLE bar (x INT);

  SELECT \* FROM foo JOIN bar USING (x);

There is only one `x` in the above `SELECT *` - the automatically disambiguated one. Which is typically want you want.

link

n4r9 121 days ago

I've used SQL for around a decade and also never came across it. I'm maintaining SQL code with hundreds if not thousands of basic primary key joins and this could make those queries way more concise. Now I want to know the reasons for not using USING!

link

tkejser 121 days ago

There are reasons for not USING.

First, you need to be aware of the implicit disambiguration. When you join with USING, you are introducing a hidden column that represents both sides. This is typically what you want - but it can bite you.

Consider this PostgreSQL example:

  CREATE TABLE foo (x INT);
  INSERT INTO foo VALUeS (1);

  CREATE TABLE bar (x FLOAT);
  INSERT INTO bar VALUES (1);

  SELECT pg_typeof(x) FROM foo JOIN bar USING (x);

The type of x is is double, - because x was implicitly upcast as we can see with EXPLAIN:

  Merge Join  (cost=338.29..931.54 rows=28815 width=4)
    Merge Cond: (bar.x = ((foo.x)::double precision))

Arguably, you should never be joining on keys of different types. It just bad design. But you don't always get that choice if someone else made the data model for you.

It also means that this actually works:

  CREATE TABLE foo (x INT);
  INSERT INTO foo VALUeS (1);

  CREATE TABLE bar (x INT);
  INSERT INTO bar VALUES (1);

  CREATE TABLE baz (x INT);
  INSERT INTO baz VALUES (1);

  SELECT \*
  FROM foo
  JOIN bar USING (x)
  JOIN baz USING (x);

Which might not be what you expected :-)

If you are both the data modeller and the query writer - I have not been able to come up with a reason for not USING.

link

n4r9 121 days ago

Thanks for the reply. The use case I have in mind is joining onto an INT primary key using a foreign key column of another table. This alone would remove a massive amount of boilerplate code.

link

tkejser 121 days ago

@da_chicken: You can read more about Snowflake ID in the Wiki page linked in the article.

The short story:

They are bit like UUID in that you can generate them across a system in a distributed way without coordination. Unlike UUID they are only 64-bit.

The first bits of the snowflake ID are structured in such a way that the values end up roughly sequentially ordered on disk. That makes them great for large tables where you need to locate specific values (such a those that store query information).

link

whynotmaybe 121 days ago

I don't think it's as easy to do the example in the article just by using information_schema.

> Which tables have a column with the name country where that column has more than two different values

But on their product page, the definition of floesql left me puzzled

> It uses intelligent caching and LLVM-based vectorized execution to deliver the query execution speed your business users expect.

> With its powerful query planner, FloeSQL executes queries with lots of joins and complicated SQL syntax without breaking your budget.

link

tkejser 121 days ago

INFORMATION_SCHEMA is a good start, but it does not get you to full metadata flexibility. The columns you need just aren't there. It is good to have a standard for the metadata - but the standard isn't ambitious enough (a point I also make in the blog and as you observe, the sample query isn't possible on Information Schema alone)

The Floe engine is a full database on top of Iceberg and Delta storage. The system views are just the tip of the iceberg. We will be blogging more about what we are building.

link

whynotmaybe 120 days ago

Good, execution planning for majors DBMS didn't receive any ground breaking evolution because it can be considered a "solved" problem but I'm always curious about new ways to address it.

link

deepsun 121 days ago

Differently though, AFAIR PostgreSQL does most of the schema changes transactionally, but not MySQL.

link

zabzonk 121 days ago

I'd be too terrified to change the schema directly via SQL on the metadata tables even if the engine allowed it, transactional or not.

link