| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by oaiey 1368 days ago
	After seeing one diagram: Why the hack are we still talking to databases with SQL strings and not directly specifying the Query-AST? Admins, sure (a fancy UI could help there as well) but why in our code?

13 comments

samwillis 1368 days ago

I believe most sql engines cache the query plans for parameterized queries, which would cover the majority of requests.

Caching the query plan is also going to go further in performance optimisations than just “precompiling” the SQL to a AST.

link

speed_spread 1368 days ago

Because query parsing time is totally insignificant compared to query IO?

I mean, I get it but the chances that it makes a noticeable difference are zero in almost every case. Also you'd have to change a lot of the existing tooling, at which point you might as well send a compiled agent or use stored procs?

link

robertlagrant 1368 days ago

> Because query parsing time is totally insignificant compared to query IO?

The problem there is that the SQL query string is not parsed at compile time of the host program, so things that could be caught at compile-time are not, and things like appending strings to SQL strings in an unsafe way are much too easy to do.

link

HelloNurse 1368 days ago

That's why there are query parameters (see https://www.sqlite.org/lang_expr.html#varparam for the comprehensive SQLite implementation) and automatic escaping. Not to mention tests and code reviews.

link

robertlagrant 1368 days ago

Yes, those are other ways to try and solve this problem.

link

jsmith45 1368 days ago

The irony is that the SQL Standard actually specifies two different generally viewed as obsolete methods for handing this.

One is "SQL/CLI", which envisions that you provide a "module" of queries (parameterized and static), which you then compile with some database compiled tool, resulting in an object file you can link into your codebase that exports function calls for these queries.

The other is "Embedded syntax", which is basically "embed RAW SQL statements into your program, run a database provided pre-processor to convert it to normal ADA, C, COBOL, FORTRAN, MUMPS, Pascal, or PL/I code, which is then compiled normally. In theory, this is supposed to generate a "client module" like SQL/CLI, and insert the needed code to call into that module in place of the SQL.

For both of these the queries are considered to be "prepared" at compilation time, but this mostly amounts to syntax checking. It could theoretically also provide warnings about possible execution errors based on the current state of the database (for example, if you are trying to select some column that does not currently exist in some given table), but it could not treat these as errors since DML statements run by this or other programs may change the table before the query is actually executed.

The sql standard calls every form of passing a string version of a query to a database as "dynamic sql" (Not just say SQL constructed in say a store procedure and called with EXECUTE). This is because the database cannot statically analyze those queries at program compilation time, so you may need to link with additional libraries that include the query parser, etc. And supporting "dynamic sql" at all is an optional feature.

link

okennedy 1368 days ago

We are... Spark's DataFrame is essentially a relational algebra AST-builder. Microsoft's LINQ interprets SQL directly in at compile-time. All of these, however, run queries more or less directly in the system in which they're specified.

It helps to think of SQL strings as an untrusted wire format. Yes, parsing is a pain, but it comes with two main benefits: (i) The wire format is human writable/interpretable, with all the accompanying benefits, and (ii) The wire format is easily extensible in a predictable way.

That latter one is particularly useful in keeping SQL's ecosystem open. Take a front-end library like SQLAlchemy or ScalikeJDBC for example. It's not practical for any one such library to support every extension provided by every database engine. SQL provides a fall-back for when you need a back-end feature that hasn't been implemented in any given front-end.

link

oaiey 1368 days ago

C# LINQ does pass an expression tree into the abstraction before it will then serialize it SQL and then the database deserialize it into an AST. LINQ-to-Objects is in memory and works on the AST directly.

Also both LINQ language syntax and library methods are a builder paradigm for the expression tree. Valid, but still far from ideal representation of an AST.

link

bayindirh 1368 days ago

> Why the hack are we still talking to databases with SQL strings and not directly specifying the Query-AST?

The same reason language servers took off. Instead of one to one mapping, SQL enables one to many mapping with minor tweaks, allowing everyone to do whatever they want over a well known, well defined, mature abstraction.

In the same spirit, I may ask why we're not writing assembly or even machine code, and we have programming languages? Testers, sure, abstraction means clarity up to an extent, but why the developers themselves still use programming languages?

link

keybored 1368 days ago

You’re missing the point if you think that that is “in the same spirit”.

SQL-as-strings and SQL-as-AST are still the same thing. What is being proposed it not to write procedural code for record retrieval instead of declarative SQL.

link

Timpy 1368 days ago

Isn't that what a parameterized query does? It separates the sql logic from the inputs so it can cache the query and then it accepts inputs separately. Safer and more optimal at the same time, the engine doesn't have to re-optimize the same query again for the life of the connection. If my understanding is wrong somebody please correct me, it's kind of hard to get good information on what's going on under the hood with these things.

link

manigandham 1368 days ago

A SQL query is an AST, but represented in a compact portable form. It also supports functions, procedures, and parameterization for flexible and safe query construction.

Your code would get incredibly large and complicated if you had to specify any serious SQL query as a raw AST.

link

euroderf 1368 days ago

Check out prql[1], it might be a conceptual model you'd like.

[1] https://prql-lang.org/

link

oaiey 1368 days ago

That is just modern SQL with horrible LINQ memories ;)

link

leprechaun1066 1368 days ago

Most queries to kdb+ do exactly what you are asking for.

link

AndrewDucker 1368 days ago

Can you give an example of what you mean, and what we'd gain from it?

link

dkjaudyeqooe 1368 days ago

When you're accessing a SQLite database in code you have to generate a query string. Parameters ameliorate that somewhat, but in many cases you still have to regenerate a new string for each new query. It's inefficient to translate your query into a string only to have it parsed back into something structured by SQLite.

link

AndrewDucker 1368 days ago

Are you suggesting an alternate syntax for SQL-like queries, which is more compact?

Or a specific one for SQLite?

I'd be very surprised if generating the SQL query string and parsing it again was more than a trivial percentage of the query execution time, but happy to be proven wrong.

link

dkjaudyeqooe 1368 days ago

Probably, but it's still inefficient in other ways. There's memory issues and the code overhead of dealing with strings. It'd be much nicer to have a binary interface, but admittedly it would be much more complex.

The real solution is to include an alternative to SQL that looks and works like Datalog including things like variables. That would make SQLite 100x more productive for programmers. But that will never happen.

link

AndrewDucker 1368 days ago

Wouldn't that then just be a prepared statement with variables?

link

rprospero 1368 days ago

As I said in another part of the thread, I was in a scenario where we were performed millions of inserts into a table of four integers, one row at a time. Generating the string and parsing it again wound up being enough to blow our 10µs time budget.

link

dkjaudyeqooe 1368 days ago

But with parameters you don't need to do that, or in any other scenario where only (literal) values change.

On the other hand, if you were importing data and each line specified which table and column it had to go into you'd probably have to write a new SQL statement each time.

link

aaaaaaaaata 1368 days ago

What was the time constraint due to, out of curiosity?

What did you do for a solution?

link

rprospero 1368 days ago

It was a neutron detector that stored the the position, energy, and time stamp into the database. The design spec was 10,000 neutrons per second, so that was the origin of the time limit.

Of course, half those neutrons arrived within a 20ms window, so we had a buffer to handle the load. However, however, if the average remained above the limit, the buffer would fill up. There was a brief discussion of just ignoring events when the buffer was full, but that could introduce systematic errors in the data that would be impossible to detect, so it was better for the database to just crash.

The solution was to tighten the slits on the neutron beam to lower the count rate to the point that we never filled the buffer. Granted, we were testing a high flux technique that, so that was a bit of a disappointment. Everything else in the instrument could handle an order of magnitude more neutrons, except this database.

By the way, to be fair to the database designers, they were working with Clinton era tech and were probably told that the flux rate would never be a SUSTAINED 10,000 events per second.

link

chasil 1368 days ago

SQLite was released in the year 2000; it is 22 years old.

Because of its age, it does carry deprecated API components that are maintained solely for backward compatibility.

Notice how many _v2 and _v3 variants are present, denoting reworked aspects of the API:

https://sqlite.org/c3ref/funclist.html

A product of this age was designed (and redesigned) for specific needs. Unfortunately, your use case is not among them.

link

ThemalSpan 1368 days ago

Is that a fact or an intuition?

link

rprospero 1368 days ago

Not the OP, but I can give two gains. First off, by passing an AST, instead of just an SQL string, we cut out a huge number of possibly SQL injection attacks. Second, in most of the projects where I've used of SQL, there's been some kind of database object that builds the actual query, which it then converts to a string. The database then takes that string and parses it into an AST. There's some performance gains to be made by skipping the middle man and just creating the AST directly.

Many years ago, I was on a project that needed to add rows to a database with a hard 10µs limit. Each rows was just four integers, so the writing part was trivial. However, allocating the string, formatting the integers to strings, then parsing the resulting string often put us over the time limit. Every time the time limit was breached, we lost about five grand. Why we were using an SQL database for this at all is a story for a different time.

link

Taywee 1368 days ago

You should be using prepared statements and variable bindings. You should never have to parse any statement more than once (many dynamic language interfaces even cache statements for you, and many SQLite libraries also offer an interface for this out of the box). You should also never be formatting integers into strings, but simply binding them to the prepared statement. I really hope you weren't interpolating them and formatting each statement with the literal values in the string. That would be horridly inefficient, and a bad misuse of any modern SQL database. I also hope you were properly using transactions to speed up your batches.

See: https://www.sqlite.org/c3ref/stmt.html

link

danielheath 1368 days ago

Having application build a string and pass it to a library which parses the string into an AST cannot be as efficient as just building the AST, right?

link

yread 1368 days ago

It's a lot more efficient for the human who is used to reading SQL to read the SQL (or even SQL-producing code) than AST

link

pif 1368 days ago

Type safety at compile time to begin with.

link

emfax 1368 days ago

I’ve been trying to do exactly this. It must be possible.

link

liveoneggs 1368 days ago

PREPARE?

link

323 1368 days ago

Didn't Microsoft LINQ try something like that? Was not particularly successful.

link

manigandham 1368 days ago

LINQ stands for Language-Integrated Query and is incredibly successful at its purpose of providing powerful querying functionality and extensions baked into the C#/.NET language space itself.

This querying framework is what powers translations and compilation into SQL and several other languages (depending on the datastore provider used).

EntityFramework is one of the most advanced ORMs out there and is supremely productive because of LINQ.

link

Semaphor 1368 days ago

LINQ is stunningly successful, LINQ Query syntax (what you probably are referring to) with LINQ to SQL less so.

And yes, as usual, we have the amazing confusion of Microsoft Naming.

But query syntax is essentially just a way to use an ORM that looks closer to SQL but is strongly typed.

link

oaiey 1368 days ago

LINQ creates an AST in the .NET land, however, before passing it to an actual SQL database, it serializes the expression tree to SQL.

So no, it does not talk AST to the database.

link

Ducki 1368 days ago

No, it never tried that.

link

afavour 1368 days ago

Why not?

link

izacus 1368 days ago

Same reason you don't read files by concating string commands into another different language and then posting them to the OS.

link