Hacker News new | ask | show | jobs
by WalterBright 641 days ago
> an SQL injection vulnerability

I simply do not understand why the SQL API even allows injection vulnerability. Adam Ruppe and Steven Schweighoffer have done excellent work in writing a shell API over it (in D) that makes such injections far more difficult to inadvertently write.

On airplanes, when a bad user interface leads to an accident, the user interface gets fixed. There's no reason to put up with this in programming languages, either.

5 comments

> why the SQL API even allows injection vulnerability

How would one implement this?

"SQL APIs" use prepared statements. Meaning you have a string for SQL and some dynamic variables that inject into that string via $1, $2 etc.

BUT now if developer makes that string dynamic via a variable, then you have SQL injection again.

> How would one implement this?

The low-level API could simply not allow SQL statements as strings, and instead provide separate functions to build the queries and statements.

It would provide entry points which could be used to ensure proper escaping and such, and would still allow for easily generating queries dynamically in the cases where that is needed.

Of course, it doesn't completely guard against Bobby Tables[1], one could imagine someone including a run-time code generator and feed it unprotected SQL as input.

But it should make it a lot more difficult, as it would be much more "unnatural", requiring going against the grain, to inject unprotected user data. Also, the "query_execute" function could raise an error if there's more than one statement, requiring one to use a different function for batch execution.

Pseudo-codish example off the top of my head, for the sake of illustration:

   is_active = str_to_bool(args['active']); // from user
   qry = new_query(ctx);
   users_alias = new_table_alias(qry, 't');
   query_select_column(users_alias, 'id');
   query_select_column(users_alias, 'username');
   query_from_table(users_alias, 'users');
   filter_active = query_column_eq_clause(users_alias, 'active', is_active);
   where = query_where(qry);
   query_where_append(where, filter_active);   
   cursor = query_execute(qry);
[1]: https://xkcd.com/327/
"Gee, this new programming language / API makes it hard to copy my SQL queries across. Better use something else."
If that's all what the datanase drivers supported...
Easy. Don’t write queries in a language (SQL) which interpolates content without escaping it for the enclosing structure.

Go one level up.

For example statements that are prepared should not allow strings in the SQL, but rather variables, and then bind them to values like PDO does

It would be a bit annoying to have to prepare outside and pass in every SQL literal you need to use in your query.

I'd rather have SQL API taking not strings but a special type that string can't be directly converted into without escaping (by default).

In C++ tagged literals could be used to create this special type easily. Similar constructs exist in some other languages

Literally a library can generate SQL statements and compile them

JS and PHP has tagged literals

But they have to be “escaped” properly before being interpolated!

That's the whole point of having a separate type for queries. Whenever you try to glue a string to a query the string gets escaped.
I agree. It would be nice if most SQL API's were secure by default to prevent SQLI. It's really something that the db connectors in the programming languages should handle with more grace like most ORMs today handle them pretty well.

I believe it largely is due to how SQL is designed to allow multiple queries to be concatenated with each other, and poor logic design when writing such queries.

SQL is not designed to allow multiple queries to be concatenated. That is a feature of certain databases, not SQL itself.
In virtually every dev environment, the overwhelming majority of queries are most straightforwardly written in a way that doesn't admit to SQLI. It's not really a programming language thing.
In my university one of the intro-to-CS courses spent some time on cybersecurity and SQL injections. It seemed like using prepared statements was less effort than concatenating queries together, so I asked why people would write vulnerable code anyway. The instructor wasn't sure; I'm not sure if she knew the uni taught SQL by concatenation in the prior semester.
Prepared statements are limited what you can do with them. A common stumbling block is sorting the results on a column that is user-specified.

If you look at the level of the discussion around this, it's not surprising SQL injections are still a thing.

https://stackoverflow.com/questions/12430208/using-a-prepare...

Even something as seemingly straightforward as selecting all entities whose ids belong to an array, a query that you'll find everywhere in most Graph QL apps, isn't easy to do without string concatenation.
It feels to me like there’s a level of abstraction missing when it comes to SQL and how it’s used.

Instead of just having :userId as a parameter that gets safely put in a query, it feels like there should be something like SORT_EXPRESSION(:orderBy) and for other common use cases, like in the sibling comment.

I have no idea whether this would fit in better as something handled by an ORM or the RDBMSes, but it probably doesn’t belong as the responsibility of the average developer, judging by the code I’ve seen.

I think the argument about needing to fix mechanisms that are commonly misused is a really good one, but there are no very clear solutions, I’m sure there can be found plenty wrong and overly trivialized with the suggestion above.

Curricula lags the industry by lots of years; in the early aughts concatenated SQL queries were the norm for database APIs, but prepared statements have been the default (or at least easily afforded in the default) for most APIs for a pretty long time now.
Yup.

In the mid aughts, one of my lecturers insisted that motion capture was limited to a few minutes because "several megabytes" was "too much" to store.

For some use cases, dynamically constructing the query is a requirement, for example if you’re building a data warehouse query interface , or have a user interface that allows selecting columns or similar.
Most programming languages have easy and well-known string concatenation and the simplest querying function typically takes just a string - it's easy to see why people naturally reach for string concatenation.
The vulnerability class is hardly unique to sql. any program that constructs content to be processed by another program or sub-routine, where an attacker can control the content has the potential to exhibit such a vulnerability. A good example is format strings in C or cgi-scripts that call each other or run OS commands.
> A good example is format strings in C

The D programming language allows direct use of C printf. However, D checks the arguments against the format specifiers in the format string to make it memory safe.

The constant stream of bugs due to format/arguments is now history.

There is no reason why C and C++ compilers cannot do this, too.

for static specifiers, I can see that. but for dynamically constructed format specifiers, especially where arrays to pointers/vargs are in use, is it possible to have a mitigation for that?

this pseudo-code as an example:

snprintf(fmt,userinputstring,args); printf(fmt,somearray);

Your suspicion is correct, the checks only work when the format string is a literal.
Like any LLM
> the SQL API

No such thing.

ISO 9075-3
Yeah, that's one of those "standards" that only ever existed on paper.