|
Thank you for the feedback!
The service will be put into commercial operation as saas in the coming months.
I do not plan to open the source code, unfortunately. The idea is that the tool will only work with the sources of SQL queries, and I had to work hard to implement it. The work consists of several steps
1) get the AST (Abstract Syntax Tree) database schema (DDL).
At this stage, only Postgresql up to version 10 is supported. Soon I will deal with the parser from Postgresql 13. This is not a trivial task. I have to build everything from the source code of Postgresql :) 2) Building a database model.
We parse all DDL-commands one by one and apply changes. For example, apply all ALTER TABLE to the table described above, add user-defined functions to the list of built-in functions, and so on.
It is necessary to have a complete overview of all types of tables, indexes, and each table described in a DDL script. 3) get an AST query (DML) and build a model of the result.
This is the most complex and interesting part :)
The task is to get a list of field names and their types, which will be returned after the query execution.
You need to consider CTE and the list of tables specified in FROM. You need to understand what function will be called and what result will be output. For example, function ROUND is described in three variants, from different arguments and with varying types of result, function ABS - in six variants. I.e., it is required to understand the types of arguments before selecting a suitable one :) In the process, implicit type casting is considered if necessary. The same is valid for operators. An operator in Postgresql is a separate entity that you can create yourself. Postgresql 11, for example, describes 788 operators. Various types of syntax are taken into account, for example - SELECT , t1., id, public.t2.name, t3.* FROM t1, t2, myschema.t3 - will be parsed correctly. But even this is not the most challenging thing :) The most exciting thing is to be able to understand two things:
A. whether each of the fields can be NULL or not. It depends on many factors, such as - how the JOIN of the table with the source column is made, whether there is a FOREIGN KEY, what type of JOIN is used, what conditions are described in the ON section, what is written in the WHERE conditions.
B. How many records will return the query described in the NONE, ONE, ONE OR NONE, MANY, MANY OR NONE categories. Again, this is affected by the conditions described in JOIN and WHERE, whether there are aggregation functions, whether there is GROUP BY, whether there are functions that return multiple records. This function, by the way, is also used in the first step - to get types for VIEW. This was a brief description of the first part of the service ;).
It can already be a service in itself. It is possible to generate types and all code of microservices, including JSON-schema and tests, based on DLL and DML set.
But as I wrote above, most people prefer to use an ORM such as Django or RoR. :(
For this reason, I've removed this functionality from a playground and will take it to a separate project when I get my hands on it. It will also include various tools, such as - information about all possible exceptions that may be thrown by request, automatic creation of migrations to CI if your DDL files are in the repository, whether there are unused indexes or fields and many other exciting things :) And the second part of the service that I plan to promote in the first place is a tool to search for bugs, architectural, and performance problems automatically. The target audience here is DBA, who have to deal with forgeries RoR-developers and their colleagues :)
This part is entirely based on all information obtained in the previous stages.
Part of the errors can be understood by AST (linter principle), but the most interesting rules are based on knowledge about types, understanding of NULL/ IS NOT NULL, and the number of returned records. There are more than a thousand such rules, but I suppose there will be about 5000 of them in the next couple of years :)
Also described are about 200 rules that can lead to runtime errors, but they are not needed by DBA, because their job is to find problems in valid queries :) There will be more than 1000 such rules as well since Postgresql describes more than 1700 runtime errors. And yes, this is all about Postgresql.
After I start commercially using Postgresql, I plan to do the same for Mysql and then perhaps for Clickhouse if there are no special offers of cooperation :) |
But my application was to be able to use raw SQL queries instead of an ORM in an application. By statically analyzing the SQL queries, then you can, in a statically typed language, automatically type-check the parameters to and results of the queries. Combining that and an IDE like the Jetbrains ones, which provide intelligent SQL autocompletion/refactoring, then tbh., I think it would beat an ORM in terms of ergonomics. Some people like LINQ or various Haskell/Scala ORMs because they are type-safe. This would be totally natural to those who know SQL without the downsides of being totally unanalyzed and type-checked.
It would be like the clojure library called hugs, but with static type-checking.