Hacker News new | ask | show | jobs
by mrigger 2204 days ago
Many DBMS development teams use SQLsmith and other random query generators, which can effectively find crash bugs. In contrast, we designed and developed SQLancer to automatically find logic bugs, which are bugs that cause the DBMS to compute an incorrect result set (e.g., by omitting a record from the result set).
1 comments

How do you design the tests? Do you think of complex cases and hope any dbms may fail the test? Because this is would result in many tests having no problem and sometimes finding a bug, which cant really be an effective way of doing it.
We automatically generate databases and queries. The main challenge is to verify that the query yields the expected result. To this end, we have designed and implemented three different approaches:

* Pivoted Query Synthesis (PQS): The main idea is to generate a query that is guaranteed to fetch a randomly-selected row, to which we refer as the pivot row. If the pivot row is not contained in the result set, we have detected a bug.

* Non-Optimizing Reference Engine Construction (NoREC): The idea is to transform a query so that it cannot be optimized effectively by the DBMS. By comparing the result set of the "optimized" query with the result set of the "unoptimized" query, we can detect bugs that affect the DBMS' query optimizer.

* Ternary Logic Partitioning (TLP): The main idea of this approach is that we can partition a given query into multiple "partitioning" queries, each of which computes a partition of the result, and to then combine these partitions to derive the same result as the original query. If the two results differ, we have detected a bug in the DBMS.

According to the PQS paper, they pick an existing row in the database and generate random predicates until they find one that’s true for that row. They then use that as a WHERE clause and verify the original row is contained in the result set. A similar procedure is used to find relevant JOIN clauses.
You could use a fuzzer and fuzz several DBMS products at the same time. If their db's states or query results start to diverge, you probably found a bug in one of the product.