Hacker News new | ask | show | jobs
by davps 1230 days ago
This approach didn't use an ORM and run the tests concurrently against the same database.

I follow those steps on my pipeline:

Every time I commit changes the CI/CD pipeline follow those steps, on this order:

- I use sqitch for the database migration (my DB is postgresql).

- Run the migration script `sqitch deploy`. It runs only the items that hasn't been migrated yet.

- Run the `revert all` feature of sqitch to check if the revert action works well too.

- I run `sqitch deploy` again to test if the migration works well from scratch.

- After the schema migration has been applied, I run integration tests with Typescript and a test runner, which includes a mix of application tests and database tests too.

- If everything goes well, then it runs the migration script to the staging environment, and eventually it runs on the production database after a series of other steps on the pipeline.

I test my database queries from Typescript in this way:

-in practice I'm not strict on separating the tests from the database queries and the application code, instead, I test the layers as they are being developed, starting from simple inserts on the database, where I test my application CRUD functions that is being developed, plus to the fixtures generators (the code that generate synthetic data for my tests) and the deletion and test cleanup capabilities.

-having those boilerplate code, then I start testing the complex queries, and if a query is large enough (and assuming there are no performance penalties using CTE for those cases), I write my largue queries on small chunks on a cte, like this (replace SELECT 1 by your queries):

    export const sql_start = `
    WITH dummy_start AS (
        SELECT 1
    )

    export const step_2 = `${sql_start},
    step_2 AS (
        SELECT 1
    )
    `;

    export const step_3 = `${step_2},
    step_3 AS (
        SELECT 1
    )
    `;

    export const final_sql_query_to_use_in_app = ` ${step_3},
    final_sql_query_to_use_in_app AS(
        SELECT 1
    )

    SELECT \* FROM final_sql_query_to_use_in_app
`;

Then on my tests I can quickly pick any step of the CTE to test it

    import {step_2, step_3, final_sql_query_to_use_in_app} from './my-query';

    test('my test', async t => {

        //
        // here goes the code that load the fixtures (testing data) to the database
        //

        //this is one test, repeat for each step of your sql query
        const sql = `${step_3} 
            SELECT * FROM step_3 WHERE .....
        `;
        const {rows: myResult} = await db.query(sql, [myParam]);
        t.is(myResult.length, 3);

        //
        // here goes the code that cleanup the testing data created for this test
        // 
    });


and on my application, I just use the final query:

        import {final_sql_query_to_use_in_app} from './my-query';

        db.query(final_sql_query_to_use_in_app)

The tests start with an empty database (sqitch deploy just ran on it), then each test creates its own data fixtures (this is the more time consuming part of the test process) with UUIDs as synthetic data so I don't have conflicts between each test data, which makes it possible to run the tests concurrenlty, which is important to detect bugs on the queries too. Also, I include a cleanup process after each tests so after finishing the tests the database is empty of data again.

For sql queries that are critical pieces, I was be able to develop thounsands of automated tests with this approach and in addition to combinatorial approaches. In cases where a column of a view are basically a operation of states, if you write the logic in sql directly, you can test the combination of states from a spreadsheet (each colum is an state), and combining the states you can fill the expectations directly on the spreadsheet and give it to the test suites to run the scenarios and expectations by consuming the csv version of your spreadsheets.

If you are interested on more details just ping me, I'll be happy to share more about my approach.

2 comments

This is a beautiful and powerful approach (and great explanation), IMO. Personally, I'd keep all CTEs (your steps) as units and combine them separately. But you've commented on that in your other comment.
how do you know that what your select statements return is correct? on a real database?
Yes, it always runs on a real database, without mocks, as integration tests where you can test each CTE's auxiliary statement separately, which acts as a step of our sql pipeline. So the initial data is inserted on the tables and then I can exercise the assertions on the sql queries or views against the real database. In theory it could be easy to rearrange the concatenations of the CTE strings from above so it can be tested as a unit when it's put together with the previous auxiliary step as a temporary table but I never had the need for that since the integrations are simpler and works really well for me. The essential part of my approach is to treat the sql code as concatenated pieces of strings, and call slices of that with just the right concatenation to exercise the test to that slice on the real database, which is valid since the query will always be valid.

There is another pattern too, when I implemented a RLS based multi-tenancy with RBAC support, which needed an relatively large sql codebase and needed to be battle tested because it was critical, I've splited a big part of the sql code in a lot of sql functions instead of views to test the code units or integrations (using something similar to dependency injection but for the data, to switch the tenant RBAC's contexts), because for the sql functions I can pass different Postgresql's Configuration Parameters to test different tenants for example.