Hacker News new | ask | show | jobs
by ksri 3321 days ago
Author of jinjasql. The whole point of jinjasql is to dynamically build queries.

There comes a time when you need the power of sql, and an ORM gets in the way. Think unions, sql functions, group by, more complex sql. Think dynamically generated sql queries based on some data structure.

In such cases, you have to keep track of how many variables you are capturing in the query, and then manually bind them.

Jinjasql is just a query generator. At the end of the day, it doesn't actually execute the query. You take the query it generates, and the array of parameters it gives - and then execute it using traditional means using bind parameters.

This approach gives you the power of a template language to generate the query. You can create reusable macros, conditionals and other features that a template language provides.

1 comments

Thank you for your response! Very interested to understand if JinjaSQL could have made the following easier?

I just prototyped a very expressive reporting API that takes database driven configurations for the reporting columns (dimensions and metrics), validates inputs and allows very expressive filtering (including partial string matches, etc.) in Flask and SQLAlchemy.

Did not use the ORM part, but rather the query API: http://docs.sqlalchemy.org/en/latest/orm/query.html

Queries included a with statement, the main query had 4 nested subqueries (4 level cascade basically) for ranking, filtering and ordering.

The prototype supports queries like this: Give me best performing shows and their stats in my top 5 countries by the number of viewers I have in each of these top countries, but I want shows not broken down by country, but instead by season and episode, including counts of uniques at each level. (Sorry don't have a copy of the resulting SQL on this computer, but it's fairly long.)

Queries I designed up front, and reverse engineered into being dynamically generated by SQLAlchemy query API. However, by not having a fixed template, I can simply update my reporting table configurations and not have to worry about updating the templates or the query structure.

When the user requests something that does not have top dimension and topN results, I can just omit that part of the query and build valid syntax without it.

All-in-all, the entire API is less than 1k lines of python.

Jinjasql is meant for reporting use cases, so yes - it should be able to help you. However, when we made JinjaSQL, we didn't want to build a super-expressive reporting API. Instead, for every report/chart we wanted to make, we wrote a comparatively simpler query and used jinjasql to get it working.

If you have a single complex query that is generating all your reports based on database configuration - I'm guessing you need the full power of python. So while JinjaSQL could have save some effort, I'm not sure it'd save you a lot.

Hope that helps!

Thanks!

The reports are all API driven. User specifies dimensions and metrics, as well as filters, etc. The API returns the JSON response.

I can't predict what users will want to visualize in the future, so dedicated templates for each of the visualizations are not practical for my use case.

The goal is to minimize dev work on the API and let front end config dictate what columns the API returns.

Let's assume you have a python object "userquery" that has the user specifications - including dimensions, metrics, filters and the cube to query. I'm also assuming you have whitelisted all the parameters in this object per your database model, otherwise you are open to security issues.

Here's how you could write that in jinjasql. I'm writing this out of memory, and it has errors/issues - but I hope you get the idea.

    SELECT
    {% for metric in userquery.metrics %}
        {{ metric.id }} as '{{metric.description}}'
    {% endfor %}
    FROM {{ userquery.cube }}
    WHERE 
    {% for filter in userquery.filters %}
        {{ filter.key }} {{ filter.operator }} {{ filter.value }}
    {% endfor %}
    GROUP BY 
    {% for dimension in userquery.dimensions %}
        {{ dimension }}
    {% endfor %}
This way, your template doesn't have to map to exactly one visualization.

Now, another interesting this in the security aspect. You can also pass the logged in user data model, and then append a where clause that restricts the rows based on the logged in user id or his role or whatever else is your application security model.

I'm not sure how much effort it would have saved you, but it does help us a lot - mostly because we don't have to translate between SQL and the way SQL ALchemy / ORM works.

Thanks for the example, this helps a lot in my understanding of this!

It's pretty similar to what I did, but the template would be a lot more complex due to various column definitions and calculations, for example uniques require the base query to have the IDs of whatever I'm counting, but all in all I see the use case.

Is that safe from sqli prospective? It would inject table name and all, which I believe you can't bind...
You're right. That's why I said upfront - whitelist values in the userquery object.