Hacker News new | ask | show | jobs
by mattashii 1291 days ago
HTTP does not (easily) allow for using answers of earlier queries in the transaction. E.g.

    BEGIN;
    INSERT INTO my_table (...) RETURNING (id);
    SELECT count(\*) AS my_count FROM my_table;
    INSERT INTO historical (new_id, value_derived_from_count, now());
    COMMIT;
is difficult (or potentially impossible) to do transactionally using single HTTP queries. Sure, you can rewrite your queries to use single-statement queries if you're lucky, but for others that may take a lot of work. Keeping a transaction alive over WebSocket allows you to easily have transaction states that last longer than the lifetime of the first request, which allows for transactions of which state is processed in more than one place.
2 comments

That I understand.

But of course for the user of the driver it might be fine if that is 2 or more HTTP requests. I expect that is how PlanetScale does it in their transaction implementation, https://github.com/planetscale/database-js#transactions, and I know that is how Prisma Data Proxy handles it - the transaction is identified with an ID which is returned to the Client and then included in further requests for the same transaction.

It's valid tradeoff to make to prefer a persistent connection to keep the overhead for multiple queries in a transaction as low as possible - which seems what Neon has done here.

> the transaction is identified with an ID which is returned to the Client and then included in further requests for the same transaction.

Yes, and there's a catch there that people might not notice: HTTP-based queries utilize this ID to identify which active transaction to use, but this is vulnerable to concurrent HTTP requests on the same transaction ID, thus allowing query injection (early COMMIT, SELECT sleep(1000000), etc. by attackers) due to requests being re-routed every time you send the query.

Keeping a direct connection (albeit proxied) to the PostgreSQL instance prevents this kind of attack, while also allowing for better state keeping control in systems that don't have access to raw TCP sockets, and reducing per-query overheads.

In our case, it's a bit more state than just a transaction ID, but the same concept applies.

And yeah, to mirror the other comment, this isn't particularly something I'd consider a security issue. If an attacker has access to your transaction (session) state, this is a similar risk vector on any web service that utilizes browser cookies or sessions. If someone hijacks your browser cookies, they can do anything they want into your account typically, ignoring 2FA and extra layers.

Same risk here. If someone were to intercept your request, which is strictly over HTTPS, and hijack your session, or you erroneously were logging the session and authentication credentials publicly, then you're exposing yourself to a lot bigger risk than someone adding a sleep() into your session. They could just dump everything from the database, because they are you.

A direct connection only changes this because it's a stateful protocol, and if the connection is severed, everything is lost/rolled back and nobody can intercept that.

But anyways, I'd hardly call a well established security model of HTTP sessions flawed in this regard.

If anything, it's possible for a client to do some dumb things, like, parallel queries within a single transaction, which you cannot do with a direct connection, which ultimately is just... going to be undefined behavior since they fundamentally get serialized into the database, just in an undeterministic order relative to the client.

I am not sure I am following.

We can have either parallel queries in a transaction where the order does not matter, or we can have them serialized - but then the next query will only be sent onto its journey after the previous query returned its data to the app.

What are the attack vectors here? And where would the attacker sit and try to achieve what exactly?

It’s not a trade off. They’ve not been able to do something that is doable and are trying to tell you it is a trade off.
Just turn it into a single query using CTEs.