| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by formerly_proven 2063 days ago
	Novices don't build working concurrent systems of any kind with any toolkit, period. Concurrency is hard and thinking all the "concurrency problems" go away with some message passing is both ludicrous and dangerous. Fearless concurrency can only be attained through understanding, not by thinking all your problems went away because you're using a "cool approach".

2 comments

abhishekjha 2063 days ago

Surprisingly this is what the akka framework promises : Message passing and immutability of objects.

link

FpUser 2063 days ago

Software usually has state (unless that state is completely kept and managed externally in a database for example). And the state mutates. Simple case example is a big array that has to be processed in place.

link

mrkeen 2063 days ago

It's pretty easy to make the leap from individual SQL statements to SQL statements which are wrapped in a transaction.

link

formerly_proven 2063 days ago

Excellent example for making my point, since "just wrap it in a transaction" usually leads to concurrency bugs like the beloved lost update.

link

FpUser 2063 days ago

If you're talking database like transaction it "usually" leads to concurrency bugs only if the transaction level is not strictly serializable. It does not hurt to know things before labeling them.

link

mrkeen 2063 days ago

This is not something I'm familiar with. What's the beloved lost update and what transactions are you using that suffer from it?

link

formerly_proven 2063 days ago

Transactions give varying degrees of "isolation" between them, depending on the database (and its version + configuration). For example, in what SQL would call READ COMMITTED, where transactions will only read data that has been committed, read-modify-write updates are generally bugs. The classic example:

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account: // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT
    - Result: balance is 50, but should be 0

With serializabile transactions (not all databases have this, particularly if you look beyond SQL):

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account: // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT -> Fails, needs to retry
    - transaction 2b: SELECT balance FROM account: // = 50
    - transaction 2b: UPDATE account SET balance = 0
    - transaction 2b: COMMIT -> Ok!
    - Result: balance is 0

Because this is needed so frequently, databases have calculated updates, basically atomic operations:

    - transaction 1: UPDATE account SET balance = balance - 50; // values indeterminate
    - transaction 2: UPDATE account SET balance = balance - 50; // values indeterminate
    - transactions 1,2: COMMIT
    - Result: balance is 0

Or, one could lock the rows, like so:

    - transaction 1: SELECT FOR UPDATE balance FROM account; // = 100
    - transaction 2: SELECT FOR UPDATE balance FROM account: // = transaction 2 is stalled until transaction 1 commits or rollbacks
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    // transaction 2 can now continue and gets balance = 50
    - transaction 2: UPDATE account SET balance = 00
    - transaction 2: COMMIT
    - Result: balance is 0

And this is just one simple example of the problems you can have concurrently accessing one table, even while using transactions. Not to speak of the issues you can run into when interacting with systems outside a single database, which don't interact with the transaction semantics of the DB.

Concurrency is just very non-trivial regardless the abstraction.

link

mrkeen 2063 days ago

Well, I guess I'll just keep digging myself further into a hole!

I want to focus entirely on your first example.

Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

If you're telling it like it is, then I cannot argue with facts. I guess I'll stop using DBs, at least until they figure this stuff out in 1973.

link

formerly_proven 2062 days ago

> Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

We have two concurrent tasks both doing exactly the same thing in order to deduct 50 money:

    BEGIN TRANSACTION;
    SELECT balance FROM account; // = 100
    UPDATE account SET balance = 50; // calculated by application as 100-50
    COMMIT;

Perhaps I misunderstand you, or you misunderstood the way I presented the example (possibly because I presented it poorly). But in my mind there is hardly a way to describe this code as "not transactional".

I merely showed one possible way how these concurrent tasks may execute in practice leading to bugs. Of course, for casual testing this will actually look and work correctly. As one commenter far up the thread said (as an attempt to refute understanding of concurrency as necessary)

> The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

> And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block. Which makes an incorrect implementation worse but possibly correct.

Then,

> Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

That is only one of the possible ways for transactions to work. Note that IIRC the only database that interprets the SQL standard like this is postgres, while MySQL and Oracle still have (more subtle) serialization issues even on the SERIALIZABLE isolation level (example: https://stackoverflow.com/a/49425872).

Note that you can end up with deadlocks and transaction failures on any level stricter than READ COMMITTED, so the application needs to be able to deal with both of these.

link