Hacker News new | ask | show | jobs
by JackFr 2063 days ago
The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block. Which makes an incorrect implementation worse but possibly correct.

3 comments

If the resource is shared and being accessed from many threads and is both written to and read from then it is the correct behavior to to lock it with the proper type of lock at access time. Depending on resource it might be possible to split it into few with more granular access.

As for novices: they are called that for reason and supposed to be under supervision rather than allowed running wild.

Why is this being downvoted? It's the truth.

HN needs to only allow downvotes that have an accompanying explanation comment.

HN uses downvotes mostly to boo the people with opinions deviating from common party line. As for reasonable explanation - you're asking too much. Programming as many other things often are treated as the religion. No arguments, it just is.
As with most other compromised social sites, no badthink allowed here and how dare you.
Novices don't build working concurrent systems of any kind with any toolkit, period. Concurrency is hard and thinking all the "concurrency problems" go away with some message passing is both ludicrous and dangerous. Fearless concurrency can only be attained through understanding, not by thinking all your problems went away because you're using a "cool approach".
Surprisingly this is what the akka framework promises : Message passing and immutability of objects.
Software usually has state (unless that state is completely kept and managed externally in a database for example). And the state mutates. Simple case example is a big array that has to be processed in place.
It's pretty easy to make the leap from individual SQL statements to SQL statements which are wrapped in a transaction.
Excellent example for making my point, since "just wrap it in a transaction" usually leads to concurrency bugs like the beloved lost update.
If you're talking database like transaction it "usually" leads to concurrency bugs only if the transaction level is not strictly serializable. It does not hurt to know things before labeling them.
This is not something I'm familiar with. What's the beloved lost update and what transactions are you using that suffer from it?
Transactions give varying degrees of "isolation" between them, depending on the database (and its version + configuration). For example, in what SQL would call READ COMMITTED, where transactions will only read data that has been committed, read-modify-write updates are generally bugs. The classic example:

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account: // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT
    - Result: balance is 50, but should be 0
With serializabile transactions (not all databases have this, particularly if you look beyond SQL):

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account: // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT -> Fails, needs to retry
    - transaction 2b: SELECT balance FROM account: // = 50
    - transaction 2b: UPDATE account SET balance = 0
    - transaction 2b: COMMIT -> Ok!
    - Result: balance is 0
Because this is needed so frequently, databases have calculated updates, basically atomic operations:

    - transaction 1: UPDATE account SET balance = balance - 50; // values indeterminate
    - transaction 2: UPDATE account SET balance = balance - 50; // values indeterminate
    - transactions 1,2: COMMIT
    - Result: balance is 0
Or, one could lock the rows, like so:

    - transaction 1: SELECT FOR UPDATE balance FROM account; // = 100
    - transaction 2: SELECT FOR UPDATE balance FROM account: // = transaction 2 is stalled until transaction 1 commits or rollbacks
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    // transaction 2 can now continue and gets balance = 50
    - transaction 2: UPDATE account SET balance = 00
    - transaction 2: COMMIT
    - Result: balance is 0
And this is just one simple example of the problems you can have concurrently accessing one table, even while using transactions. Not to speak of the issues you can run into when interacting with systems outside a single database, which don't interact with the transaction semantics of the DB.

Concurrency is just very non-trivial regardless the abstraction.

Well, I guess I'll just keep digging myself further into a hole!

I want to focus entirely on your first example.

Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

If you're telling it like it is, then I cannot argue with facts. I guess I'll stop using DBs, at least until they figure this stuff out in 1973.

What's a better alternative to synchronizing access to shared resources?
Treat it like GC and don't leave it up to the programmer.
An make a programmer unable to achieve highest performance when needed. We leave in supposedly free world. If you want to be "protected" be my guest and use languages with GC. Plenty of those. For somebody who need the opposite and uses "unprotected" tools - leave them alone. You have no rights to decide how other people do their work unless they're under your direct control.
Does that involve using concurrency primatives that basically don't allow access to any shared mutable state?
I'd say provide concurrency primitives to disallow direct access to shared mutable state. You still do the reads and the writes, but you let the system take and release locks for you.

Let's say you wanted to turn a list into a bounded list of 4 elements.

Race-condition insert:

    if (sz < 4) {
      list.insert(x);
      sz++;
    }
Safe insert:

    atomically {
      if (sz < 4) {
        list.insert(x);
        sz++;
      }
    }
So atomically organises the locking/unlocking/rollback for you such that a fifth element will not be inserted.
Aren't the semantics of this exactly the same as Java's synchronise? What error does it protect you against compared to synchronising on list? What happens if .insert() also uses atomically{} somehow?
Glad you asked!

synchronized locks code, atomically locks data.

It maps to my thinking better, because what I'm interested in is manipulating two or more resources at the same time. I'm not interested in making sure two threads aren't in the same region of code at the same time.

> What error does it protect you against compared to synchronising on list? Synchronized is perfectly fine for my example. I should have picked a better example of two resources (instead of list and size of list), because synchronized gets trickier once you start combining different pieces.

For example, I have two bounded lists, and I want to transfer an element from one to the other. I've already written synchronized versions of remove and insert, so let's try with them:

    transfer(otherlist) {
        x = otherlist.synchronizedRemove();
        this.synchronizedInsert(x);
    }
There will be a moment in time where the outside world can't see x in either list. Maybe I crash and x is gone for good. Or maybe the destination list becomes full and I'm left holding the x with nowhere to put it. So what is to be done? I could synchronize transfer but that still wouldn't fix the vanishing x, or the destination filling up. So I paid the performance cost of taking two/three locks and I've still ended up buggy.

I think the fix here is to lock each list, then no-one else can access them and it should fix the vanishing x:

    transfer(otherlist) {
        synchronized(this) {
            synchronized(otherlist) {
                x = otherlist.synchronizedRemove();
                this.synchronizedInsert(x);
            }
        }
    }
I think that's correct? But now I have taken too many locks - I only needed remove and insert not synchronizedRemove and synchronizedInsert. And now I've introduced deadlock possibility - if two transfers are attempted in opposite directions.

I can fix the too many locks problem, by exposing non-synchronized remove and insert and have transfer call them instead. But then callers and I will accidentally call the wrong one. I break any pretence of encapsulation by exposing unsafe and safe versions of a method. The deadlock is harder to fix. I'd need to synchronize the lists in some agreed order (and have everyone else obey that ordering in their other methods too!).

Instead, I want my implementation to look something like:

    BoundedList {

        transactional Int sz;
        transactional List list;

        transactional insert(x) {
            if (sz < 4) {
            list.insert(x);
            sz++;
            }
        }

        transactional remove() {...}

        transactional transfer(otherlist) {
            x = otherlist.remove();
            this.insert(x);
        }
    }
> What happens if .insert() also uses atomically{} somehow?

A good implementation would throw a compile-time error. A bad implementation could throw a runtime error.

In order to do this, transactional actions would need to be marked as such - to prevent mixing them up. atomically by definition is a non-transactional action (because it's the thing that commits all the steps to the outside world) so if you find an atomically inside a transaction, it's a type error.

You've already used a system like this if you've worked with any decent SQL implementation:

    BEGIN TRANSACTION

    UPDATE accounts
    SET money = money - 50
    WHERE accountId = 'src'

    UPDATE accounts
    SET money = money + 50
    WHERE accountId = 'dest'

    COMMIT TRANSACTION
I didn't take any 'locks'. I just wrapped two perfectly good individual actions and said 'run them both or not at all'. Though to be fair I am getting a lot of grief in another thread by suggesting that even novices could wrap up their SQL like that without getting it wrong.