|
|
|
|
|
by spotman
3831 days ago
|
|
It helps but doesn't quite answer my question yet. With your situation #1, what if this is very common transaction and therefore you have 100 of these all waiting. What about 1000, 5000 etc. what system resources are used to let these transactions wait indefinitely ( if I understand your semantics with specific regard to blocking )? Some systems handle this as a failure that is communicated to the client rapidly. Other systems let N clients actually wait indefinitely but at the cost of taking up a thread / file descriptor, etc. in systems that have finite amounts of threads for example this would then be communicated in his paradigm as a such of upper bounds as to how many requests one could have waiting. So just trying to get a feeling for how this could have infinite amount of waiting transactions due to partial failure and still keep taking requests. Thanks for the reply, this stuff is always interesting |
|
So, as each client can only have 1 outstanding txn at a time, this requires there are 100, 1000, 5000 etc clients all submitting the same txn at the same time, and presumably they're all connected to the same server node, and for each of those txns, the set of server nodes that need to be contacted has been calculated, and the necessary messages sent. At that point failures occur so it's not known who has received which messages, so all these txns block.
The only system resources that are held at this point is RAM. Yes, it could get so bad that you eat up a lot of RAM - this is release 0.1 after all. There are a few areas where I have tickets open to make sure that goshawkdb can spill this sort of state to disk as necessary, though it may not be too horrible just relying on swap to start with.
> Some systems handle this as a failure that is communicated to the client rapidly. Other systems let N clients actually wait indefinitely but at the cost of taking up a thread / file descriptor, etc. in systems that have finite amounts of threads for example this would then be communicated in his paradigm as a such of upper bounds as to how many requests one could have waiting.
The problem is that in this particular case, the txn could have committed, it's just the server who initially received the txn from the client and then forwarded it to the necessary server nodes can't learn that it's committed due to failures. Now certainly I could make the client informed that the outcome is delayed, but the client may not be able to do anything with that information: it can't know for sure if the txn has been committed or aborted.
The entire codebase is written using actors and finite state machines. Because Go supports green threads and go-routines can be pre-emptively suspended, there is no problem with eating OS threads. In general, the design is such that the only thing that should block is the client and on the server the actor/state-machine that is talking to that client.