Visualization of RAMP transactions
I've just published my post about @pbailis's RAMP transactions with step-by-step visualization - https://t.co/SK5k36mFyT
— Rystsov Denis (@rystsov) April 12, 2016
I've just published my post about @pbailis's RAMP transactions with step-by-step visualization - https://t.co/SK5k36mFyT
— Rystsov Denis (@rystsov) April 12, 2016
Paper | Scalable Atomic Visibility with RAMP Transactions (2014) |
---|---|
Isolation level | Atomicity + Read Committed |
Roundtrips per commit | 2 (3 with latency improvements) |
Visualization | Step-by-step visualization of RAMP transactions |
The algorithm is new and hasn't been noticed in the wild but there there are rumors that Cassandra may support it; Facebook also reported that they are working on the Apollo database which uses Paxos for replication and CRDT & RAMP for cross shard transactions.
RAMP algorithm solves the problem of simultaneous (atomic) update of set of items distributed between multiple servers (shards).
The selected steps are not described in the paper but they are necessary if we want to avoid a couple of anomalies. One of them happens when a client reads two items in a transaction, receives new version but then she starts new transaction, reads only one element and receives old version (stale read).
I believe those steps were omitted in the original paper because it was supposed that read transactions should always read the same set of items that were participating in the write transactions. In practice, the latter way has latency penalty because if the items are distributed between different shards and read transaction should always contact all of them.
One of the hardest part of understanding RAMP transactions to me was to come up with an example of the domain where anomalies of the RAMP transactions (lost updates) are explainable and fine from the customer perspective.
The good example is an application like Splitwise which allows customers to split bills between friends and to track how much they own to each other. We can model the domain as a directed weighted graph where nodes represent customers, edges represent debts and distribute it by distributing the nodes with its outgoing edges. With such approach a customer can get a list of its debts by querying only one shard.
Of cause money is classical example where we need serializability, but our application is rather a reminder than bank, so we can relax the guaranties. For example, customers may understand that one of the simultaneous updates to the same data may be canceled, but they can't tolerate inconsistency between views of two customer on how much they own to each other. This is exactly the same guaranties which RAMP transactions provide.
In the step-by-step RAMP visualization I modeled a situation between customers who named just like some of the mathematicians I deeply admire: Evariste Galois, Euclid and Kurt Godel. They take a cab and go to a restaurant, Euclid pays for a ride, Godel handles the restaurant's check and at some point of time they decided to reflect this into the app.
The RAMP paper described three flavors of RAMP transaction each with different runtime characteristics: RAMP-fast, RAMP-small and RAMP-hybrid. This post covers the first one, but the other are very similar, so if you got RAMP-fast I'm sure you'll get the other without problems.
Further reading: