Visualization of RAMP transactions

Paper Scalable Atomic Visibility with RAMP Transactions (2014)
Isolation level Atomicity + Read Committed
Roundtrips per commit 2 (3 with latency improvements)
Visualization Step-by-step visualization of RAMP transactions

Usage in the wild

The algorithm is new and hasn't been noticed in the wild but there there are rumors that Cassandra may support it; Facebook also reported that they are working on the Apollo database which uses Paxos for replication and CRDT & RAMP for cross shard transactions.

Problem

RAMP algorithm solves the problem of simultaneous (atomic) update of set of items distributed between multiple servers (shards).

Overview

How to change a set of items

  • For each change of the item send it to shard owning the current version of the item along with the names of all items participating in the transaction (metadata).
  • For each change connect to the its shard and mark the change as committed (if id of the current transaction is greater than id of the previous change).
  • Connect to the shards and mark the commit as confirmed.

How to read a set of items

  • Read the current committed version of the items.
  • If there is a non-confirmed item:
    • (re)-commit the whole transaction which is responsible for the current item (we can do it since we store metadata within each item)
    • confirm the commit
  • Use the metadata to calculate the expected version of the value
  • Fetch the expected version if it's greater than the committed version

The selected steps are not described in the paper but they are necessary if we want to avoid a couple of anomalies. One of them happens when a client reads two items in a transaction, receives new version but then she starts new transaction, reads only one element and receives old version (stale read).

I believe those steps were omitted in the original paper because it was supposed that read transactions should always read the same set of items that were participating in the write transactions. In practice, the latter way has latency penalty because if the items are distributed between different shards and read transaction should always contact all of them.

Example

One of the hardest part of understanding RAMP transactions to me was to come up with an example of the domain where anomalies of the RAMP transactions (lost updates) are explainable and fine from the customer perspective.

The good example is an application like Splitwise which allows customers to split bills between friends and to track how much they own to each other. We can model the domain as a directed weighted graph where nodes represent customers, edges represent debts and distribute it by distributing the nodes with its outgoing edges. With such approach a customer can get a list of its debts by querying only one shard.

Of cause money is classical example where we need serializability, but our application is rather a reminder than bank, so we can relax the guaranties. For example, customers may understand that one of the simultaneous updates to the same data may be canceled, but they can't tolerate inconsistency between views of two customer on how much they own to each other. This is exactly the same guaranties which RAMP transactions provide.

In the step-by-step RAMP visualization I modeled a situation between customers who named just like some of the mathematicians I deeply admire: Evariste Galois, Euclid and Kurt Godel. They take a cab and go to a restaurant, Euclid pays for a ride, Godel handles the restaurant's check and at some point of time they decided to reflect this into the app.

Postscriptum

The RAMP paper described three flavors of RAMP transaction each with different runtime characteristics: RAMP-fast, RAMP-small and RAMP-hybrid. This post covers the first one, but the other are very similar, so if you got RAMP-fast I'm sure you'll get the other without problems.

Further reading:

Lobsters submission