# Distributed Algorithms

We are currently refactoring our documentation. Please excuse any problems you may find and report them here.

## Two-Phase Commit

In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction, determining whether to commit or abort (rollback) the transaction. It is a specialized type of consensus protocol. The protocol achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used. However, it is not resilient to all possible failure scenarios, and in rare cases user (i.e. a system’s administrator) intervention is needed to resolve failures. To aide in recovery from failure the protocol’s participants log the protocol’s states. Log records, which are typically slow to generate but survive failures, are used by the protocol’s recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though expected to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol.

### Basic Algorithm of 2PC

#### prepare phase

The coordinator sends a prepare message to all cohorts and waits until it has received a reply from all cohorts.

#### commit phase

If the coordinator received an agreement message from all cohorts during the prepare phase, the coordinator sends a commit message to all the cohorts.

If any cohort votes No during the prepare phase (or the coordinator’s timeout expires), the coordinator sends a rollback message to all the cohorts.

The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort has sent an agreement message to the coordinator, it will block until a commit or rollback is received.

For example, consider a transaction involving a coordinator A and the cohort C1. If C1 receives a prepare message and responds to A, then A fails before sending C1 either a commit or rollback message, then C1 will block forever.

### 2PC Practice in TiKV

In TiKV we adopt the Percolator transaction model which is a variant of two phase commit. To address the disadvantage of coordinator failures, percolator doesn’t use any node as coordinator, instead it uses one of the keys involved in each transaction as a coordinator. We call the coordinating key the primary key, and the other keys secondary keys. Since each key has multiple replicas, and data is kept consistent between these replicas by using a consensus protocol (Raft in TiKV), one node’s failure doesn’t affect the accessibility of data. So Percolator can tolerate node fails permanently.

## Three-Phase Commit

Unlike the two-phase commit protocol (2PC), 3PC is non-blocking. Specifically, 3PC places an upper bound on the amount of time required before a transaction either commits or aborts. This property ensures that if a given transaction is attempting to commit via 3PC and holds some resource locks, it will release the locks after the timeout.

#### 1st phase

The coordinator receives a transaction request. If there is a failure at this point, the coordinator aborts the transaction. Otherwise, the coordinator sends a canCommit? message to the cohorts and moves to the waiting state.

#### 2nd phase

If there is a failure, timeout, or if the coordinator receives a No message in the waiting state, the coordinator aborts the transaction and sends an abort message to all cohorts. Otherwise the coordinator will receive Yes messages from all cohorts within the time window, so it sends preCommit messages to all cohorts and moves to the prepared state.

#### 3rd phase

If the coordinator succeeds in the prepared state, it will move to the commit state. However if the coordinator times out while waiting for an acknowledgement from a cohort, it will abort the transaction. In the case where an acknowledgement is received from the majority of cohorts, the coordinator moves to the commit state as well.

A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the Commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond.

The three-phase commit protocol eliminates this problem by introducing the Prepared-to-commit state. If the coordinator fails before sending preCommit messages, the cohort will unanimously agree that the operation was aborted. The coordinator will not send out a doCommit message until all cohort members have acknowledged that they are Prepared-to-commit. This eliminates the possibility that any cohort member actually completed the transaction before all cohort members were aware of the decision to do so (an ambiguity that necessitated indefinite blocking in the two-phase commit protocol).