Architecture

How TiKV works and how it was built

This page discusses the core concepts and architecture behind TiKV, including:

APIs

TiKV provides two APIs that you can use to interact with it:

APIDescriptionAtomicityUse when…
RawA lower-level key-value API for interacting directly with individual key-value pairs.Single keyYour application doesn’t require distributed transactions or multi-version concurrency control (MVCC)
TransactionalA higher-level key-value API that provides ACID semanticsMultiple keysYour application requires distributed transactions and/or MVCC

System architecture

The overall architecture of TiKV is illustrated in Figure 1 below:

TiKV architecture diagram
Figure 1. The architecture of TiKV

TiKV instance

The architecture of each TiKV instance is illustrated in Figure 2 below:

TiKV instance architecture diagram
Figure 2. TiKV instance architecture

Placement driver (PD)

The TiKV placement driver is the cluster manager of TiKV, which periodically checks replication constraints to balance load and data automatically across nodes and regions in a process called auto-sharding.

Store

There is a RocksDB database within each Store and it stores data into the local disk.

Region

Region is the basic unit of key-value data movement. Each Region is replicated to multiple Nodes. These multiple replicas form a Raft group.

Node

A TiKV Node is just a physical node in the cluster, which could be a virtual machine, a container, etc. Within each Node, there are one or more Stores. The data in each Store is split across multiple regions. Data is distributed across Regions using the Raft algorithm.

When a Node starts, the metadata for the Node, Store, and Region is recorded into the Placement Driver. The status of each Region and Store is regularly reported to the PD.

Transaction model

TiKV’s transaction model is similar to that of Google’s Percolator, a system built for processing updates to large data sets. Percolator uses an incremental update model in place of a batch-based model.

TiKV’s transaction model provides:

  • Snapshot isolation with lock, with semantics analogous to SELECT ... FOR UPDATE in SQL
  • Externally consistent reads and writes in distributed transactions

Raft

Data is distributed across TiKV instances via the Raft consensus algorithm, which is based on the so-called Raft paper (“In Search of an Understandable Consensus Algorithm”) from Diego Ongaro and John Ousterhout.

The origins of TiKV

TiKV was originally created by PingCAP to complement TiDB, a distributed HTAP database compatible with the MySQL protocol.