Overview

Core concepts and architecture behind TiKV.

This page provides an overview of the TiKV architecture.

System architecture

The overall architecture of TiKV is as follows:

Figure: The architecture of TiKV

A TiKV cluster consists of the following components:

  • A group of TiKV nodes, which store key-value pair data.
  • A Placement Driver (PD) cluster, which works as the manager of the TiKV cluster.

TiKV clients interact with PD and TiKV through gRPC.

TiKV

TiKV stores data in RocksDB, which is a persistent and fast key-value store. To learn why TiKV selects RocksDB to store data, see RocksDB.
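
As a rough illustration of the storage layer only, the sketch below opens a RocksDB instance with multiple column families from Rust using the community rocksdb crate and performs a plain put/get. The path and the column-family list are illustrative; this is not TiKV's storage code or its exact column-family layout.

```rust
use rocksdb::{Options, DB};

fn main() {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);

    // Illustrative column families; TiKV organizes its data across several
    // column families inside RocksDB, but this list is not its exact layout.
    let cfs = ["default", "lock", "write"];
    let db = DB::open_cf(&opts, "/tmp/rocksdb-demo", cfs).expect("open RocksDB");

    // Plain key-value writes and reads against one column family.
    let cf = db.cf_handle("default").expect("column family exists");
    db.put_cf(cf, b"foo", b"bar").expect("put");
    let value = db.get_cf(cf, b"foo").expect("get");
    assert_eq!(value.as_deref(), Some(&b"bar"[..]));
    println!("read back: {:?}", value);
}
```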

By implementing the Raft consensus algorithm, TiKV works as follows:

  • TiKV replicates data to multiple machines, ensures data consistency, and tolerates machine failures.
  • Data is written through the Raft interface instead of directly to RocksDB (a simplified sketch of this write path follows this list).
  • With Raft, TiKV becomes a distributed key-value store that can automatically recover lost replicas after machine failures without affecting applications.
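
TiKV's real Raft implementation (the raft-rs crate) is far more involved; the deliberately simplified, single-process sketch below only illustrates the shape of the write path described above: a write is first appended as a log entry on the peers of a group, and only applied to each peer's local store (here a HashMap standing in for RocksDB) once a majority has accepted it.

```rust
use std::collections::HashMap;

/// A toy peer: its "state machine" is an in-memory map standing in for RocksDB.
struct Peer {
    id: u64,
    kv: HashMap<Vec<u8>, Vec<u8>>, // stand-in for the local RocksDB instance
    log: Vec<(Vec<u8>, Vec<u8>)>,  // replicated log entries
}

/// A toy replication group: one leader plus followers, all in one process.
struct RaftGroup {
    peers: Vec<Peer>,
}

impl RaftGroup {
    /// Writes go through the replication interface rather than straight into
    /// storage: the entry is appended on every peer, and only once a majority
    /// has accepted it is it applied to each peer's state machine.
    fn propose_put(&mut self, key: &[u8], value: &[u8]) -> bool {
        let mut acks = 0;
        for peer in &mut self.peers {
            peer.log.push((key.to_vec(), value.to_vec()));
            acks += 1; // in a real cluster, some peers may be down or lag behind
        }
        let committed = acks > self.peers.len() / 2;
        if committed {
            for peer in &mut self.peers {
                peer.kv.insert(key.to_vec(), value.to_vec()); // apply to "RocksDB"
            }
        }
        committed
    }
}

fn main() {
    let peers = (1..=3)
        .map(|id| Peer { id, kv: HashMap::new(), log: Vec::new() })
        .collect();
    let mut group = RaftGroup { peers };
    assert!(group.propose_put(b"k", b"v"));
    println!(
        "peer {} holds {} log entry(ies), value = {:?}",
        group.peers[0].id,
        group.peers[0].log.len(),
        group.peers[0].kv.get(b"k".as_slice())
    );
}
```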

On top of the Raft layer, TiKV provides two APIs that clients can use:

  • Raw: a lower-level key-value API for interacting directly with individual key-value pairs. Atomicity: single key. Use it when your application requires low latency and does not involve multi-key transactions.
  • Transactional: a higher-level key-value API that provides snapshot isolation transactions. Atomicity: multiple keys. Use it when your application requires distributed transactions.
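
As a client-side illustration of these two APIs, here is a minimal sketch assuming the community Rust client (the tikv-client crate) and a placeholder PD endpoint at 127.0.0.1:2379; it follows the client's documented usage, but exact method signatures may vary between client versions.

```rust
use tikv_client::{RawClient, TransactionClient};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Raw API: single-key operations, no multi-key transactions.
    let raw = RawClient::new(vec!["127.0.0.1:2379"]).await?;
    raw.put("raw-key".to_owned(), "raw-value".to_owned()).await?;
    println!("raw get: {:?}", raw.get("raw-key".to_owned()).await?);

    // Transactional API: snapshot-isolation transactions spanning multiple keys.
    let txn_client = TransactionClient::new(vec!["127.0.0.1:2379"]).await?;
    let mut txn = txn_client.begin_optimistic().await?;
    txn.put("txn-key-1".to_owned(), "v1".to_owned()).await?;
    txn.put("txn-key-2".to_owned(), "v2".to_owned()).await?;
    txn.commit().await?;
    Ok(())
}
```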

The following Raft-related concepts are used throughout TiKV:

  • Raft Group: each replica of a Region is called a peer. All the peers of a Region form a Raft group.
  • Leader: in every Raft group, one peer takes the unique leader role and is responsible for processing read and write requests from clients.

PD

As the manager of a TiKV cluster, the Placement Driver (PD) provides the following functions:

  • Timestamp oracle

    The timestamp oracle plays a significant role in the Percolator transaction model. PD implements a service that hands out timestamps in strictly increasing order, a property required for the correct operation of the snapshot isolation protocol (a toy timestamp allocator is sketched after this list).

  • Region scheduler

    Data in TiKV is organized as Regions, which are replicated to several stores. PD, as the Region scheduler, decides where each replica is stored (a toy placement sketch follows at the end of this page).
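
A toy, single-process version of a strictly increasing timestamp allocator is sketched below. It is not PD's implementation (PD's timestamp oracle is batched, fault tolerant, and backed by its own cluster state); it only demonstrates the strictly-increasing property by combining a physical clock with a logical counter.

```rust
use std::sync::Mutex;
use std::time::{SystemTime, UNIX_EPOCH};

/// A toy timestamp oracle: hands out strictly increasing timestamps by packing
/// a physical component (milliseconds) above a logical counter. Illustrative
/// only; not PD's actual TSO service.
struct Tso {
    last: Mutex<u64>,
}

const LOGICAL_BITS: u64 = 18; // low bits reserved for the logical counter

impl Tso {
    fn new() -> Self {
        Tso { last: Mutex::new(0) }
    }

    fn get_ts(&self) -> u64 {
        let physical = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_millis() as u64;
        let candidate = physical << LOGICAL_BITS;

        let mut last = self.last.lock().unwrap();
        // If the clock has not advanced (or went backwards), bump the logical
        // part so returned timestamps are still strictly increasing.
        *last = if candidate > *last { candidate } else { *last + 1 };
        *last
    }
}

fn main() {
    let tso = Tso::new();
    let a = tso.get_ts();
    let b = tso.get_ts();
    assert!(b > a, "timestamps must be strictly increasing");
    println!("ts1 = {}, ts2 = {}", a, b);
}
```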
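
The next sketch shows only the shape of a placement decision: given hypothetical per-store Region counts, pick the least-loaded stores for a new Region's replicas. The function name place_region and the store IDs are made up for illustration, and PD's real scheduler weighs many more signals (capacity, load, labels for topology awareness, and so on).

```rust
use std::collections::HashMap;

/// Toy placement: choose the `replicas` stores that currently hold the fewest
/// Region replicas. This is the shape of the problem, not PD's algorithm.
fn place_region(region_counts: &HashMap<u64, usize>, replicas: usize) -> Vec<u64> {
    let mut stores: Vec<(u64, usize)> =
        region_counts.iter().map(|(&id, &n)| (id, n)).collect();
    // Least-loaded stores first; break ties by store ID for determinism.
    stores.sort_by_key(|&(id, n)| (n, id));
    stores.into_iter().take(replicas).map(|(id, _)| id).collect()
}

fn main() {
    // Hypothetical store IDs and their current Region counts.
    let counts = HashMap::from([(1, 120), (2, 80), (3, 95), (4, 60)]);
    let target = place_region(&counts, 3);
    println!("place new Region replicas on stores {:?}", target); // [4, 2, 3]
}
```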