This Month in TiKV - November 2019

Ana Hobden

December 11, 2019


As winter descends on the northern hemisphere, we encourage you to keep warm by compiling TiKV! In November we had new minor releases, a new crate release, and some community updates!

Let’s take a look!

Releases

This month our team made 2 minor TiKV releases!

You can review the changelogs here:

Upgrading? Things to note:

  • In 3.0.6, #5697 introduced the ability to generate flamegraphs of TiKV at runtime! This means the new pprof crate has reached stable!
  • In 3.0.6, #5769 introduced the ability to modify GC IO limits dynamically with tikv-ctl.
  • In 3.0.6, #5440 there are new metrics around commit logs duration.

The other changes were minor bugfixes.

New Community Groups

After several discussions, our community has voted to adopt policies to support the founding of Special Interest Groups (SIGS) and Working Groups (WGs). Long-term SIGs are intended to focus on certain components of TiKV; WGs will focus on short-term, cross-component projects or goals.

@breeswish and @lonng founded a coprocessor SIG (named copr). Information about their group is here. @zhangjinpeng1987, @yiwu-arbug, and @sunxiaoguang founded an Engine SIG. You can review their group here.

For WGs, the first WG is the multiple-dc-enhancement group organized by @hicqu to better support deployments spanning multiple datacenters and regions.

pprof available

@Yangkeo released pprof! This allows you to profile and collect reports from a running Rust program. It includes a flamegraph feature that TiKV now uses.

let guard = pprof::ProfilerGuard::new(100).unwrap();
// ...
if let Ok(report) = guard.report().build() {
    let file = File::create("flamegraph.svg").unwrap();
    report.flamegraph(file).unwrap();
};
Generated flamegraph

Reading materials

Notable PRs

November was a busy month. Here are some highlights:

  • @nrc submitted a great refactoring PR in #5857, #5935 and #5964, helping us make TiKV more clean and understanable.
  • @little-wallace submitted a PR to optimize the conflict check in prewrite in #5846.
  • @little-wallace also submitted a latches scheduler optimization in [#6094]https://github.com/tikv/tikv/pull/6094).
  • Numerous vectorization PRs were submitted, enabling even faster coprocessor execution.
  • @wangwangwar introduced in-place byte encoding in the codec, helping reduce the number of allocations needed in #6061
  • @brson moved more code into engine traits as part of our ongoing engine abstraction in #5790 and #5901.
  • @breeswish taught TiKV about WriteRef which helps avoid allocations in PointGetter and ForwardScanner.
  • @sticnarf opened a PR to unify our read pools into a single multi-level thread pool in #5828. Work is still early stage, but this PR is very exciting!
  • @hoverbear implemented a suggestion from @siddontang to reduce our binary sizes while maintaining our backtraces and debugging info in #5820.
  • @hunterlxt taught TiKV to use GRPC’s memory quota features in #5818.
  • @glorv and @hoverbear are working on packaging TiKV.

Notable issues

TLA+ was firstly used by PingCAP in 2017 to verify the feasibility of a variant of the Percolator transaction model. Different from the original Percolator, TiDB’s model has improved efficiency greatly by supporting primary key and secondary key prewriting concurrently in the prewrite phase. The implementation was problematic at first. Thanks to TLA+, the root cause was clearly identified and targeted changes were applied to enable this optimization. Another application of TLA+ at PingCAP is the correctness verification of the multi-raft region merge algorithm, which provided the necessary confidence before we implemented the feature. There were only 4 members invested in the TLA+ related work, but the rewards are far beyond that.

Engineers at PingCAP believe that TLA+ is necessary when it comes to verifying the correctness of distributed systems and specifying the right system behaviors to avoid future patching. They agree that TLA+ is better at expressing the subtle details which natural languages are not good at.

PingCAP has maintained a repo in Github that records all specifications they wrote and keeps aligning TLA+ specifications with the newly implemented database optimizations. Up to now, there’re still active PRs and discussions on verifying new features and bugfixes. Speeches and posts on technical reflections and experience of TLA+ application have been shared with the infra community by PingCAP.

@lance6716 made a great summary about how TiKV uses TLA+ in this comment in #5784, the top discussed issue in November.

@rleungx and @wangwangwar discussed how we can record system events and DTrace probes in %3799. Monitoring and diagnostics provide critical information to operators to help them undderstand how best to scale TiKV, prevent storage exhaustion, and detect potential network issues.

Current projects

Here’s some of the things our contributors have been working on over the last month:

  • @winkyao has been coordinating with @dcalvin, @gingerkidney, and @hoverbear to help prepare TiKV’s community structures and processes for future growth. We’d love feedback from contributors! You can see some of the recent changes in tikv/community.
  • @brson has been pressing forward on Engine trait abstraction. Track progress on #engine-trait on our chat!
  • @Yangkeo been working on pprof.
  • @overvenus is continuing work on building TiKV’s backup & restore feature.
  • @yiwu-arbug has been working to improve our RocksDB bindings as well as Titan.
  • The copr SIG is working on new codecs and more function vectorization.

If any of these projects sound like something you’d like to contribute to, let us know on our chat and we’ll try to help you get involved.

New contributors

We’d like to welcome the following new contributors to TiKV and thank them for their work!

If you’d like to get involved, we’d love to help you get started. You might be interested in tackling one of these issues. If you don’t know how to begin, please leave a comment and somebody will help you out. We’re also very keen for people to contribute documentation, tests, optimizations, benchmarks, refactoring, or other useful things.

This Week in TiDB

For more detailed and comprehensive information about TiDB and TiKV, we have weekly updates. The following cover November: