Protocol Buffers

Interface definition languages such as protobufs are most commonly used to describe data that applications store or exchange with one another.

Protobufs define how to serialize structured data, either into a compact binary encoding or into a human-readable text format, and how to deserialize it again.

Here’s an example of a protobuf message in text format:

KvPair {
  key: "TiKV"
  value: "Astronaut"
}

When a message is actually sent between two applications, a binary format is used. You can learn more about the binary format in the protobuf documentation.

The message above is an instance of a message type defined in a .proto file:

message KvPair {
  string key = 1;
  string value = 2;
}

Each field is numbered, and the binary format identifies fields by these numbers rather than by their names. This makes it possible to rename fields and to evolve your application’s communication in a backward-compatible way.
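To make the role of field numbers concrete, here is a minimal hand-written Rust sketch of how a KvPair is laid out in the binary format. It is not generated code and not any library's API; encode_kv_pair and its helpers are hypothetical names. Each string field is written as a tag (the field number shifted left by three bits, combined with wire type 2 for length-delimited data), then a varint length, then the raw bytes.

```rust
/// Wire type 2 ("length-delimited") covers strings, bytes, and embedded messages.
const WIRE_TYPE_LEN: u64 = 2;

/// Appends `value` as a base-128 varint: 7 bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
fn write_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Appends one string field: tag, then byte length, then the UTF-8 bytes.
fn write_string_field(buf: &mut Vec<u8>, field_number: u64, value: &str) {
    // The tag packs the field number and the wire type into a single varint.
    write_varint(buf, (field_number << 3) | WIRE_TYPE_LEN);
    write_varint(buf, value.len() as u64);
    buf.extend_from_slice(value.as_bytes());
}

/// Hypothetical hand-written encoder for
/// `message KvPair { string key = 1; string value = 2; }`.
/// Only the numbers 1 and 2 reach the wire; the names `key` and `value` never do.
/// (A real proto3 encoder would also omit fields holding the default empty string.)
fn encode_kv_pair(key: &str, value: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    write_string_field(&mut buf, 1, key);
    write_string_field(&mut buf, 2, value);
    buf
}
```

Calling encode_kv_pair("TiKV", "Astronaut") yields 17 bytes, starting with 0x0A (field 1, wire type 2) and 0x04 (the length of "TiKV"). The same data as compact JSON, {"key":"TiKV","value":"Astronaut"}, takes 34 bytes, largely because JSON repeats the field names in every message. Since names never appear on the wire, renaming a field in the .proto file leaves the encoding unchanged as long as its number stays the same.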

Why Protobufs

Protobufs are generally faster to parse and more compact on the wire than text formats such as JSON. Additionally, the protobuf compiler can generate the serialization code for your language of choice.

If you have used serde_json or another JSON library, you have probably had to define structures that match the schema of your data. Because every language needs its own copy of those definitions, this becomes a maintenance burden as your infrastructure grows to span many languages.

You need to do this with protocol buffers as well, but you only do it once: the protobuf compiler generates bindings for every language it supports.

Code generated by protobuf is also backward compatible. If an application receives fields it does not recognize, it skips over them instead of failing. This allows an API to evolve safely.
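Skipping works because every field starts with a tag whose wire type tells the reader how many bytes to step over, even when the field number is unknown. The Rust sketch below is a minimal hand-written decoder for KvPair that illustrates this; it is not generated code, the function names are hypothetical, and it does not handle the deprecated group wire types. Real implementations may also preserve unknown fields for re-serialization rather than discarding them.

```rust
#[derive(Debug, Default, PartialEq)]
struct KvPair {
    key: String,
    value: String,
}

/// Reads a base-128 varint at `*pos` and advances `*pos` past it.
/// Returns None if the input ends mid-varint or the value is too long.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Reads a varint length followed by that many bytes.
fn read_len_delimited<'a>(buf: &'a [u8], pos: &mut usize) -> Option<&'a [u8]> {
    let len = usize::try_from(read_varint(buf, pos)?).ok()?;
    let end = (*pos).checked_add(len)?;
    let bytes = buf.get(*pos..end)?;
    *pos = end;
    Some(bytes)
}

/// Decodes fields 1 (key) and 2 (value); any other field is skipped
/// using only its wire type, so data from newer writers still parses.
fn decode_kv_pair(buf: &[u8]) -> Option<KvPair> {
    let mut msg = KvPair::default();
    let mut pos = 0;
    while pos < buf.len() {
        let tag = read_varint(buf, &mut pos)?;
        let (field_number, wire_type) = (tag >> 3, tag & 0x7);
        match (field_number, wire_type) {
            (1, 2) | (2, 2) => {
                let bytes = read_len_delimited(buf, &mut pos)?;
                let text = String::from_utf8(bytes.to_vec()).ok()?;
                if field_number == 1 { msg.key = text } else { msg.value = text }
            }
            // Unknown fields: skip by wire type without interpreting them.
            (_, 0) => { read_varint(buf, &mut pos)?; } // varint
            (_, 1) => pos = pos.checked_add(8)?,        // fixed 64-bit
            (_, 2) => { read_len_delimited(buf, &mut pos)?; } // length-delimited
            (_, 5) => pos = pos.checked_add(4)?,        // fixed 32-bit
            _ => return None, // groups (3, 4) and invalid wire types
        }
        if pos > buf.len() {
            return None; // a fixed-width field ran past the end of the input
        }
    }
    Some(msg)
}
```

For example, a buffer carrying key "TiKV", an unknown varint field 3, value "Astronaut", and an unknown string field 4 decodes to just the key and value, while a truncated buffer returns None instead of panicking.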

More than just data

Protobufs can also define services, which let you declare RPC calls in your *.proto files.

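This example demonstrates a service called ScanService. It provides a remote procedure call, Scan, that takes a pair of strings and returns a stream of KvPairs. Because an RPC accepts exactly one message type, the two strings are wrapped in a request message (ScanRequest and its field names are illustrative), and the stream keyword marks the streamed response:

message ScanRequest { string start_key = 1; string end_key = 2; }

service ScanService {
  rpc Scan (ScanRequest) returns (stream KvPair);
}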

Thanks to code generation, this lets callers invoke remote functions almost as if they were local.
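As a rough illustration of what "almost as if they were local" means, the Rust sketch below models a service as a trait with one method per rpc. This is a hypothetical, hand-written stand-in, not the API produced by gRPC or any other code generator: ScanRequest, start_key, end_key, InMemoryScan, and keys_in_range are illustrative names, the exclusive end bound is an assumption, and a real streaming RPC would yield pairs incrementally rather than returning a Vec.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct KvPair {
    key: String,
    value: String,
}

/// Hypothetical request type wrapping the two strings passed to Scan.
struct ScanRequest {
    start_key: String,
    end_key: String,
}

/// Hypothetical service interface: one method per `rpc` in the .proto file.
trait ScanService {
    /// Returns pairs with start_key <= key < end_key (assumed semantics).
    fn scan(&self, request: ScanRequest) -> Vec<KvPair>;
}

/// In-process implementation standing in for a remote server.
struct InMemoryScan {
    data: BTreeMap<String, String>,
}

impl ScanService for InMemoryScan {
    fn scan(&self, request: ScanRequest) -> Vec<KvPair> {
        // BTreeMap::range panics if start > end, so treat that as empty.
        if request.start_key > request.end_key {
            return Vec::new();
        }
        self.data
            .range(request.start_key..request.end_key)
            .map(|(k, v)| KvPair { key: k.clone(), value: v.clone() })
            .collect()
    }
}

/// Caller code depends only on the trait, so it reads like a local call
/// whether the implementation runs in-process or behind a network client.
fn keys_in_range(service: &dyn ScanService, start: &str, end: &str) -> Vec<String> {
    service
        .scan(ScanRequest { start_key: start.to_string(), end_key: end.to_string() })
        .into_iter()
        .map(|pair| pair.key)
        .collect()
}
```

The design point is that keys_in_range never mentions networking; swapping InMemoryScan for a generated client would leave the calling code unchanged.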

Next, we’ll use gRPC to provide these services.