So here I’m going to go through backing up and restoring etcd on a Kubernetes cluster.

The reason I’ve done this before is for the Certified Kubernetes Administrator (CKA) exam, where you need to know how to do this. On managed clusters like GKE you’ll probably never need to interact with etcd directly.

There’s a few steps involved:

Backup:

  1. save the snapshot

Restore:

  1. stop all cluster components
  2. restore the snapshot
  3. restart all cluster components

Setup

The first thing you’ll want to do before all this is ssh onto the master node of the cluster. The master node is where things like etcd are configured.

We’re going to be using etcdctl to back up etcd in this tutorial. You can make sure it’s installed by running etcdctl version, which returns both the etcdctl version and the API version (these should line up). If this command doesn’t work, you can either install etcdctl, or first check whether etcd is even running on the cluster by looking for any etcd pods.
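As a sketch, those checks might look like this (guarded so it’s safe to paste on any node, whether or not the tools are installed):

```shell
# Is etcdctl installed?
if command -v etcdctl >/dev/null 2>&1; then
  etcdctl version
else
  echo "etcdctl not installed"
fi

# Is etcd even running here? Look for etcd pods (needs kubectl access)
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n kube-system | grep etcd || echo "no etcd pods found"
fi
```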

Authentication

Another thing you may need to do before continuing is authenticate with etcd. If you find the etcdctl commands hanging or not completing, then this is probably the reason. One way of authenticating is by passing in the certs etcd itself uses. These are usually stored on the master node, and you can find their locations by looking at the etcd.yaml file. So for example:

> cat /etc/kubernetes/manifests/etcd.yaml

...
spec:
  containers:
  - command:
    - etcd
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --data-dir=/var/lib/etcd
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
...

The etcd.yaml file describes the etcd static pod, and the command arguments passed to etcd when it starts are shown here. There’s a bunch of other arguments I’ve omitted, but the main ones are here. We need three of these to authenticate etcdctl with the etcd API:

etcdctl --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}

Now we’ve got enough for the etcdctl commands to work.
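For example, a quick endpoint health check using the default kubeadm certificate paths from the manifest above (these paths are assumptions; read yours from your own etcd.yaml). The file-existence guard just makes this safe to run on any machine:

```shell
# Default kubeadm cert locations -- substitute the paths from your etcd.yaml
CACERT=/etc/kubernetes/pki/etcd/ca.crt
CERT=/etc/kubernetes/pki/etcd/server.crt
KEY=/etc/kubernetes/pki/etcd/server.key

if [ -f "$CACERT" ]; then
  ETCDCTL_API=3 etcdctl endpoint health \
    --cacert="$CACERT" --cert="$CERT" --key="$KEY"
else
  echo "etcd certs not found -- are you on the master node?"
fi
```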

Save the Snapshot

Saving the snapshot is done via snapshot save. This command takes a filename which is where the snapshot will be stored.

ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}

This creates a backup in the /tmp/etcd-backup.db file.
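It’s worth sanity-checking the snapshot before moving on. snapshot status prints its hash, revision, key count, and size (on newer etcd releases this subcommand has moved to etcdutl). Guarded so it skips cleanly if the file isn’t there:

```shell
# Inspect the snapshot we just wrote (no-op if it's missing)
if [ -f /tmp/etcd-backup.db ]; then
  ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db --write-out=table
else
  echo "no snapshot at /tmp/etcd-backup.db"
fi
```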

Stop all Cluster Components

To stop all cluster components, just move the yaml files out of the /etc/kubernetes/manifests folder, which is where all static pod yaml files are stored. The kubelet watches this folder and stops any pod whose manifest disappears, so once they are removed we can watch the pods disappear using watch crictl ps.

mv /etc/kubernetes/manifests /etc/kubernetes/tmp

Once the cluster components are stopped we can restore etcd.

Restore the Snapshot

Use snapshot restore to restore the backup. In this example we’re going to restore etcd to a new location. etcd.yaml sets --data-dir=/var/lib/etcd, which is where etcd stores all its data. We can either overwrite this directory entirely or restore into a new folder.

ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir=/var/lib/etcd-backup --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}

However, now that etcd data is in a new location we need to tell etcd where this is. We do this by editing the etcd.yaml to point at the new location. If you overwrote the etcd data in the previous step then this is not necessary.

> vim /etc/kubernetes/tmp/etcd.yaml
...
    - --data-dir=/var/lib/etcd-backup # changed
...
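If you’d rather not open an editor, the same change can be made with sed (a sketch, assuming the manifests were moved to /etc/kubernetes/tmp as above):

```shell
# Rewrite --data-dir in the moved etcd manifest to point at the
# restored data directory (no-op if the file isn't present)
MANIFEST=/etc/kubernetes/tmp/etcd.yaml
if [ -f "$MANIFEST" ]; then
  sed -i 's#--data-dir=/var/lib/etcd$#--data-dir=/var/lib/etcd-backup#' "$MANIFEST"
fi
```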

Restart Cluster Components

Now we can restart the cluster components and watch them come back up.

> mv /etc/kubernetes/tmp /etc/kubernetes/manifests
> watch crictl ps
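Once the static pods are back, it’s also worth confirming the API server is answering again, since that means etcd is serving data. A guarded sketch:

```shell
# The kubelet recreates the static pods once the manifests return;
# a responding API server means etcd is serving data again
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n kube-system || echo "API server not responding yet"
else
  echo "kubectl not available on this node"
fi
```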

Conclusion

We’ve successfully backed up and restored etcd in a Kubernetes cluster.