In this post I’m going to go through backing up and restoring etcd on a Kubernetes cluster.
The reason I’ve done this before is the Certified Kubernetes Administrator (CKA) exam, where you need to know how to do this. In managed clusters like GKE you will probably never need to interact with etcd directly.
There are a few steps involved.

Backup:
- save the snapshot

Restore:
- stop all cluster components
- restore the snapshot
- restart all cluster components
Setup
The first thing you’ll want to do before all this is ssh onto the master node of the cluster. The master node is where things like etcd are configured.
We’re going to be using etcdctl to back up etcd in this tutorial. You can make sure it’s installed by running etcdctl version. This will return the etcdctl version and the API version, which should be the same. If this command does not work then you can either install etcdctl or check whether etcd is even running on the cluster by seeing if there are any etcd pods running.
Authentication
Another thing you may need to do before continuing is authenticate with the cluster. If you find the etcdctl commands hanging or not completing, this is probably the reason. One way of authenticating is by passing in the certs used by etcd. These are usually stored on the master node, and you can find their locations by looking at the etcd.yaml file. For example:
> cat /etc/kubernetes/manifests/etcd.yaml
...
spec:
containers:
- command:
- etcd
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --data-dir=/var/lib/etcd
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
...
The etcd.yaml file describes the etcd pod. The command arguments passed to etcd when it starts are shown here. There are a bunch of other arguments I’ve omitted, but the main ones are here. We need three of these to authenticate etcdctl with the API:
etcdctl --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}
Now we’ve got enough for the etcdctl commands to work.
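If you don’t want to copy the paths out of the manifest by hand, a small sketch like this can extract them. A sample manifest fragment stands in for the real file here; the paths are the kubeadm defaults, which is an assumption — check your own etcd.yaml.

```shell
# Sketch: pull the three cert paths out of etcd.yaml and build the
# etcdctl auth flags. A sample fragment is written to /tmp here; on a
# real master node read /etc/kubernetes/manifests/etcd.yaml instead.
cat > /tmp/etcd-sample.yaml <<'EOF'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
EOF

# Strip everything up to and including each flag name, leaving the path.
CACERT=$(sed -n 's/.*--trusted-ca-file=//p' /tmp/etcd-sample.yaml)
CERT=$(sed -n 's/.*--cert-file=//p' /tmp/etcd-sample.yaml)
KEY=$(sed -n 's/.*--key-file=//p' /tmp/etcd-sample.yaml)

echo "--cacert=$CACERT --cert=$CERT --key=$KEY"
```

The three variables can then be reused in every etcdctl command that follows.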
Save the Snapshot
Saving the snapshot is done via snapshot save. This command takes a filename, which is where the snapshot will be stored.
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}
This creates a backup in the /tmp/etcd-backup.db file.
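In a script it can read a bit cleaner with the cert paths pulled into variables. This sketch uses the kubeadm default paths (an assumption), and the command is echoed rather than executed so it can be inspected first — drop the echo to actually run it.

```shell
# Sketch: the snapshot save command with explicit cert variables.
# Paths are the kubeadm defaults (assumption -- check your etcd.yaml).
CACERT=/etc/kubernetes/pki/etcd/ca.crt
CERT=/etc/kubernetes/pki/etcd/server.crt
KEY=/etc/kubernetes/pki/etcd/server.key
BACKUP=/tmp/etcd-backup.db

# Echoed so the command can be reviewed first; remove `echo` to run it.
echo ETCDCTL_API=3 etcdctl snapshot save "$BACKUP" \
  --cacert="$CACERT" --cert="$CERT" --key="$KEY"
```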
Stop all Cluster Components
To stop all cluster components, just move the yaml files out of the /etc/kubernetes/manifests folder, which is where all static pod yaml files are stored. Once they are removed we can watch the pods disappear using watch crictl ps.
mv /etc/kubernetes/manifests /etc/kubernetes/tmp
Once the cluster components are stopped we can restore etcd.
Restore the Snapshot
Use snapshot restore to restore the backup. In this example we’re going to restore etcd to a new location. etcd.yaml defines --data-dir=/var/lib/etcd, which is where etcd stores all its data. We can either overwrite this entirely or restore into a new folder.
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir=/var/lib/etcd-backup --cacert={--trusted-ca-file} --cert={--cert-file} --key={--key-file}
However, now that the etcd data is in a new location we need to tell etcd where it is. We do this by editing etcd.yaml to point at the new location. If you overwrote the etcd data in the previous step then this is not necessary.
> vim /etc/kubernetes/tmp/etcd.yaml
...
- --data-dir=/var/lib/etcd-backup # changed
...
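If you’d rather not open an editor, the same change can be made with sed. This sketch runs against a throwaway sample file; on the node the target would be the moved etcd.yaml.

```shell
# Sketch: flip --data-dir to the restored location non-interactively.
# A sample line stands in here; on the node point sed at the moved etcd.yaml.
cat > /tmp/etcd-edit.yaml <<'EOF'
    - --data-dir=/var/lib/etcd
EOF

# Anchor on end-of-line so only the exact old path is rewritten.
sed -i 's|--data-dir=/var/lib/etcd$|--data-dir=/var/lib/etcd-backup|' /tmp/etcd-edit.yaml
cat /tmp/etcd-edit.yaml
```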
Restart Cluster Components
Now we can restart the cluster components and watch them come back up.
> mv /etc/kubernetes/tmp /etc/kubernetes/manifests
> watch crictl ps
Conclusion
We’ve successfully backed up and restored etcd in a Kubernetes cluster.