There are many approaches and tools for backing up your Kubernetes environments. These include offerings from traditional backup software vendors, newer commercial solutions tailored for containerized deployments, as well as a growing set of open source tools.
When evaluating the alternatives, keep the following factors in mind.
Modern data protection solutions need to cover at least these key use cases:
- Backup and recovery: keep multiple point-in-time snapshots of your configuration and data, allowing you to recover from user mistakes, accidental deletions, or ransomware attacks.
- Disaster protection and recovery: keep copies of some of your snapshots in a separate location, in a form that can be restored there. This protects against disasters that take out an entire data center.
- Cloning: use your backup copies to create test, development, or analytics instances from your production data. This lets you extract additional business value from your backups.
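Recovering from a user mistake, accidental deletion, or ransomware attack comes down to picking the newest snapshot taken before the damage occurred. Below is a minimal Python sketch of that selection step, assuming snapshots are identified only by sorted timestamps; the function name `last_good_snapshot` and the four-hour schedule are hypothetical, and real backup tools track much richer metadata.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def last_good_snapshot(snapshot_times, incident_time):
    """Return the newest snapshot taken strictly before incident_time, or None.

    snapshot_times must be sorted oldest-first.
    """
    # bisect_left finds the first snapshot at or after the incident, so
    # everything before that index predates the damage.
    i = bisect_left(snapshot_times, incident_time)
    return snapshot_times[i - 1] if i > 0 else None


# Hypothetical schedule: a snapshot every 4 hours starting at midnight.
start = datetime(2024, 1, 1, 0, 0)
snapshots = [start + timedelta(hours=4 * k) for k in range(6)]  # 00:00 .. 20:00

# An accidental deletion at 09:30 is recovered from the 08:00 snapshot.
incident = datetime(2024, 1, 1, 9, 30)
print(last_good_snapshot(snapshots, incident))  # 2024-01-01 08:00:00
```

A snapshot taken at exactly the incident time is deliberately skipped, since it may already contain the damage.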
To protect a Kubernetes cluster, you need a way to back up the cluster’s configuration so that you can recover from a user configuration error or widespread failure. But you also need to address:
- Application configuration: the specification templates, mappings, services, secrets, and other details that describe how the application is assembled
- Application data: the persistent state of the databases or other stores
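One way to make these categories concrete is to sort Kubernetes objects by what a backup must do with them. The sketch below is an illustrative assumption, not a standard classification: it uses real Kubernetes kinds, treats cluster-scoped objects as cluster configuration, namespaced objects as application configuration, and volume objects as pointers to application data that must be snapshotted rather than just re-applied. It also assumes namespaced objects carry an explicit `metadata.namespace`, as objects read back from the API server do. The `classify` helper is hypothetical.

```python
# Hypothetical mapping of Kubernetes objects to the parts a backup must
# capture. The kinds are real; the grouping is an illustrative assumption.
DATA_KINDS = {"PersistentVolumeClaim", "PersistentVolume"}


def classify(obj: dict) -> str:
    """Say which part of a backup an object (as returned by the API) belongs to."""
    if obj["kind"] in DATA_KINDS:
        # The object points at stored bytes that must be snapshotted or
        # copied, not just re-applied from a manifest.
        return "application data"
    if "namespace" not in obj.get("metadata", {}):
        # Cluster-scoped objects such as StorageClasses or ClusterRoles.
        return "cluster configuration"
    # Namespaced Deployments, Services, ConfigMaps, Secrets, and so on.
    return "application configuration"


objects = [
    {"kind": "StorageClass", "metadata": {"name": "fast"}},
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}},
    {"kind": "Secret", "metadata": {"name": "db-creds", "namespace": "shop"}},
    {"kind": "PersistentVolumeClaim",
     "metadata": {"name": "db-data", "namespace": "shop"}},
]
for obj in objects:
    print(f'{obj["kind"]}: {classify(obj)}')
```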
One difference between container-based deployments and traditional virtual machine-based ones is that containers are more fluid: they spin up and down, and they move between nodes in ways that virtual machines usually do not. A single business application may be composed of several containers running on one or more nodes in a Kubernetes cluster.
Some traditional backup solutions focus on backing up a server or a virtual machine. In Kubernetes deployments, each server or virtual machine often runs containers belonging to many different applications. Backing up at virtual machine granularity has two implications. First, it backs up more than you need, because not all of your applications require backup. Second, it makes future restores more involved, because you end up restoring a whole virtual machine even if you need only one or two applications.
A better approach is to back up applications and namespaces. A namespace groups related applications, such as those managed by the same team. With this approach, the protection software backs up and restores applications regardless of which node or nodes they happen to be running on at the time.
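The difference between node-level and namespace-level granularity can be sketched with a toy inventory of pods. In this hypothetical Python example (the pod, namespace, and node names are made up, and pods stand in for application components), protecting the `shop` namespace at node granularity also sweeps in unrelated pods that happen to share its nodes, while namespace-level selection captures exactly the `shop` pods wherever they run.

```python
from collections import defaultdict

# Hypothetical inventory: (pod, namespace, node)
PODS = [
    ("web-1", "shop", "node-a"),
    ("web-2", "shop", "node-b"),
    ("db-0", "shop", "node-b"),
    ("batch-7", "scratch", "node-a"),
    ("cache-1", "scratch", "node-b"),
]


def group_by(pods, key_index):
    """Group pod names by namespace (key_index=1) or by node (key_index=2)."""
    groups = defaultdict(set)
    for pod in pods:
        groups[pod[key_index]].add(pod[0])
    return groups


by_namespace = group_by(PODS, 1)
by_node = group_by(PODS, 2)

# Node granularity: back up every node that hosts any "shop" pod.
shop_nodes = {node for _, ns, node in PODS if ns == "shop"}
node_level = set().union(*(by_node[n] for n in shop_nodes))

# Namespace granularity: back up exactly the "shop" pods, wherever they run.
namespace_level = by_namespace["shop"]

print(sorted(node_level - namespace_level))  # ['batch-7', 'cache-1']
```

If the scheduler later moves `db-0` to another node, the namespace-level selection is unchanged, while the node-level one would have to be recomputed.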
Some data protection solutions require you to run a particular abstraction layer on all your clusters. This layer can provide data management capabilities across your clusters, which is convenient and powerful if you are willing to run the same technology stack for all of your container deployments.
Part of the benefit of containerization is that you need not lock yourself into a particular stack; instead, you retain the flexibility to mix, match, and move between technology stacks as technology evolves.
Some of the modern container-focused data protection solutions require this lock-in, while the traditional protection solutions generally do not.
Storage vendors have spent the last two decades delivering fast, space-efficient snapshot and replication capabilities. The storage layer keeps track of which blocks of data have changed so that, even if you keep multiple snapshots, only the unique blocks take up space in persistent storage. Storage systems use the same differencing and deduplication techniques to maintain remote copies of data without transferring all of the data with each update.
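The space savings can be illustrated with a toy content-addressed snapshot store: each unique block is stored once, and a snapshot is just the ordered list of block references that make up the volume. This is a simplified Python sketch, not how any particular storage array works; many storage systems track changed blocks through copy-on-write or redirect-on-write metadata rather than content hashes, and the 4-byte block size is chosen only to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # toy size for readability; real systems use blocks of several KiB


class SnapshotStore:
    """Toy store in which each unique block is kept once and a snapshot is
    the ordered list of block hashes that make up the volume."""

    def __init__(self):
        self.blocks = {}     # block hash -> block contents
        self.snapshots = []  # one list of block hashes per snapshot

    def take_snapshot(self, volume: bytes) -> None:
        hashes = []
        for i in range(0, len(volume), BLOCK_SIZE):
            block = volume[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store the block only if new
            hashes.append(digest)
        self.snapshots.append(hashes)

    def restore(self, index: int) -> bytes:
        return b"".join(self.blocks[h] for h in self.snapshots[index])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = SnapshotStore()
store.take_snapshot(b"AAAABBBBCCCCDDDD")  # day 1
store.take_snapshot(b"AAAABBBBXXXXDDDD")  # day 2: one block changed
store.take_snapshot(b"AAAABBBBXXXXYYYY")  # day 3: another block changed
print(store.stored_bytes())  # 24, versus 48 for three full copies
print(store.restore(0))      # b'AAAABBBBCCCCDDDD'
```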
Data protection approaches that leverage the underlying storage system capabilities can:
- Back up faster, because storage system snapshots don't copy any data when they are taken
- Consume less space, because the storage system maintains only the unique blocks
- Use less network bandwidth, because only changes need to be copied to remote destinations
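The bandwidth point can be sketched the same way: if the source remembers a fingerprint of each block it last replicated, it only needs to ship the blocks whose fingerprints changed. The Python below is a hypothetical illustration assuming a fixed-size volume and a 4 KiB block size; production replication relies on the storage system's own change tracking, not on code like this.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def block_hashes(volume: bytes) -> list:
    """Fingerprint each fixed-size block of the volume."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(volume), BLOCK_SIZE)]


def delta_for_remote(last_sent: list, current: bytes) -> list:
    """Return (block index, block bytes) for blocks changed since last_sent.

    Assumes the volume size is unchanged since the last replication.
    """
    delta = []
    for i, digest in enumerate(block_hashes(current)):
        if digest != last_sent[i]:
            delta.append((i, current[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]))
    return delta


def apply_delta(remote: bytes, delta: list) -> bytes:
    """Patch a copy of the remote volume with the changed blocks."""
    patched = bytearray(remote)
    for i, block in delta:
        patched[i * BLOCK_SIZE:i * BLOCK_SIZE + len(block)] = block
    return bytes(patched)


previous = bytes(BLOCK_SIZE * 256)  # 1 MiB volume, already replicated
changed = bytearray(previous)
changed[10 * BLOCK_SIZE] = 1        # touch block 10
changed[200 * BLOCK_SIZE + 5] = 7   # touch block 200
changed = bytes(changed)

delta = delta_for_remote(block_hashes(previous), changed)
print([i for i, _ in delta])                        # [10, 200]
print(sum(len(b) for _, b in delta), len(changed))  # 8192 1048576
```

Two 4 KiB blocks cross the network instead of the full 1 MiB volume, and applying them to the remote copy reproduces the current volume exactly.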
Traditional data protection solutions are often strong in this area, and the providers have deep technical relationships with the storage vendors to maintain and leverage these efficiencies. Solutions that do not leverage storage capabilities, but instead treat storage generically, use significantly more time, space, and network bandwidth, all of which add to your cost.
Ease of Use
You can’t afford to hire experts just to keep your data protection running. The best approaches deliver powerful capabilities through UIs and APIs that even junior staff can easily understand and operate. Look for data protection software that gives you enough control to satisfy your business requirements for recovery times and retention, with enough automation to allow your administrators to be confident they’ve correctly implemented the requirements.
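One form of automation that helps administrators confirm they have implemented the requirements is a check that compares the configured backup policy against the business targets. The Python sketch below is hypothetical: the `Requirement` and `Policy` fields and the `check_policy` function are assumptions for illustration, not any product's API. It flags a snapshot interval longer than the recovery point objective (RPO) and a retained history shorter than the retention requirement.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Requirement:
    """Hypothetical business targets for one application."""
    rpo: timedelta        # maximum acceptable data loss (recovery point objective)
    retention: timedelta  # how far back restores must be possible


@dataclass
class Policy:
    """Hypothetical backup settings as configured in a tool."""
    interval: timedelta   # time between scheduled snapshots
    snapshots_kept: int   # number of snapshots retained


def check_policy(policy: Policy, req: Requirement) -> list:
    """Return human-readable problems; an empty list means the policy meets req."""
    problems = []
    if policy.interval > req.rpo:
        problems.append(f"interval {policy.interval} exceeds RPO {req.rpo}")
    # Right after a new snapshot, the oldest retained one is only
    # (snapshots_kept - 1) intervals old, so that is the guaranteed history.
    history = policy.interval * max(policy.snapshots_kept - 1, 0)
    if history < req.retention:
        problems.append(f"history of {history} is shorter than {req.retention}")
    return problems


req = Requirement(rpo=timedelta(hours=4), retention=timedelta(days=7))
print(check_policy(Policy(timedelta(hours=4), 43), req))       # []
print(len(check_policy(Policy(timedelta(hours=6), 20), req)))  # 2
```

With four-hour snapshots, keeping 43 snapshots covers exactly seven days (42 intervals), while keeping 42 falls four hours short, which is the kind of off-by-one mistake such a check catches.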