Kubernetes Operators
Kubernetes changed how we deploy applications. It automated many tasks that developers handled manually. But some operations still required human expertise. Database backups, complex upgrades, and application-specific configurations needed careful attention.
Operators solved this problem. They captured human operational knowledge and automated it.
What Are Kubernetes Operators?
An operator is a software extension that manages applications on Kubernetes. It uses custom resources to define how applications should behave. The operator watches these resources and takes action when changes occur.
Think of operators as specialized controllers. They understand specific applications deeply. A database operator knows how to handle backups, upgrades, and failovers. A monitoring operator understands how to scale collectors and configure dashboards.
The Human Touch, Automated
Human operators possessed crucial knowledge. They knew when to scale applications, how to handle failures, and which configurations worked best. This knowledge lived in runbooks, documentation, and experience.
Operators captured this expertise in code. They encoded best practices into automated workflows. The result was consistent, reliable operations without human intervention.
How Operators Work
Operators follow Kubernetes’ control loop pattern. They continuously compare the desired state with the actual state and adjust reality to match.
Here’s how this works:
- Observe: The operator watches custom resources for changes
- Analyze: It compares the desired state with current reality
- Act: It takes steps to reconcile any differences
- Repeat: The cycle continues indefinitely
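The four steps above can be sketched in a few lines of plain Java. This is only an illustration, not how any particular operator is implemented: the `cluster` map is a hypothetical stand-in for the real cluster, and `DesiredState` and `reconcile` are names made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class ControlLoopSketch {

    /** Desired state, as a user would declare it in a custom resource (hypothetical shape). */
    record DesiredState(String name, int replicas) {}

    /** One pass of the loop: observe, analyze, act. Returns true if something was changed. */
    static boolean reconcile(DesiredState desired, Map<String, Integer> cluster) {
        // Observe: read what is currently running (0 if nothing exists yet).
        int actual = cluster.getOrDefault(desired.name(), 0);

        // Analyze: compare desired and actual state.
        if (actual == desired.replicas()) {
            return false; // already converged, nothing to do
        }

        // Act: move the actual state to the desired one.
        cluster.put(desired.name(), desired.replicas());
        return true;
    }

    public static void main(String[] args) {
        // The map stands in for the cluster: application name -> running replicas.
        Map<String, Integer> cluster = new HashMap<>();
        DesiredState desired = new DesiredState("demo-db", 3);

        // Repeat: here we simply run a few passes in a row.
        for (int pass = 1; pass <= 3; pass++) {
            boolean changed = reconcile(desired, cluster);
            System.out.println("pass " + pass + ": changed=" + changed + ", cluster=" + cluster);
        }
    }
}
```

Note that a pass which finds nothing to do changes nothing. That property is what makes it safe to run the loop again and again.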
A Real-World Example
Consider a database operator managing a PostgreSQL cluster. When you create a database resource, the operator springs into action:
- It provisions persistent storage for data
- It creates and runs database pods
- It configures initial database settings
- It sets up automated backups
- It monitors health and handles failures
- It handles updates
For a real-world implementation, see for example the Zalando postgres-operator.
If you delete the database resource, the operator performs cleanup. It takes a final backup, removes the respective pods, and cleans up storage resources.
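One common way to get this ordering right is the Kubernetes finalizer mechanism: an object that still carries finalizers is only marked for deletion, and it disappears once the list of finalizers is empty. The sketch below assumes the operator uses a finalizer. The `DatabaseResource` class, the finalizer name `example.com/db-cleanup`, and the step names are hypothetical and only mirror the cleanup described above.

```java
import java.util.ArrayList;
import java.util.List;

class CleanupSketch {

    /** Hypothetical, heavily simplified view of a database resource. */
    static class DatabaseResource {
        // Stands in for metadata.deletionTimestamp being set by Kubernetes.
        boolean deletionRequested;
        // Hypothetical finalizer name; while present, the object is not removed.
        final List<String> finalizers = new ArrayList<>(List.of("example.com/db-cleanup"));
    }

    /** Runs the cleanup in a fixed order and returns the steps that were taken. */
    static List<String> reconcile(DatabaseResource db) {
        List<String> steps = new ArrayList<>();
        if (!db.deletionRequested || db.finalizers.isEmpty()) {
            return steps; // not being deleted, or cleanup already done
        }
        steps.add("take final backup"); // first, while the data still exists
        steps.add("remove pods");
        steps.add("release storage");   // last, after the backup is safe
        db.finalizers.clear();          // allows Kubernetes to finish the deletion
        return steps;
    }

    public static void main(String[] args) {
        DatabaseResource db = new DatabaseResource();
        db.deletionRequested = true;
        System.out.println("first pass:  " + reconcile(db));
        System.out.println("second pass: " + reconcile(db));
    }
}
```

The second pass does nothing because the finalizer is already gone, so a repeated reconciliation cannot trigger the cleanup twice.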
Throughout the database’s lifetime, the operator manages upgrades, scales replicas, and handles routine maintenance. It performs these tasks consistently, following proven procedures.
Beyond Basic Automation
Operators went beyond simple deployment automation. They handled complex scenarios that required deep application knowledge:
- Intelligent scaling: A Redis operator might redistribute data when adding nodes
- Upgrade orchestration: A Kafka operator could perform rolling upgrades while maintaining partition leadership
- Disaster recovery: A backup operator might restore data from multiple sources in the correct order
- Performance tuning: A monitoring operator could adjust collection intervals based on cluster load
These capabilities transformed how teams managed complex applications.
The Ecosystem Today
The operator ecosystem flourished. Popular operators emerged for databases, monitoring systems, service meshes, and CI/CD tools. Companies built operators for their proprietary applications. The Operator Framework made it easy to create your own operator.
This all sounds a bit complicated. So why not build a simple operator to see how one actually works?
Let’s Build An Operator
Ultimately, operators are “just” programs that run in the cluster. They use the Kubernetes API to automate more complex tasks. Operators are usually written in Go. But in one of my projects the team was determined to use Java: it was widely adopted within the company, and hardly anyone was fluent in Go. There are many examples of Go operators, but not so many for Java. Let’s give it a try.
Getting Started
Here is what we need to follow the example:
- Docker
- kind (Kubernetes in Docker)
- kubectl CLI
- JDK 21+
- The actual operator, plus its CRD and a CR (GitHub)
CRD At Heart
At the heart of an operator is the reconcile function. As described above, it reacts to changes of our custom resource and takes the necessary steps to bring the resource to the desired state. Usually the status of our custom resource reflects the result of the reconciliation process: is the resource ready to use? If not, what went wrong? It’s the operator’s responsibility to make this transparent to the user, for example by updating the status of our custom resource, logging errors, or writing Kubernetes events.
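To make "ready or not, and why" concrete, here is a minimal sketch of a status that carries conditions. The `Condition` record follows the shape commonly used for Kubernetes conditions (type, status, reason, message). Everything else, including `ResourceStatus`, `recordOutcome`, and the reason strings, is made up for illustration and not taken from the example operator.

```java
import java.util.ArrayList;
import java.util.List;

class StatusSketch {

    /** Modeled on the common Kubernetes condition shape. */
    record Condition(String type, String status, String reason, String message) {}

    /** Hypothetical status block of a custom resource. */
    static class ResourceStatus {
        final List<Condition> conditions = new ArrayList<>();

        /** Replaces any existing condition of the same type, so each type appears once. */
        void setCondition(Condition c) {
            conditions.removeIf(existing -> existing.type().equals(c.type()));
            conditions.add(c);
        }

        boolean isReady() {
            return conditions.stream()
                    .anyMatch(c -> c.type().equals("Ready") && c.status().equals("True"));
        }
    }

    /** Records the outcome of one reconciliation attempt; failure is null on success. */
    static void recordOutcome(ResourceStatus status, Exception failure) {
        if (failure == null) {
            status.setCondition(new Condition(
                    "Ready", "True", "ReconcileSucceeded", "resource is ready to use"));
        } else {
            // Surface the error to the user instead of hiding it in the operator's logs only.
            status.setCondition(new Condition(
                    "Ready", "False", "ReconcileFailed", failure.getMessage()));
        }
    }

    public static void main(String[] args) {
        ResourceStatus status = new ResourceStatus();
        recordOutcome(status, new IllegalStateException("storage class not found"));
        System.out.println(status.isReady() + " " + status.conditions);
        recordOutcome(status, null);
        System.out.println(status.isReady() + " " + status.conditions);
    }
}
```

With a status like this, a user can see from the resource itself whether it is usable and what went wrong, without reading the operator's logs.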
A Whimsical Example
The Luggage Operator manages a custom Kubernetes resource called “Luggage”, inspired by Rincewind’s infamous traveling chest that follows him through dimensions, eats his enemies, and stores impossibly large amounts of stuff in a space that shouldn’t be able to hold it. While this might be a silly example, it demonstrates how an operator would handle the reconciliation.
The heart of the operator lives in LuggageReconciler.java. When a Luggage resource is created or updated, the reconciler:
- Logs the event
- Calls the LuggageService to determine current status
- Updates the resource status using UpdateControl.patchStatus()
While the LuggageService has no “real” functionality, it’s easy to imagine how an actual service might inspect the current state, check for required resources, and take steps to reach the desired state.
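The three steps of the reconciler can be sketched without any dependencies. This is not the code from the repository. `Luggage`, its `itemCount` and `phase` fields, and the `determinePhase` method are hypothetical, and the small `UpdateControl` record is only a stand-in so the sketch compiles on its own. The real SDK types are generic and carry more information.

```java
import java.util.logging.Logger;

class LuggageSketch {

    /** Hypothetical custom resource; the real Luggage CRD has more fields. */
    static class Luggage {
        final String name;
        final int itemCount; // hypothetical spec field
        String phase;        // hypothetical status field

        Luggage(String name, int itemCount) {
            this.name = name;
            this.itemCount = itemCount;
        }
    }

    /** Stand-in for the SDK's UpdateControl, reduced to the one factory used here. */
    record UpdateControl(Luggage resource, boolean statusPatched) {
        static UpdateControl patchStatus(Luggage resource) {
            return new UpdateControl(resource, true);
        }
    }

    /** Stand-in for LuggageService: derives a status from the spec. */
    static class LuggageService {
        String determinePhase(Luggage luggage) {
            return luggage.itemCount == 0 ? "Empty" : "Packed";
        }
    }

    static class LuggageReconciler {
        private static final Logger LOG = Logger.getLogger("LuggageReconciler");
        private final LuggageService service;

        LuggageReconciler(LuggageService service) {
            this.service = service;
        }

        UpdateControl reconcile(Luggage luggage) {
            LOG.info("Reconciling luggage " + luggage.name); // 1. log the event
            luggage.phase = service.determinePhase(luggage); // 2. ask the service for the status
            return UpdateControl.patchStatus(luggage);       // 3. request a status update
        }
    }

    public static void main(String[] args) {
        Luggage chest = new Luggage("rincewinds-chest", 2);
        UpdateControl result = new LuggageReconciler(new LuggageService()).reconcile(chest);
        System.out.println(chest.phase + ", statusPatched=" + result.statusPatched());
    }
}
```

Keeping the decision logic in a separate service, as the operator does, means the reconciler stays a thin coordinator and the service can be tested without a cluster.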
But Why
Beyond the humor, this operator demonstrates several concepts:
- Rich Custom Resources: The CRD includes complex nested objects, enums, and validation rules that showcase Kubernetes’ extensibility.
- Proper Status Management: The operator correctly uses the status subresource with conditions, phases, and detailed state tracking.
- Spring Boot Integration: Shows how to build operators with familiar Spring patterns rather than raw Kubernetes client libraries.
Conclusion
Operators represented a shift in operational thinking. Instead of reacting to problems, they prevented them. Instead of following manual procedures, they automated expertise.
This automation delivered several benefits:
- Consistency: Operators performed tasks the same way every time
- Reliability: They reduced human error and operational drift
- Scalability: Teams could manage more applications with fewer people
- Speed: Automated operations responded faster than manual processes
Using operators required minimal changes to existing workflows. Teams defined their desired state using custom resources. The operator handled the implementation details.
This approach felt natural to Kubernetes users. It extended the familiar declarative model to complex applications.