Smooth Operator

Jun 1, 2025 · 960 words · 5 minute read

Let's build a simple Kubernetes operator

Kubernetes Operators 🔗

Kubernetes changed how we deploy applications. It automated many tasks that developers handled manually. But some operations still required human expertise. Database backups, complex upgrades, and application-specific configurations needed careful attention.

Operators solved this problem. They captured human operational knowledge and automated it.

What Are Kubernetes Operators? 🔗

An operator is a software extension that manages applications on Kubernetes. It uses custom resources to define how applications should behave. The operator watches these resources and takes action when changes occur.

Think of operators as specialized controllers. They understand specific applications deeply. A database operator knows how to handle backups, upgrades, and failovers. A monitoring operator understands how to scale collectors and configure dashboards.

The Human Touch, Automated 🔗

Human operators possessed crucial knowledge. They knew when to scale applications, how to handle failures, and which configurations worked best. This knowledge lived in runbooks, documentation, and experience.

Operators captured this expertise in code. They encoded best practices into automated workflows. The result was consistent, reliable operations without human intervention.

How Operators Work 🔗

Operators follow Kubernetes’ control loop pattern. They continuously observe the desired state and adjust reality to match it.

Here’s how this works:

Observe: The operator watches custom resources for changes
Analyze: It compares the desired state with current reality
Act: It takes steps to reconcile any differences
Repeat: The cycle continues indefinitely

This approach ensures applications maintain their desired state automatically.
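
The four steps above can be sketched in plain Java. This is only an in-memory illustration with no Kubernetes client involved: `ControlLoopSketch`, `reconcileOnce`, and the replica-count state are hypothetical names made up for this example, and a real operator would be triggered by events from the API server rather than by a `while` loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory illustration of the observe/analyze/act cycle.
final class ControlLoopSketch {

    // Desired state as declared by the user (stand-in for a custom resource spec).
    record Desired(int replicas) {}

    // Stand-in for reality: the pods that currently exist.
    static final class Cluster {
        final List<String> pods = new ArrayList<>();
    }

    // One pass of the loop. Returns true if it had to change anything.
    static boolean reconcileOnce(Desired desired, Cluster cluster) {
        int current = cluster.pods.size();          // Observe
        int diff = desired.replicas() - current;    // Analyze
        if (diff == 0) {
            return false;                           // already converged
        }
        if (diff > 0) {                             // Act: add one pod
            cluster.pods.add("pod-" + current);
        } else {                                    // Act: remove the newest pod
            cluster.pods.remove(current - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Desired desired = new Desired(3);
        // Repeat: keep reconciling until nothing is left to do.
        while (reconcileOnce(desired, cluster)) {
            System.out.println("pods now: " + cluster.pods);
        }
    }
}
```

Each pass makes one small change and then re-observes. That keeps a pass cheap and lets the loop recover if the state changes between passes.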

A Real-World Example 🔗

Consider a database operator managing a PostgreSQL cluster. When you create a database resource, the operator springs into action:

It provisions persistent storage for data
It creates and runs database pods
It configures initial database settings
It sets up automated backups
It monitors health and handles failures
It handles updates

For a real-world implementation, see the Zalando postgres-operator.

If you delete the database resource, the operator performs cleanup. It takes a final backup, removes the respective pods, and cleans up storage resources.

Throughout the database’s lifetime, the operator manages upgrades, scales replicas, and handles routine maintenance. It performs these tasks consistently, following proven procedures.
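
To make "consistently" a bit more concrete, here is a small sketch of how such steps could be applied idempotently, so that running the reconciliation twice does no extra work. The step names and the `DatabaseLifecycleSketch` class are assumptions for illustration only. They do not describe how the Zalando operator or any other real operator is implemented.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: provisioning steps from the text, applied idempotently.
final class DatabaseLifecycleSketch {

    // Declared in the order in which they must run.
    enum Step { PROVISION_STORAGE, START_PODS, CONFIGURE_DATABASE, SCHEDULE_BACKUPS }

    // Runs every step that has not completed yet and returns the steps it performed.
    static List<Step> reconcile(Set<Step> completed) {
        List<Step> performed = new ArrayList<>();
        for (Step step : Step.values()) {
            if (completed.add(step)) {   // add() returns false if the step was already done
                performed.add(step);
            }
        }
        return performed;
    }

    // On deletion, in the order described in the text: backup, pods, storage.
    static List<String> cleanupPlan() {
        return List.of("FINAL_BACKUP", "REMOVE_PODS", "RELEASE_STORAGE");
    }

    public static void main(String[] args) {
        Set<Step> completed = EnumSet.noneOf(Step.class);
        System.out.println("first pass:  " + reconcile(completed));
        System.out.println("second pass: " + reconcile(completed));
        System.out.println("on delete:   " + cleanupPlan());
    }
}
```

The second pass performs nothing because every step is already recorded as done. A real operator would check the actual cluster state instead of an in-memory set.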

Beyond Basic Automation 🔗

Operators went beyond simple deployment automation. They handled complex scenarios that required deep application knowledge:

Intelligent scaling: A Redis operator might redistribute data when adding nodes
Upgrade orchestration: A Kafka operator could perform rolling upgrades while maintaining partition leadership
Disaster recovery: A backup operator might restore data from multiple sources in the correct order
Performance tuning: A monitoring operator could adjust collection intervals based on cluster load

These capabilities transformed how teams managed complex applications.

The Ecosystem Today 🔗

The operator ecosystem flourished. Popular operators emerged for databases, monitoring systems, service meshes, and CI/CD tools. Companies built operators for their proprietary applications. The Operator Framework made it easy to create your own operator.

This all sounds a bit complicated. So why not build a simple operator to see how one actually works?

Let’s Build An Operator 🔗

Ultimately, operators are “just” programs that run in the cluster. They interact with the Kubernetes API to automate more complex functions. Operators are usually written in Go. But in one of my projects the team was determined to use Java: it was widely adopted within the company, and hardly anyone was fluent in Go. There are many examples of Go operators, but not so many for Java. Let’s give it a try.

Getting Started 🔗

Here is what we need to follow the example:

Docker
kind (Kubernetes in Docker)
kubectl CLI
JDK 21+
The actual operator, CRD, and CR on GitHub

CRD At Heart 🔗

At the heart of an operator is the reconcile function. As described above, it runs whenever our custom resource changes and takes the necessary steps to bring the resource to the desired state. Usually the status of the custom resource reflects the result of the reconciliation: is the resource ready to use? If not, what went wrong? It’s the operator’s responsibility to make this transparent to the user, for example by updating the status of the custom resource, logging errors, or writing Kubernetes events.
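
As a rough illustration of "making the result transparent", the sketch below turns the outcome of a reconciliation attempt into a status object. All names here (`StatusSketch`, `Phase`, `Condition`, `fromOutcome`) are hypothetical and not taken from the example repository. The `Condition` record is only loosely modeled on the type/status/reason/message fields of a Kubernetes condition.

```java
import java.util.List;

// Hypothetical status model; names are illustrative only.
final class StatusSketch {

    enum Phase { PENDING, READY, FAILED }

    // Loosely modeled on a Kubernetes condition (simplified: boolean instead of True/False/Unknown).
    record Condition(String type, boolean status, String reason, String message) {}

    record Status(Phase phase, List<Condition> conditions) {}

    // Translates the outcome of a reconciliation attempt into a user-visible status.
    // A null failure means the reconciliation succeeded.
    static Status fromOutcome(Exception failure) {
        if (failure == null) {
            return new Status(Phase.READY,
                    List.of(new Condition("Ready", true, "Reconciled", "resource is ready to use")));
        }
        return new Status(Phase.FAILED,
                List.of(new Condition("Ready", false, "ReconcileError", failure.getMessage())));
    }

    public static void main(String[] args) {
        System.out.println(fromOutcome(null));
        System.out.println(fromOutcome(new IllegalStateException("storage class not found")));
    }
}
```

The point is that a failure ends up in the status with a reason and a message, so a user inspecting the resource can see what went wrong without reading operator logs.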

A Whimsical Example 🔗

The Luggage Operator manages a custom Kubernetes resource called “Luggage” – inspired by Rincewind’s infamous traveling chest that follows him through dimensions, eats his enemies, and stores impossibly large amounts of stuff in a space that shouldn’t fit. While this might be a silly example, it demonstrates how an operator would handle the reconciliation.

The heart of the operator lives in LuggageReconciler.java. When a Luggage resource is created or updated, the reconciler:

Logs the event
Calls the LuggageService to determine current status
Updates the resource status using UpdateControl.patchStatus()

While the LuggageService has no “real” functionality, it’s easy to imagine how an actual service might inspect the current state, check for required resources, and take steps to reach the desired state.
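
The shape of those three steps can be sketched without any dependencies. This is not the code from the repository. The `Luggage` fields, the `"Packed"`/`"Empty"` status values, and the `LuggageService` interface are invented for illustration, and the `UpdateControl` record is a local stand-in, so the example compiles on its own and does not use the real SDK class.

```java
import java.util.logging.Logger;

// Dependency-free sketch of the three reconciler steps; not the repository's actual code.
final class LuggageReconcilerSketch {

    private static final Logger LOG = Logger.getLogger("LuggageReconcilerSketch");

    // Hypothetical resource: a name, a desired item count, and a mutable status.
    static final class Luggage {
        final String name;
        final int desiredItems;
        String status = "Unknown";

        Luggage(String name, int desiredItems) {
            this.name = name;
            this.desiredItems = desiredItems;
        }
    }

    // Local stand-in for an SDK result type that asks the framework to patch the status.
    record UpdateControl(Luggage resource, boolean statusPatchRequested) {
        static UpdateControl patchStatus(Luggage resource) {
            return new UpdateControl(resource, true);
        }
    }

    // Hypothetical service: decides the current status of a resource.
    interface LuggageService {
        String determineStatus(Luggage luggage);
    }

    static UpdateControl reconcile(Luggage luggage, LuggageService service) {
        LOG.info("Reconciling luggage " + luggage.name);      // 1. log the event
        luggage.status = service.determineStatus(luggage);    // 2. ask the service for the status
        return UpdateControl.patchStatus(luggage);            // 3. request a status patch
    }

    public static void main(String[] args) {
        LuggageService service = l -> l.desiredItems > 0 ? "Packed" : "Empty";
        UpdateControl result = reconcile(new Luggage("rincewinds-chest", 42), service);
        System.out.println(result.resource().name + " -> " + result.resource().status);
    }
}
```

Keeping the decision logic behind a service interface means the reconciler stays a thin coordinator, and the service can be tested without a cluster.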

But Why 🔗

Beyond the humor, this operator demonstrates several concepts:

Rich Custom Resources: The CRD includes complex nested objects, enums, and validation rules that showcase Kubernetes’ extensibility.
Proper Status Management: The operator correctly uses the status subresource with conditions, phases, and detailed state tracking.
Spring Boot Integration: Shows how to build operators with familiar Spring patterns rather than raw Kubernetes client libraries.

Conclusion 🔗

Operators represented a shift in operational thinking. Instead of reacting to problems, they prevented them. Instead of following manual procedures, they automated expertise.

This automation delivered several benefits:

Consistency: Operators performed tasks the same way every time
Reliability: They reduced human error and operational drift
Scalability: Teams could manage more applications with fewer people
Speed: Automated operations responded faster than manual processes

Using operators required minimal changes to existing workflows. Teams defined their desired state using custom resources. The operator handled the implementation details.

This approach felt natural to Kubernetes users. It extended the familiar declarative model to complex applications.
