Of vaults and secret operators
Well, I got around to it. I finally wove kind, a key vault and the External-Secrets and SealedSecrets operators together to provide development environments that better reflect our production deployment patterns. And not just that, but I can finally make good on the promise to myself that I’d write an article about external secrets and k8s, having glanced off the subject over a year ago.
Previously, I’d worked with all these tools in isolation. And I’ve been using kind to debug deployments locally for some time now. But a recent focus on security and secret-handling from my employer motivated me to reevaluate our practices and look again at how these tools could come together to increase security and parity between development and production environments.
Motivations & Reservations
Having been working in a software engineering department with widely varied practices across multiple on-prem and hybrid cloud environments and deployment methods - some operating off GitOps principles, some very much manually maintained - I had become aware that I was increasingly reluctant to address the growing need to push for development on Kubernetes. One more methodology, one more tool to install, one more set of commands, one more moving part in the mechanism for devs to learn, where once there was just IntelliJ and ‘gradle build’. But all the same - if your application is ultimately set to run on Kubernetes, don’t you want greater parity between production and development? Shift-left and all that. I felt the conceptual need to enable development on local Kubernetes, but the practical implementation and benefits would have to justify the implicit overhead.
The development part of a container-based application’s life cycle is at best an unintentionally missed opportunity and at worst an intentionally ignored problem.
The most common approach I see is rolling out a Docker Compose file to ‘more or less’ replicate how the application is served in production. But this is an abstraction, and abstractions introduce complexity. This abstraction may mislead you about how your application actually behaves in its runtime.
Now, will any local development cluster be 100% accurate to that of your production? Unlikely, I would say. There’s most likely a managed layer to your production environment, perhaps handled by another entity altogether, with production or enterprise necessities such as CRDs, Operators, roles and network rules that won’t be applicable to your little simulacrum. But you want to get it close. Close enough that most of the differences are obfuscated away from being glaring, and that high-level operations like deployments are unaffected. And that’s what secret management was for me: glaring.
Now, if I’m following this overly-principled logic: if the production cluster is getting its secrets from an external platform, shouldn’t our local development cluster? We’re adjudicating the tradeoff between parity and overhead here. What is your threshold of similarity between development and production at which you’re willing to accept the load of implementation? For me, one such element was the integration of our key management platform.
One more factor to consider within this context was our GitOps operational model. I love the concept behind GitOps, and in how we have implemented it there’s a lot I love about it in practice too. But there’s a risk you can run into with it: if your applications’ deployment configuration isn’t fully represented as code, GitOps becomes an anti-pattern, misleading the reader into thinking the repository is completely representative. For now I’ll deem this a ‘shoelace antipattern’ - one where only a complete adoption of a methodology provides value, and an incomplete adoption leaves you worse off than if you had never adopted it to begin with. That’s an opinion I’ve come to form, and one I could enjoy an argument over, but for now I think that if you adopt GitOps practices you need to make sure as much of your deployment as possible is captured in code, or you’ll end up misleading people (or yourself, given enough time).
This is specifically relevant here because of Secrets. You could arguably seal all your secrets and store them separately as vanilla encrypted Secrets, but this really only works for static clusters - i.e. if your environments are separated at the cluster level rather than by namespace, you would seal each set of secrets against each cluster’s SealedSecrets service and store them categorized per DEV, STG, PRD, etc. But what happens when dynamic, ephemeral environments come into the picture? The alternative is to seal one key - the key to your vault - and package that in your repo. Your vault, the External-Secrets operator and the SealedSecrets operator come together to enable this. I’d generally avoid anything with the Bitnami label since their heel turn, but for now we’re operating under faith. As of the time of writing, I’m looking at alternative ways to host encrypted secrets in more or less plain view, but for now, and for the sake of this proof of concept and article, let’s just assume it’s SealedSecrets.
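To make the ‘seal one key’ idea concrete, here’s a rough sketch of the shape that single committed artifact takes. Every name in it (the `vault-token` secret, the namespace, the ciphertext) is a placeholder of mine, not anything from our actual setup:

```yaml
# Hypothetical SealedSecret carrying the one credential that ever lands in Git:
# the token the cluster uses to authenticate against the external vault.
# Produced by running a plain Secret through kubeseal against the target
# cluster's SealedSecrets controller.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: vault-token
  namespace: default
spec:
  encryptedData:
    token: AgBy3i4OJSWK...   # ciphertext placeholder; only the controller's private key can decrypt it
  template:
    metadata:
      name: vault-token
      namespace: default
```

The controller in the cluster unseals this into a regular Secret, which the External-Secrets machinery can then use to authenticate; everything downstream of that one token stays out of the repository entirely.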
Implementation
As with most things in my life, the greatest challenges during this journey involved certs and corporate networks. On a home setup where you have full control of your resources, the subject wouldn’t even justify a write-up: create a local cluster, install two operators, seal a secret, create a couple of Kubernetes secret objects and Bob’s your uncle, you’re up and running. But as many know, achieving seemingly simple things on high-security networks becomes impossible and you are forced to come up with workarounds. `helm install` needs firm handholding. Operators fetching images need assistance. These problems were, at times, fun to circumvent. They bettered my understanding of how some of these tools work behind the scenes. I’m sure most people never need to know such things. What a privilege.
So, with the understanding that this is just a list of random notable events:
Right out the gate I had issues importing the External-Secrets operator into my local cluster. To my surprise, it was the CRD import that was failing. My cluster appeared unable to create the new definitions needed, which was odd. Silly mistake - it turned out I was running an ancient k8s version as my kind node image. Of course kind had issues pulling the new image from its Docker Hub repository, so I manually pulled one through our internal Docker Hub mirror and created a new cluster using `kind create cluster --image ...`. Once up to v1.34.0, I was able to proceed and External-Secrets could be installed without complaints.

Even though Helm had finished installing ESO and wished me a happy Helming, the deployed resources were throwing startup errors. A glance at `k9s` showed the pods were struggling to pull their images. My one hundredth surprise at Helm’s functionality: there does not appear to be any verification of the rollout before it cheerily reports its success. Again, this was down to the charts pointing towards external image registries. I manually pulled down the images the charts required and loaded them into the cluster using `kind load docker-image ...`, and shortly thereafter the pods came to life and I was running ESO. The same process occurred again with SealedSecrets - a manual pull/push had to go through, but otherwise no issues there.
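For what it’s worth, the node image can also be pinned in the kind config rather than on the command line. A minimal sketch, assuming the image has already been pulled or retagged locally from whatever mirror you have access to:

```yaml
# kind-config.yaml - pinning a current node image instead of relying on
# whatever kind would pull from Docker Hub by default. The tag matches the
# k8s version mentioned above; swap in your mirror's retagged reference.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kind
nodes:
- role: control-plane
  image: kindest/node:v1.34.0
```

`kind create cluster --config kind-config.yaml` then behaves the same as passing `--image` directly.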
I started importing the k8s objects that would define my secrets, beginning with the SecretStore manifest. More to the topic of the operator’s behaviour - the SecretStore will report a status of ‘Valid’, but I’m not sure exactly what this is validating, as I’m fairly sure I provided incorrect authentication and it still reported valid. Perhaps relevant, however: I noticed this was the object I would have to tear down and rebuild most regularly for downstream objects to function. Perhaps there is some caching of auth passed between them, but sure enough, sometimes when my ExternalSecret failed its initial pull, I’d have to rebuild my SecretStore to get it moving again.
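For reference, the SecretStore looked roughly like the sketch below. I’m showing ESO’s Vault provider purely for illustration - the server URL, mount path and names are all placeholders, and your key management platform may well need a different provider block entirely:

```yaml
# Hypothetical SecretStore wiring ESO to a Vault-style backend, authenticating
# with the token that the SealedSecret above unseals into this namespace.
apiVersion: external-secrets.io/v1beta1   # newer ESO releases also serve external-secrets.io/v1
kind: SecretStore
metadata:
  name: vault-store
  namespace: default
spec:
  provider:
    vault:
      server: "https://vault.internal.example:8200"   # placeholder address
      path: "secret"                                  # KV mount path
      version: "v2"
      auth:
        tokenSecretRef:
          name: vault-token
          key: token
```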
Next up, my `ExternalSecret.yaml`. The error reporting on this object is fairly maddening: it provides a very generic error of “failed to pull secret” or something similar. Very little to go on, so off I went to the ESO pod to check the logs. TLS errors. Ah. So the debugging started. I looked at what certs were on there - no shell. It seemed to me that the pods weren’t getting assigned the certs I would have assumed naturally passed through the layers of the cluster. I amended the ESO deployment to give the pod the certs by way of a ConfigMap, and the secrets started pulling through! So I knew that was the problem, but I still had to figure out why the certs weren’t propagating from local to pod. I eventually decided to first include a mount in the kind config to ensure the certs get mounted from local [^1], and then create a patch file to be applied after the ESO helm chart gets instantiated, adding the fields that mount the certs from node to container. I applied the changes, cycled the pods (cycled the SecretStore too, of course) and saw the secret being created by the ExternalSecret. Messy, but problem solved.

The question does remain: why isn’t the pod receiving the appropriate certs? I would have imagined cluster-wide certs would naturally flow through from the node to the containers that need them. Interestingly, just after finishing the proof of concept, I noticed a bug/feature request on the ESO GitHub noting TLS issues with SecretStores, so this could well have been a bug. I’m yet to pull down v1.1.0 to see if it resolves naturally.
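The patch itself ended up being small. Below is a sketch of what that kind of strategic merge patch can look like - the container name, file name and mount paths are my assumptions for illustration, not the exact values from the real file:

```yaml
# cert-mount-patch.yaml - hypothetical patch for the ESO deployment, mounting
# the corporate CA that the kind config (see footnote) exposes on the node at
# /usr/local/share/ca-certificates/ into the pod's trusted cert location.
spec:
  template:
    spec:
      containers:
      - name: external-secrets            # container name as per the upstream chart; verify against your install
        volumeMounts:
        - name: corp-ca
          mountPath: /etc/ssl/certs/corp-ca.crt
          subPath: corp-ca.crt            # assumes the mounted directory contains a file with this name
          readOnly: true
      volumes:
      - name: corp-ca
        hostPath:
          path: /usr/local/share/ca-certificates/
```

Applied after the chart install with something like `kubectl -n external-secrets patch deployment external-secrets --patch-file cert-mount-patch.yaml`, then cycle the pods.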
I rolled all of the above into a Taskfile to automate the process for colleagues - there’s far too much manual tinkering in there for someone unfamiliar with these tools. They just have to run the Taskfile and start applying their applications’ secret manifests for their local cluster to pull vault secrets. Neato.
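The real Taskfile is fairly specific to our network, but the rough shape is something like the sketch below. Chart references, image tags and file names are placeholders standing in for whatever your mirror and repo layout dictate:

```yaml
# Taskfile.yml - hypothetical outline of the automation described above.
version: "3"

tasks:
  cluster:
    desc: Create the kind cluster with the cert mount and pinned node image
    cmds:
      - kind create cluster --config kind-config.yaml

  operators:
    desc: Install ESO and SealedSecrets, side-load images, patch in the certs
    deps: [cluster]
    cmds:
      # assumes the chart repos were added beforehand
      - helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets --create-namespace
      - helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n kube-system
      # images pulled beforehand through the internal mirror; tags are placeholders
      - kind load docker-image ghcr.io/external-secrets/external-secrets:v1.0.0
      - kubectl -n external-secrets patch deployment external-secrets --patch-file cert-mount-patch.yaml

  secrets:
    desc: Apply the sealed vault token and the SecretStore
    deps: [operators]
    cmds:
      - kubectl apply -f vault-token-sealed.yaml
      - kubectl apply -f secret-store.yaml
```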
Post Thoughts
“It’s not really local Kubernetes though, is it? It’s all running on Docker”
🤫
[^1]:
    ```yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    name: kind
    nodes:
    - role: control-plane
      extraMounts:
      - hostPath: C:\Certs\
        containerPath: /usr/local/share/ca-certificates/
    ```