Meshing Around
April, 2026
Security Tune-Up
With only a few final steps required before friendly beta users can start running Kubernetes applications with Dorsal, I decided to do a security review in April. The goal was both to get things into an acceptable state and to ensure, to a reasonable extent, that I wasn’t making design decisions that were fundamentally incompatible with future security efforts.
The simple, single-afternoon tasks were:
- Disable default Service Account Token mounting
- Disable public EKS API access
- Confirm that there are no non-default Service Accounts (and thus, no RBAC rules) provisioned for user applications
- Confirm MFA is enforced for my AWS Login & IAM Assume Role trust policies
While important, those were not particularly interesting.
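For reference, the first item in that list can be expressed as a one-field change on a Namespace's default Service Account. The sketch below builds that manifest as a plain Python dict; the function name and the `demo` Namespace are made up for illustration and are not Dorsal's actual code.

```python
import json


def default_service_account(namespace: str) -> dict:
    """Build a manifest that disables token automounting on the
    'default' ServiceAccount of the given namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "default", "namespace": namespace},
        # Pods using this ServiceAccount no longer get an API token
        # mounted unless their own spec explicitly opts back in.
        "automountServiceAccountToken": False,
    }


if __name__ == "__main__":
    # "demo" is a hypothetical namespace name.
    print(json.dumps(default_service_account("demo"), indent=2))
```

The same field also exists on the Pod spec, so an individual workload can still override the Service Account setting if it genuinely needs API access.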
Zero Trust Networking
I need to be honest here… I’d be lying if I acted like I’m an expert in this particular area. My goal was relatively straightforward:
Create a set of virtual networks within my EKS Cluster(s), defined as a collection of namespaces whose perimeter is determined by the Dorsal Project to which they belong.
In other words, I want services within the same Dorsal Project to be able to communicate directly with one another, while blocking ingress from any other service in the Cluster.
At the same time, services need to be able to advertise themselves to the public internet when desired. Here’s what I wound up with…
The Way In
To begin, every Dorsal Project provisions its own internet-facing Application Load Balancer (ALB) in AWS.
Once routed through the ALB, the request hits the Istio Gateway, which is also distinct per Dorsal Project. This is the outer shell of the Project’s mesh network. Here, the Gateway takes the incoming request and upgrades the network traffic to use HBONE (HTTP-Based Overlay Network Environment) with mTLS via a Project-specific certificate.
I also intend to make it possible for Projects to advertise specific routes at the Istio Gateway, allowing inter-project service communication without requiring a round-trip to the public internet. Currently, I’m using this internally for the Pulumi Kubernetes Operator webhooks. One of my guiding principles for Dorsal’s design is to build every feature that I need to run the project into the platform itself.
Inside the Mesh
Once inside the mesh, I need to:
- Keep the traffic contained within the mesh.
- Keep other traffic out of the mesh.
The base layer of this effort is a “default deny” Network Policy. This policy denies all ingress to and all egress from every pod in the Namespace. Generally speaking, I’m intending for Network Policies to prevent pods from reaching outside of the mesh and hitting pods directly at their IP address.
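A default-deny policy of this kind is short enough to show in full. The following is a minimal sketch that builds one as a Python dict; the policy name `default-deny-all` and the helper function are assumptions for illustration, not taken from Dorsal.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress
    for every pod in the given namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is a hypothetical choice.
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no ingress/egress rules means
            # nothing is allowed in either direction.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_policy("demo"), indent=2))
```

Network Policies are additive, so any traffic that should flow has to be explicitly allowed by another policy selecting the same pods.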
Of course, a policy denying everything doesn’t make much sense, so the next step is to layer on an additional Network Policy that permits communication between pods within the Dorsal Project. If done correctly, this should ensure that only pods within a subset of Namespaces are allowed to communicate with one another.
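One way to express "only Namespaces in the same Project" is a namespace label plus a `namespaceSelector`. The sketch below assumes every Namespace carries a label identifying its Project; the label key `dorsal.example/project`, the policy name, and the function are all hypothetical.

```python
import json

# Hypothetical label key, assumed to be set on every Namespace in a Project.
PROJECT_LABEL = "dorsal.example/project"


def allow_same_project_policy(namespace: str, project: str) -> dict:
    """Build a NetworkPolicy that lets pods in `namespace` talk to,
    and be reached from, pods in namespaces labelled with `project`."""
    peer = {"namespaceSelector": {"matchLabels": {PROJECT_LABEL: project}}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-project", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{"from": [peer]}],
            "egress": [{"to": [peer]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(allow_same_project_policy("demo-api", "demo"), indent=2))
```

This is deliberately incomplete: a real setup would likely also need rules for DNS and for whatever traffic the mesh itself depends on, which are left out here.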
While controlling traffic at the IP and port level (layers 3 and 4) is very important, something more is required to verify that the caller is who they say they are and that they’re allowed to make the current request. That’s where Istio’s Peer Authentication comes in: it verifies the caller’s identity through ztunnel-powered mTLS.
At the pod level, I’m also handling some security hardening through Security Contexts and policy engines, but that work is still in flight, so I won’t discuss it now.
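For the identity half of that, a namespace-wide Peer Authentication resource in strict mode is the usual starting point. This is a minimal sketch built as a Python dict; the function name is an assumption, and the `security.istio.io/v1` API version assumes a reasonably recent Istio release (older ones use `v1beta1`).

```python
import json


def strict_mtls(namespace: str) -> dict:
    """Build a PeerAuthentication that requires mTLS for all
    workloads in the given namespace."""
    return {
        "apiVersion": "security.istio.io/v1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        # No selector: the policy applies to the whole namespace.
        # STRICT rejects plaintext connections outright.
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    print(json.dumps(strict_mtls("demo"), indent=2))
```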
Helm
As mentioned last month, the big “final” implementation detail to complete before other people (and my personal test account) can properly use Dorsal for hosting Kubernetes applications is the data model rebuild.
I decided that the most logical way to do this would be to “work backwards” from the manifests to figure out what data needed to be known at generation time. One early decision immediately became an obvious mistake: putting abstractions into the Helm Chart(s).
So, this month, I began making the Helm Chart completely transparent. By this, I mean ripping out anything that wasn’t an exact one-to-one mapping of Helm Parameters to manifest values. Rather than looping over, say, something called an “application” in both Service and Deployment manifests, I’d denormalize the data and pass in explicit services[0].something and deployments[0].something.
This means that I have a gradual abstraction:
- Manifests: Raw Kubernetes resources, as God intended
- Helm Chart: Templates of those raw Kubernetes manifests, with no business logic
- Database: Normalized version of the Helm parameters, helping to verify data integrity
- API/UI: Abstractions to simplify and generate relationships between user-defined Dorsal resources
My hope is that the system will remain easier to reason about, refactor, and debug when the database values translate directly to the manifests. Unit testing should be straightforward, as I can take an API call (abstracted) and assert the resulting database values. I can then, separately, verify that the database models are properly making their way into the Helm Chart / manifests, without having to track an abstraction through the system.
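To make the denormalization concrete, here is a rough sketch of how an abstracted "application" might be flattened into explicit `deployments` and `services` lists before being handed to a transparent Helm Chart. The `Application` fields, the values keys, and the function name are all invented for illustration; the real schema is still being designed.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical API-level abstraction; not Dorsal's real model."""
    name: str
    image: str
    port: int
    replicas: int = 1


def to_helm_values(apps: list[Application]) -> dict:
    """Flatten each application into one deployment entry and one
    service entry, so the chart never needs to know what an
    'application' is."""
    values: dict = {"deployments": [], "services": []}
    for app in apps:
        values["deployments"].append({
            "name": app.name,
            "image": app.image,
            "replicas": app.replicas,
            "containerPort": app.port,
        })
        values["services"].append({
            "name": app.name,
            "port": app.port,
            "targetPort": app.port,
        })
    return values


if __name__ == "__main__":
    values = to_helm_values([Application("web", "nginx:1.27", 8080, replicas=2)])
    # The kind of check a unit test could make on the flattened output.
    assert values["deployments"][0]["replicas"] == 2
    assert values["services"][0]["targetPort"] == 8080
```

Because the chart only ever sees `deployments[0].replicas` or `services[0].port`, a failing assertion points at either the flattening step or the template, never at logic hidden inside the chart.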
Looking Forward
We’re heading into the summer weather, and I’m already feeling real life building back up, pulling my attention away. Despite that, I think I can continue to chip away at this data model rebuild in May.
If the database schema is complete by the end of the month, even if I don’t finish any API/UI work, I think I could consider May to be a success.
Don’t get me wrong, I would much rather have this site live, with real, paying users as soon as possible… but I need to be realistic. I’m still employed full-time, with a wife, friends, and life events.
Wish me luck, and I hope to have some good progress to report in 30 days.
Nathan