Near-Zero Downtime with Pulumi, Azure, and Cloudflare

Introduction
High availability is more than a buzzword; it is a core expectation when delivering modern applications. Users around the world expect your service to remain accessible regardless of maintenance, load spikes, or even regional cloud outages. This tutorial walks through building a globally resilient infrastructure using two Azure Kubernetes clusters, Pulumi for infrastructure as code, Cloudflare Tunnels for secure connectivity, and a geo-steered Cloudflare Load Balancer for intelligent traffic distribution.
We will explore why using multiple clusters is an effective strategy, how to deploy them with Pulumi, how Cloudflare Tunnels simplify security and connectivity, and how Cloudflare’s geo-steering ensures traffic routes to the nearest healthy cluster. By the end, you will understand how this architecture minimizes downtime, even during a regional disaster.
Why Build Two Clusters?
Running a single Kubernetes cluster may be acceptable for hobby projects, but in production you must account for failure scenarios. Cloud providers occasionally experience outages, whether due to power loss, network issues, or other disruptions. Hosting in just one region leaves you vulnerable to a single point of failure. By deploying two clusters in separate Azure regions, you gain geographic redundancy. If one region fails, the other continues serving traffic.
In addition, multiple clusters provide flexibility for blue‑green deployments and can help distribute user traffic based on location. Clusters close to your users deliver lower latency, improving performance and user experience. When coupled with an external load balancer capable of intelligent routing, you achieve both speed and resiliency.
Pulumi as the Infrastructure Engine
Pulumi is an open-source infrastructure as code (IaC) tool that allows you to define cloud resources in familiar programming languages. Rather than writing YAML or using multiple tools for provisioning, Pulumi uses code (in this article, C#) to create and manage Azure resources. This approach promotes reuse, logic, and strong type checking. It also integrates easily with CI/CD pipelines and version control for auditing changes.
Setting Up Pulumi
Before diving into cluster creation, ensure you have Pulumi installed locally. Follow the instructions from the Pulumi documentation, for example using the official install script:
curl -fsSL https://get.pulumi.com | sh
Next, log in to your Pulumi account or sign up for a free account. Pulumi stores state remotely, enabling collaboration across your team. If you prefer to store state in Azure storage, you can configure that as well.
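For example, to use an Azure Blob container as the state backend, set the storage credentials and log in with the azblob scheme. The storage account and container names below are placeholders:

```shell
# Point Pulumi at an Azure Blob container for state storage.
export AZURE_STORAGE_ACCOUNT=mypulumistate
export AZURE_STORAGE_KEY="<storage-account-key>"
pulumi login azblob://pulumi-state
```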
Initialize a new Pulumi project:
mkdir azure-k8s-ha
cd azure-k8s-ha
pulumi new azure-csharp
Choose your project name and stack (for instance dev or prod). Pulumi will create a base C# project ready for Azure deployments.
Deploying Two Azure Kubernetes Clusters
Within your project, add the Pulumi.AzureNative package for full access to Azure services:
dotnet add package Pulumi.AzureNative
Now create a file, Program.cs, defining the clusters. You will specify two resource groups (one for each region) and create a managed Kubernetes cluster in each:
using Pulumi;
using Pulumi.AzureNative.Resources;
using Pulumi.AzureNative.ContainerService;
using Pulumi.AzureNative.ContainerService.Inputs;

// Top-level statements must come before type declarations.
return await Deployment.RunAsync<MyStack>();

class MyStack : Stack
{
    public MyStack()
    {
        // Primary cluster in West Europe.
        var resourceGroup1 = new ResourceGroup("rg-eu-west", new ResourceGroupArgs
        {
            Location = "westeurope",
        });

        var cluster1 = new ManagedCluster("cluster-eu", new ManagedClusterArgs
        {
            ResourceGroupName = resourceGroup1.Name,
            Location = resourceGroup1.Location,
            DnsPrefix = "cluster-eu",
            // AKS requires a managed identity (or a service principal).
            Identity = new ManagedClusterIdentityArgs
            {
                Type = ResourceIdentityType.SystemAssigned,
            },
            AgentPoolProfiles =
            {
                new ManagedClusterAgentPoolProfileArgs
                {
                    Name = "agentpool",
                    Count = 3,
                    VmSize = "Standard_DS2_v2",
                    Mode = "System",
                }
            },
        });

        // Secondary cluster in East US.
        var resourceGroup2 = new ResourceGroup("rg-us-east", new ResourceGroupArgs
        {
            Location = "eastus",
        });

        var cluster2 = new ManagedCluster("cluster-us", new ManagedClusterArgs
        {
            ResourceGroupName = resourceGroup2.Name,
            Location = resourceGroup2.Location,
            DnsPrefix = "cluster-us",
            Identity = new ManagedClusterIdentityArgs
            {
                Type = ResourceIdentityType.SystemAssigned,
            },
            AgentPoolProfiles =
            {
                new ManagedClusterAgentPoolProfileArgs
                {
                    Name = "agentpool",
                    Count = 3,
                    VmSize = "Standard_DS2_v2",
                    Mode = "System",
                }
            },
        });
    }
}
Pulumi treats these definitions as code that can be run repeatedly. Running pulumi up provisions the clusters. If you later change the configuration, Pulumi computes the diff and updates the infrastructure accordingly.
Automating Deployments
For consistent deployments across environments, create multiple stacks in Pulumi (one for staging and one for production, for instance). Within your CI/CD pipeline, run pulumi up using service principal credentials. This ensures all clusters remain in sync and that every change is recorded.
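A hedged sketch of such a pipeline step with GitHub Actions follows; the workflow layout, secret names, and stack name are illustrative, and the ARM_* variables are the service principal credentials the Azure Native provider reads:

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  pulumi-up:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pulumi/actions@v5
        with:
          command: up
          stack-name: prod
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
          ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
          ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
          ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
```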
Cloudflare Tunnels: Secure Exposure Without Public Ingress
Why Tunnels?
Managing inbound traffic for Kubernetes typically requires load balancers or Ingress controllers. Those components expose your services publicly, which can introduce attack surfaces and complicated firewall rules. Cloudflare Tunnels flip the model: a lightweight daemon called cloudflared opens an outbound connection from your cluster to Cloudflare. Traffic from the internet hits Cloudflare’s network first, then traverses the tunnel to your service.
This has three major benefits:
- Security – No open ports are required on your cluster. The tunnel connection is established from the inside out, drastically reducing your exposure to port scans and DDoS attacks.
- Simplicity – You avoid managing a dedicated load balancer. The Cloudflare edge handles TLS termination, caching, and WAF policies.
- Portability – Because the cluster makes the outbound connection, it works in environments without public IP addresses or behind strict firewalls.
Deploying Cloudflare Tunnels with Pulumi
Rather than creating tunnels manually, you can use Pulumi’s Cloudflare provider to define them as code and pass the credentials directly to Kubernetes. Below is a simplified example that creates a tunnel and deploys the cloudflared agent using Pulumi’s Kubernetes provider:
using Pulumi;
using Pulumi.Cloudflare;
using Pulumi.Kubernetes.Core.V1;
using Pulumi.Kubernetes.Types.Inputs.Apps.V1;
using Pulumi.Kubernetes.Types.Inputs.Core.V1;
using Pulumi.Kubernetes.Types.Inputs.Meta.V1;
using K8sDeployment = Pulumi.Kubernetes.Apps.V1.Deployment;

var config = new Config();

// The tunnel needs your Cloudflare account ID and a base64-encoded
// random secret (e.g. `openssl rand -base64 32`), stored here as
// Pulumi config. Field names follow the v5 Cloudflare provider and
// may differ in other versions.
var tunnel = new Tunnel("aks-tunnel", new TunnelArgs
{
    AccountId = config.Require("cloudflareAccountId"),
    Name = "aks-primary",
    Secret = config.RequireSecret("tunnelSecret"),
});

// Store the tunnel token in a Kubernetes secret so cloudflared can
// authenticate to Cloudflare's edge.
var secret = new Secret("cloudflare-creds", new SecretArgs
{
    Metadata = new ObjectMetaArgs
    {
        Name = "cloudflare-creds",
    },
    StringData =
    {
        { "token", tunnel.TunnelToken },
    },
});

var deploy = new K8sDeployment("cloudflared", new DeploymentArgs
{
    Spec = new DeploymentSpecArgs
    {
        Replicas = 2,
        Selector = new LabelSelectorArgs
        {
            MatchLabels = { { "app", "cloudflared" } },
        },
        Template = new PodTemplateSpecArgs
        {
            Metadata = new ObjectMetaArgs
            {
                Labels = { { "app", "cloudflared" } },
            },
            Spec = new PodSpecArgs
            {
                Containers =
                {
                    new ContainerArgs
                    {
                        Name = "cloudflared",
                        Image = "cloudflare/cloudflared:latest",
                        Args = { "tunnel", "--no-autoupdate", "run" },
                        Env =
                        {
                            // cloudflared reads the token from TUNNEL_TOKEN.
                            new EnvVarArgs
                            {
                                Name = "TUNNEL_TOKEN",
                                ValueFrom = new EnvVarSourceArgs
                                {
                                    SecretKeyRef = new SecretKeySelectorArgs
                                    {
                                        Name = secret.Metadata.Apply(m => m.Name),
                                        Key = "token",
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    },
});
This code provisions the tunnel, stores the credentials in a Kubernetes secret, and deploys two cloudflared pods that automatically connect to Cloudflare. Repeat the same Pulumi module in both clusters with their respective credentials. Point DNS records such as eu.alex.rocks and us.alex.rocks to the tunnels so the load balancer has distinct origins for each region.
Creating a Cloudflare Geo-Steered Load Balancer
Overview
With your clusters connected to Cloudflare through tunnels, the next step is to front them with a Cloudflare Load Balancer. Cloudflare’s load balancing service can route traffic based on geography, health checks, and weight. We will create two origins (one for each region) and define a geographic steering policy. This ensures users connect to the closest healthy cluster.
Provisioning the Load Balancer
In the Cloudflare dashboard, navigate to Traffic > Load Balancing. Create a new Load Balancer and specify your domain name, e.g., app.alex.rocks. Under the origins section, define two pools:
- EU Pool – Contains the hostname eu.alex.rocks pointing to your EU tunnel.
- US Pool – Contains the hostname us.alex.rocks pointing to your US tunnel.
Configure each pool with a health check that periodically requests an endpoint on your cluster. Cloudflare will mark the origin unhealthy if the check fails.
Next, under the steering policy, choose Geo Steering. Map European countries to the EU pool and North American countries to the US pool. Optionally, add a fallback order so if one pool becomes unhealthy, traffic automatically fails over to the other.
Automating Load Balancer Configuration
Rather than configuring through the dashboard, you can define the load balancer with the Cloudflare API or using Pulumi. This approach keeps configuration as code. For example, using Pulumi’s Cloudflare provider in C#:
using Pulumi.Cloudflare;
using Pulumi.Cloudflare.Inputs;

var lb = new LoadBalancer("global-lb", new LoadBalancerArgs
{
    ZoneId = myZoneId,  // your Cloudflare zone ID
    Name = "app.alex.rocks",
    DefaultPoolIds = { euPool.Id, usPool.Id },
    FallbackPoolId = euPool.Id,
    SteeringPolicy = "geo",
    // Geo steering maps Cloudflare regions to pools. The region codes
    // and the exact shape of RegionPools follow the v5 provider and
    // may differ in other versions.
    RegionPools =
    {
        new LoadBalancerRegionPoolArgs { Region = "WEU", PoolIds = { euPool.Id } },
        new LoadBalancerRegionPoolArgs { Region = "EEU", PoolIds = { euPool.Id } },
        new LoadBalancerRegionPoolArgs { Region = "WNAM", PoolIds = { usPool.Id } },
        new LoadBalancerRegionPoolArgs { Region = "ENAM", PoolIds = { usPool.Id } },
    },
});
Here euPool and usPool are Cloudflare origin pools referencing the DNS records for each cluster’s tunnel. The geo mapping ensures requests arriving in Europe route to the EU pool, while North American traffic uses the US pool.
How Geo Steering Delivers Near-Zero Downtime
In a stable environment, users connect to the cluster nearest to them, minimizing latency. If the EU region suffers a failure (say, a data center outage or network problem), Cloudflare’s health checks mark the EU pool as unhealthy. The load balancer then routes all traffic to the US pool until the EU cluster recovers. Because Cloudflare sits in front of both clusters, the DNS name app.alex.rocks remains constant. Users do not need to change anything. Failover occurs within seconds, dramatically reducing downtime.
Additionally, Cloudflare’s massive global network acts as a shield against DDoS attacks. By removing public Kubernetes ingress and routing through tunnels, you avoid direct exposure of your clusters. Even if one tunnel fails, the other cluster can continue to serve traffic, giving you time to troubleshoot while maintaining availability.
Pulumi Project Structure
For clarity, consider organizing your Pulumi project as follows:
azure-k8s-ha/
  Pulumi.yaml
  Pulumi.dev.yaml
  Pulumi.prod.yaml
  Program.cs
  CloudflareResources.cs
The Program.cs file handles Azure resources, while CloudflareResources.cs creates Cloudflare pools and the load balancer. Pulumi stacks allow you to maintain separate settings (like cluster sizes or location) for different environments. Use environment variables or secrets to store API tokens securely.
Example snippet in CloudflareResources.cs:
using Pulumi.Cloudflare;
using Pulumi.Cloudflare.Inputs;

// Health checks originate from a Cloudflare region near each pool:
// WEU (Western Europe) for the EU pool, ENAM (Eastern North America)
// for the US pool.
var euPool = new LoadBalancerPool("eu-pool", new LoadBalancerPoolArgs
{
    AccountId = myAccountId,  // required in recent provider versions
    Name = "eu-pool",
    CheckRegions = { "WEU" },
    Origins =
    {
        new LoadBalancerPoolOriginArgs
        {
            Name = "eu-origin",
            Address = "eu.alex.rocks",
        }
    },
    Description = "Pool for EU cluster",
});

var usPool = new LoadBalancerPool("us-pool", new LoadBalancerPoolArgs
{
    AccountId = myAccountId,
    Name = "us-pool",
    CheckRegions = { "ENAM" },
    Origins =
    {
        new LoadBalancerPoolOriginArgs
        {
            Name = "us-origin",
            Address = "us.alex.rocks",
        }
    },
    Description = "Pool for US cluster",
});
Then create the load balancer referencing these pools as shown earlier. Because everything is code, changes to your Cloudflare configuration follow the same review process as your Azure infrastructure.
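If you want the health check itself under Pulumi’s control, the provider also exposes a LoadBalancerMonitor resource. The sketch below is hedged: the /healthz path and the timing values are assumptions about your application, and field names follow the v5 provider. Attach the monitor to each pool via the pool’s Monitor property.

```csharp
using Pulumi.Cloudflare;

// Hedged sketch: an HTTPS health check that probes /healthz.
var monitor = new LoadBalancerMonitor("app-monitor", new LoadBalancerMonitorArgs
{
    AccountId = myAccountId,   // your Cloudflare account ID
    Type = "https",
    Method = "GET",
    Path = "/healthz",         // assumed health endpoint in your app
    ExpectedCodes = "200",
    Interval = 60,             // seconds between checks
    Timeout = 5,
    Retries = 2,
    Description = "Health check for app.alex.rocks origins",
});
// In each pool, reference it with: Monitor = monitor.Id
```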
Continuous Deployment Workflow
A typical workflow might involve GitHub Actions or another CI tool. When changes are pushed to the repository, the pipeline installs dependencies, runs pulumi up, and applies modifications to both Azure and Cloudflare. By storing your Pulumi state in a remote backend, you ensure pipeline runs are safe and consistent.
Additionally, container images for your application can be built and published via the same pipeline. Once the images are available, a second step updates your Kubernetes deployments in each cluster. Using tools like Helm or Kustomize in combination with Pulumi ensures deployments remain repeatable.
Monitoring the System
With multiple moving parts, robust monitoring is essential. Azure provides metrics for cluster health and node status, which you can ingest into Azure Monitor or Prometheus. Cloudflare exposes logs for load balancer health checks, tunnel connections, and latency. Combining these metrics gives you real-time insight into both the infrastructure and network layer.
Set up alerts so that if one cluster or tunnel becomes unhealthy, engineers are notified immediately. Because the Cloudflare Load Balancer handles failover automatically, your service stays up even while you investigate. This is the true power of a geo-steered architecture: problems become events to handle calmly rather than urgent crises threatening downtime.
Scaling to More Regions
While this tutorial focuses on two clusters, the pattern easily extends to more regions. Simply add another Pulumi resource group and cluster, create an additional Cloudflare tunnel and DNS record, and include a corresponding pool in the load balancer. Cloudflare’s geo-steering supports custom rules for each continent or even down to specific countries, enabling fine-grained control over traffic flow.
As you scale, consider running stateful workloads in active-active configurations or using managed database services that support replication across regions. Pulumi can provision these as well, giving you a single source of truth for the entire infrastructure.
Cost Considerations
Operating multiple clusters incurs additional cost. Managed Kubernetes nodes, outbound data transfer, and Cloudflare load balancer fees all add up. However, consider the cost of downtime. If your service generates revenue or supports critical operations, even a brief outage can far exceed the monthly expense of redundant infrastructure. Pulumi allows you to track resource changes, so you can easily adjust cluster sizes or node pools based on demand to manage costs effectively.
Best Practices for Disaster Resilience
- Regularly Test Failover – Schedule controlled failover tests to ensure your tunnels and load balancer operate as expected. Temporarily disable one cluster and verify that traffic shifts seamlessly.
- Use Infrastructure as Code for Everything – Keep Pulumi or Terraform definitions for both Azure and Cloudflare. Version control enables quick rollbacks if a change causes issues.
- Secure Sensitive Data – Use Azure Key Vault or Pulumi secrets to store tokens and credentials. Avoid committing sensitive information to source control.
- Automate Credential Rotation – Cloudflare tunnel credentials and service principal passwords should be rotated periodically. Automate this process to reduce operational overhead.
- Monitor Egress Costs – Cloudflare tunnels rely on outbound traffic from your clusters. Monitor the egress charges in Azure and tune usage or caching strategies if costs rise unexpectedly.
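For the first practice, a controlled failover can be scripted against the Cloudflare API by temporarily disabling a pool. The endpoint shape and variable names below are assumptions; verify against the current API reference before relying on this:

```shell
# Disable the EU pool to force traffic onto the US pool...
curl -X PATCH \
  "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/load_balancers/pools/$EU_POOL_ID" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"enabled": false}'

# ...observe the failover, then restore the pool.
curl -X PATCH \
  "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/load_balancers/pools/$EU_POOL_ID" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"enabled": true}'
```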
Detailed Pulumi Walkthrough
While the earlier code snippet shows the basic cluster creation, a full production deployment involves additional components. Start by configuring a service principal in Azure with permissions to create resource groups and manage Kubernetes clusters. Store the credentials as Pulumi configuration values:
pulumi config set azure-native:clientId <appId>
pulumi config set azure-native:clientSecret <password> --secret
pulumi config set azure-native:tenantId <tenantId>
pulumi config set azure-native:subscriptionId <subscriptionId>
With these values in place, Pulumi authenticates automatically when you run pulumi up. You can further parameterize the cluster size and location using stack configuration so staging and production environments remain consistent yet isolated. Pulumi’s programming model lets you loop over regions or create helper functions to avoid repetition. For example, you might write a function that accepts the region name and returns a cluster resource along with its kubeconfig. That kubeconfig can be exported, enabling your deployment pipeline to run kubectl commands directly.
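As a sketch of that looping pattern (all names and sizes are illustrative; this belongs inside your stack definition), a helper plus a region map keeps the per-region definitions in one place:

```csharp
using System.Collections.Generic;
using Pulumi.AzureNative.Resources;
using Pulumi.AzureNative.ContainerService;
using Pulumi.AzureNative.ContainerService.Inputs;

// Illustrative helper: creates a resource group and AKS cluster for
// one region.
static ManagedCluster CreateRegionalCluster(string name, string location)
{
    var rg = new ResourceGroup($"rg-{name}", new ResourceGroupArgs
    {
        Location = location,
    });
    return new ManagedCluster(name, new ManagedClusterArgs
    {
        ResourceGroupName = rg.Name,
        Location = rg.Location,
        DnsPrefix = name,
        AgentPoolProfiles =
        {
            new ManagedClusterAgentPoolProfileArgs
            {
                Name = "agentpool",
                Count = 3,
                VmSize = "Standard_DS2_v2",
                Mode = "System",
            }
        },
    });
}

// Loop over regions instead of repeating the definitions.
var regions = new Dictionary<string, string>
{
    { "cluster-eu", "westeurope" },
    { "cluster-us", "eastus" },
};
foreach (var (name, location) in regions)
{
    CreateRegionalCluster(name, location);
}
```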
A thoughtful project structure also pays dividends. Keep your resource definitions in well-named modules, such as Network.cs for virtual networks and Aks.cs for the clusters themselves. As the architecture grows to include databases, key vaults, or private container registries, you can add more modules without cluttering the main entry point. Pulumi tracks the dependencies so resources are created in the correct order.
Building the Application Layer
After the clusters come online, you will need a repeatable way to deploy your application to each region. GitOps workflows are ideal here. Tools like Flux or Argo CD watch a Git repository for Kubernetes manifests or Helm charts and apply them automatically. Include environment-specific values in separate overlays so that a single chart can deploy to multiple regions with minor variations. Combining GitOps with Pulumi means your infrastructure and application code share the same repository and review process, simplifying audits and rollbacks.
Automated container builds ensure each commit results in new images. Once the images are pushed to a registry, your GitOps controller updates the deployments in both clusters. This keeps the application version in sync across regions, reducing the risk of unexpected behavior during a failover event.
Cloudflare Tunnel Configuration Options
The default tunnel deployment uses a single token, but Cloudflare offers advanced configuration options. You can define multiple ingress rules in the tunnel’s config file, directing traffic from different hostnames or paths to distinct services inside the cluster. For teams requiring user authentication, integrate Cloudflare Access directly in the tunnel configuration. Access policies enforce SSO logins before requests reach Kubernetes, adding another layer of security.
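A hedged sketch of such a config file follows; the hostnames and in-cluster service addresses are illustrative, and the final catch-all rule is required by cloudflared:

```yaml
tunnel: aks-primary
credentials-file: /etc/cloudflared/credentials.json
ingress:
  - hostname: eu.alex.rocks
    service: http://frontend.default.svc.cluster.local:80
  - hostname: api.eu.alex.rocks
    service: http://api.default.svc.cluster.local:8080
  # Catch-all: reject anything that does not match a rule above.
  - service: http_status:404
```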
Monitoring the tunnel itself is important. Enable Prometheus metrics in the cloudflared process to track connection status and request counts. Kubernetes horizontal pod autoscaling can then scale the tunnel pods if traffic increases suddenly. Because the tunnel endpoints sit within your private network, you can even restrict the pods to nodes with specific security policies or private subnets.
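For instance, an autoscaler for the cloudflared Deployment might look like this sketch (the replica bounds and CPU threshold are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cloudflared
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cloudflared
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```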
Advanced Load Balancer Techniques
Geo steering is only one aspect of Cloudflare’s load balancing capabilities. You can also apply weighted routing to shift a percentage of traffic to a new cluster during migration, or to run A/B tests across regions. Cloudflare’s load balancer supports session affinity, ensuring users remain pinned to the same origin during a session, which is useful if your application relies on in-memory state or local caches.
Another feature worth exploring is load balancer rulesets. These let you evaluate incoming requests and modify the behavior dynamically. For instance, you can route traffic from beta testers in a particular city to a staging cluster while all other users hit production. Combined with Pulumi, you can codify these rules and adjust them as the application evolves.
Case Study: Handling a Regional Outage
Imagine your primary region experiences a networking failure that disconnects the tunnel pods. Cloudflare’s health checks mark the affected pool as down within seconds. Because your secondary region is fully synced (both in infrastructure and application version), the load balancer automatically shifts traffic. Users may notice a brief increase in latency as their connections are rerouted, but the service continues running. Meanwhile, you investigate the outage without the stress of a major downtime event. When the primary region recovers, Cloudflare gradually shifts users back according to your steering policy, ensuring a smooth transition.
Operational Tips and Troubleshooting
During day-to-day operations, keep an eye on Kubernetes node health and cluster upgrades. Use Azure’s automated node image upgrades to reduce management overhead. For the tunnels, set up pod disruption budgets so that at least one tunnel instance remains available during rolling updates. Logs from cloudflared provide detailed error messages if authentication fails or if connectivity to Cloudflare’s edge is disrupted. For Pulumi, enable stack tags that capture commit SHAs or ticket numbers, making it easy to trace infrastructure changes to source code.
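A minimal disruption budget for the tunnel pods could look like this, assuming the app: cloudflared label from the earlier deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cloudflared-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: cloudflared
```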
If you encounter unexpected behavior with the load balancer, verify that DNS records point to the correct Cloudflare hostname and that each origin’s health check is reachable through the tunnel. Cloudflare’s analytics dashboard can reveal whether requests are failing at the edge or being forwarded successfully to your cluster. Enable log push to a storage account for long-term retention and auditing.
Testing the Complete Setup
Before trusting the system in production, conduct thorough testing. Start by simulating pod failures to ensure the tunnels reconnect automatically. Next, disconnect one of the clusters entirely, perhaps by shutting down its nodes or blocking outbound traffic. Cloudflare should quickly detect the failure and reroute users. Measure the time it takes for failover to occur and check application logs for any errors during the transition. Running these tests on a schedule builds confidence that your disaster recovery plan will hold up when needed.
You can also script synthetic transactions that execute through the load balancer from various geographic locations. Services like k6 or even cron jobs running in other cloud providers can repeatedly call your application’s endpoints and verify the response. Store these results in a dashboard to track latency trends over time.
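A minimal synthetic probe can be as simple as a curl loop; the /healthz path is an assumption about your application, and a real setup would ship these measurements to your dashboard:

```shell
# Record timestamp, status code, and total request time once a minute.
while true; do
  curl -s -o /dev/null \
    -w "$(date -u +%FT%TZ) %{http_code} %{time_total}s\n" \
    https://app.alex.rocks/healthz
  sleep 60
done
```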
Future Enhancements
Once the basics are working, think about additional improvements. Cloudflare’s Argo Smart Routing can further reduce latency by dynamically choosing the fastest path through their network. You might also explore using Pulumi Crosswalk for Azure to simplify network configuration or integrate Azure Front Door as another layer of global distribution. As the infrastructure grows, implementing policy-as-code with tools like Open Policy Agent ensures that every new cluster adheres to security and compliance standards automatically.
By iterating on these enhancements, you continue to improve resilience and performance while keeping operational complexity manageable. Each piece builds on the foundation established by the dual clusters, secure tunnels, and geo-steered load balancer.
Conclusion: Resilience Through Redundancy
By combining Pulumi, Azure Kubernetes Service, Cloudflare Tunnels, and Cloudflare Load Balancing, you create a robust platform that withstands regional outages with minimal disruption. Users connect through a single domain while you maintain the flexibility of multi-region deployments. Pulumi keeps infrastructure reproducible, Cloudflare Tunnels hide your clusters from the public internet, and geo-steered load balancing ensures traffic reaches the closest healthy endpoint.
This architecture is not only powerful for disaster recovery scenarios but also enhances everyday performance and security. Whether a user connects from Frankfurt or Chicago, they enjoy lower latency and the protection of Cloudflare’s global network. And if the worst happens (an outage takes out an entire region), traffic automatically reroutes to the remaining cluster, keeping your service online. The combination of automation, redundancy, and intelligent routing provides near-zero downtime, fulfilling the high availability expectations of modern cloud-native applications.
With the concepts and code in this article, you are ready to implement a multi-region strategy that scales with your business needs. Pulumi’s code-driven approach ensures repeatability, Cloudflare’s tunnel technology simplifies networking, and the geo-steered load balancer ties everything together. Embrace these tools to deliver an application that remains resilient even in the face of regional disasters.
Final Thoughts
Building global applications requires careful planning. The combination of Pulumi, Azure, and Cloudflare provides a flexible toolkit for the job. By automating cluster provisioning, securing access through tunnels, and distributing traffic intelligently, you gain the agility to grow without compromising availability. Experiment with these tools in a lab environment and iterate on the design to suit your organization’s needs. The techniques discussed here can serve as a blueprint for any team aiming to deliver a resilient user experience worldwide.