Infrastructure as Code: Configuration Drift Is Killing Your Network
IaC Series – Issue 4 of 6: When Your Network Becomes Software
🧨 Everything Was Fine… Until It Wasn’t
Everything was working yesterday.
No alerts.
No complaints.
No tickets titled “URGENT: THE INTERNET IS BROKEN AGAIN.”
Then today?
One site can’t reach another
A firewall rule “definitely exists” (but somehow doesn’t work)
That VLAN you swear was there… is gone
And nobody touched anything… apparently
Yeah. Sure.
Welcome to configuration drift — the silent killer of stable networks.
🧠 What Configuration Drift Actually Is
Configuration drift is what happens when your intended configuration and your actual configuration gradually diverge.
Not because of one big change…
…but because of hundreds of tiny ones:
“Quick fix” CLI changes
Emergency firewall rules
Someone tweaking a port at 2 AM
A “temporary” change that became permanent
That one engineer who “just logs into the box real quick”
Multiply that across your environment and you get:
“Mostly consistent… except for all the parts that aren’t.”
🧟 The Real Problem: Drift Feels Invisible
Here’s why drift is dangerous:
It doesn’t break things immediately.
It builds slowly. Quietly.
Until one day something depends on a configuration that used to be true…
…and now it isn’t.
That’s when you get:
“It works in one site but not another”
“It worked last week”
“QA works, prod doesn’t”
“Same config… I think?”
Spoiler: it’s not the same config.
🔥 Real-World Drift Examples
Let’s be honest… you’ve seen at least one of these recently:
Firewall drift
That rule exists… just not on the device you need.
Switch drift
VLAN on the core? Yes.
On the access switch? Not even close.
Cloud drift
Terraform says one thing.
The cloud console says another.
Guess which one is actually running?
“Temporary fix” drift
“We’ll remove that later.”
We did not remove that later.
💀 Why Most Networks Are Drift Factories
Most environments are set up in a way that guarantees drift:
No true source of truth
Direct device changes
Multiple engineers doing their own thing
No enforced process
No validation after changes
It’s basically:
“Everyone just try your best and don’t break anything.”
🧩 How IaC Fixes Drift (If You Actually Use It)
Infrastructure as Code doesn’t magically fix drift…
…but it gives you control over it:
One source of truth (Git)
Repeatable deployments
Change visibility
The ability to compare intended vs actual
But none of that matters if people ignore it.
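To make "compare intended vs actual" concrete, here is a minimal sketch in Python. It assumes you already have both configs as plain text: the intended one from your Git repo and the actual one pulled from the device. The function name `drift_report` and the sample VLAN config are made up for illustration, and how you fetch the running config (SSH, API, Ansible) is left out.

```python
import difflib


def drift_report(intended: str, actual: str) -> list[str]:
    """Return unified-diff lines between intended and actual config text.

    An empty list means the two texts are identical (no drift detected).
    """
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            actual.splitlines(),
            fromfile="intended (git)",
            tofile="actual (device)",
            lineterm="",
        )
    )


# Hypothetical sample data. In practice `intended` is read from the repo
# and `actual` is pulled from the device or its API.
intended = """\
vlan 10
 name USERS
vlan 20
 name VOICE
"""
actual = """\
vlan 10
 name USERS
"""

report = drift_report(intended, actual)
if report:
    print("\n".join(report))
else:
    print("no drift")
```

Lines starting with `-` are in Git but missing from the device. Lines starting with `+` are on the device but not in Git. Either one is drift.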
⚠️ The Part Nobody Likes Hearing
You don’t have a tooling problem.
You have a behavior problem.
You can have:
Ansible
Terraform
Pipelines
Clean repos
…and still have drift everywhere if people are:
Logging into devices
Making “just one quick change”
Skipping the process
IaC only works when:
Code becomes the ONLY way changes happen.
Not optional. Not “preferred.”
Required.
🧠 The Mindset Shift
Stop thinking:
“The device is the source of truth.”
Start thinking:
“The device is just a deployed artifact.”
If it drifts?
You don’t fix it manually.
You reapply the correct config from code.
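Here is a rough sketch of what "reapply from code" means in practice: work out what the device is missing and what it has that it shouldn't, then push that. This is a simplification that treats a config as a flat set of lines and ignores ordering and hierarchy, which real ACLs and device configs care about. The function name `remediation_plan` and the rule syntax are hypothetical, and in a real setup the apply step would be handled by your automation tool rather than by hand.

```python
def remediation_plan(intended_lines, actual_lines):
    """Compare two flat lists of config lines.

    Returns a dict with:
      "add":    lines in the intended config that the device is missing
      "remove": lines on the device that are not in the intended config
    Blank lines and surrounding whitespace are ignored.
    """
    intended = [line.strip() for line in intended_lines if line.strip()]
    actual = [line.strip() for line in actual_lines if line.strip()]
    return {
        "add": [line for line in intended if line not in actual],
        "remove": [line for line in actual if line not in intended],
    }


# Hypothetical example: someone added a "temporary" rule on the device.
intended_rules = [
    "permit tcp any host 10.0.0.5 eq 443",
    "deny ip any any",
]
device_rules = [
    "permit tcp any host 10.0.0.5 eq 443",
    "permit tcp any any eq 3389",
    "deny ip any any",
]

plan = remediation_plan(intended_rules, device_rules)
print(plan)
```

The point is the direction of the fix: the plan is always computed from the code toward the device, never the other way around.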
🛠️ Quick Win You Can Do This Week
Pick ONE thing you change often:
Firewall rules
VLANs
Interface configs
Then:
Export it
Put it in Git
Treat it as the source of truth
Make changes there first
Even if you still apply changes manually…
you now have a record of what the config is supposed to be, and something to measure drift against.
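The "export it, put it in Git" steps can be sketched like this. It assumes you already have the exported config as text. The `configs/<device>.cfg` layout, the function name `save_snapshot`, and the device name are my own assumptions, not a standard. The script only writes the file and tells you whether it changed; committing it is still a normal `git add` and `git commit`.

```python
import tempfile
from pathlib import Path


def save_snapshot(repo_dir, device, config_text):
    """Write a device config to <repo_dir>/configs/<device>.cfg.

    Trailing whitespace is stripped so cosmetic differences do not
    show up as changes. Returns True if the file was created or
    changed, False if the stored copy already matched.
    """
    path = Path(repo_dir) / "configs" / f"{device}.cfg"
    path.parent.mkdir(parents=True, exist_ok=True)
    normalized = "\n".join(line.rstrip() for line in config_text.splitlines()) + "\n"
    if path.exists() and path.read_text() == normalized:
        return False
    path.write_text(normalized)
    return True


# A temporary directory stands in for your Git working tree here.
with tempfile.TemporaryDirectory() as repo:
    first = save_snapshot(repo, "core-sw1", "vlan 10\n name USERS\n")
    second = save_snapshot(repo, "core-sw1", "vlan 10\n name USERS\n")
    third = save_snapshot(repo, "core-sw1", "vlan 10\n name USERS\nvlan 99\n")
    print(first, second, third)  # True False True
```

If the function returns True on a device nobody was supposed to touch, that is your first drift alert.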
🔒 Paid Section: How to Actually Kill Drift
Up to this point, you’ve probably realized something uncomfortable:
Your network isn’t broken… it’s just slowly drifting out of control.
And the worst part?
Most teams don’t notice until it turns into:
a production outage
a security gap
or a “this makes no sense” troubleshooting nightmare
💡 Here’s What We’re About to Fix
In the rest of this issue, I’m going to show you exactly how to start controlling drift using the tools you already have.
No enterprise platform required. No massive rebuild.
Just practical, real-world steps.
🔓 What You’ll Learn in the Full Issue
How to detect configuration drift automatically using Ansible + APIs
A simple drift audit workflow you can run this week
How to enforce “no manual changes” without slowing your team down
A clean, realistic Git + pipeline structure for network configs
A real-world drift outage scenario (and how it should have been prevented)
If you’ve ever said:
“That shouldn’t have changed…”
This is where you fix that problem for good.
(Unlock the rest below 👇)


