Infrastructure as Code: Why Most IaC Projects Fail (Even With the Right Tools)
IaC Series – Issue 6 of 6: When Your Network Becomes Software
Every failed Infrastructure as Code project starts the same way:
“We bought the tools.”
Then six months later:
configs still live in spreadsheets
nobody trusts Git
production changes happen directly on the firewall anyway
and the “automation server” is now just a Linux VM running three broken Python scripts and a cron job named:
final_final_v2_REAL.py
Because here’s the uncomfortable truth nobody puts in vendor webinars:
Terraform didn’t fail your project.
Ansible didn’t fail your project.
Git didn’t fail your project.
Your processes failed your project.
Over the last five issues, we talked about:
declarative vs imperative automation
state files
configuration drift
immutable infrastructure
replacing systems instead of endlessly repairing them
But this final issue is the most important one in the series, because it explains why so many Infrastructure as Code projects quietly die in conference rooms while everyone pretends “automation is the future.”
The tools are rarely the real problem.
The environment is.
Infrastructure as Code Doesn’t Fix Chaos
One of the biggest misconceptions in networking is this:
“If we automate it, things will get better.”
No.
If your environment is chaotic, automation just helps you break things faster and more consistently.
Infrastructure as Code is basically a magnifying glass for technical debt.
If your environment has:
inconsistent naming
undocumented VLANs
random firewall rule logic
duplicate address objects
mystery static routes
“temporary” NAT rules from 2019
switches configured differently for absolutely no reason
…then automation doesn’t magically clean that up.
It weaponizes it.
Because now your bad decisions deploy instantly at scale.
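Before automating anything, it helps to measure the mess. Here is a minimal Python sketch of that kind of audit. The naming standard, the function names, and the sample data are all hypothetical; swap in whatever your environment actually agrees on.

```python
import re
from collections import defaultdict

# Hypothetical naming standard: <site>-<role>-<number>, e.g. "nyc-sw-01".
NAME_PATTERN = re.compile(r"[a-z]{3}-(sw|fw|rtr)-\d{2}")


def find_nonstandard_names(hostnames):
    """Return the hostnames that do not match the naming standard."""
    return [name for name in hostnames if not NAME_PATTERN.fullmatch(name)]


def find_duplicate_objects(address_objects):
    """Map each address value to its object names, keeping only duplicates."""
    names_by_value = defaultdict(list)
    for name, value in address_objects.items():
        names_by_value[value].append(name)
    return {
        value: sorted(names)
        for value, names in names_by_value.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Made-up inventory, for illustration only.
    hostnames = ["nyc-sw-01", "NYC_Switch2", "chi-fw-01", "garys-switch"]
    address_objects = {
        "web-server": "10.1.1.10/32",
        "WebServer_old": "10.1.1.10/32",
        "dns-primary": "10.1.1.53/32",
    }
    print("Nonstandard names:", find_nonstandard_names(hostnames))
    print("Duplicate objects:", find_duplicate_objects(address_objects))
```

If a report like this comes back long, that is the cleanup backlog to work through before any of it gets templated.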
The Most Dangerous Phrase in IT
“We’ll clean it up later.”
No, you won’t.
You’ll automate around it.
Then document around it.
Then build workarounds for the workaround.
Then eventually nobody remembers why VLAN 317 exists but everyone is too afraid to remove it because it might somehow control HVAC for an office in Nebraska.
Infrastructure as Code works best in environments that value:
standards
consistency
documentation
repeatability
Not:
vibes
tribal knowledge
and “Don’t touch that switch, Gary configured it.”
The Source of Truth Problem
Most IaC projects fail because nobody agrees on where truth actually lives.
Is it:
the firewall?
Panorama?
Aruba Central?
Git?
NetBox?
an Excel spreadsheet?
Steve’s OneNote notebook?
that Visio diagram from 2021?
Because if engineers can still make direct production changes outside the code workflow…
…your Infrastructure as Code project is already drifting.
That’s the hard truth.
You cannot have:
Git-based infrastructure
AND random production edits
at the same time.
Eventually one becomes fiction.
And unfortunately it’s usually the Git repo.
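One way to find out which one has become fiction is to compare them. The sketch below uses Python's standard difflib module to diff the config stored in Git against what is running on the device. How you fetch either side is up to you; the detect_drift function and the sample configs are assumptions for illustration.

```python
import difflib


def detect_drift(intended, running):
    """Return unified-diff lines between the intended and running configs.

    An empty list means the device matches the repository.
    """
    diff = difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile="git/intended",
        tofile="device/running",
        lineterm="",
    )
    return list(diff)


if __name__ == "__main__":
    # Made-up configs: someone added VLAN 317 directly on the device.
    intended = "vlan 10\n name users\nvlan 20\n name voice\n"
    running = intended + "vlan 317\n name hvac-maybe\n"

    drift = detect_drift(intended, running)
    if drift:
        print("Drift detected:")
        print("\n".join(drift))
    else:
        print("Running config matches the repository.")
```

Run on a schedule, a check like this at least tells you when someone went around the workflow, even if it cannot stop them.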
The Controller Trap
This is where network engineers get stuck all the time.
They think:
“We use templates in our controller, so we’re doing IaC.”
Not exactly.
Controllers help with:
consistency
templating
standardization
centralized management
And those are GOOD things.
But true Infrastructure as Code means:
the code repository is the source of truth
changes are version controlled
deployments are repeatable
rollback is possible
history is auditable
If the real source of truth is still:
clicking in a GUI
changing templates manually
or editing objects directly in production
…you’re still operating manually.
Just with nicer dashboards.
“Can You Automate This Real Quick?”
This sentence has destroyed more automation projects than bad YAML ever will.
Because many companies want:
Netflix-level automation
Google-level reliability
Amazon-scale deployment pipelines
…but the actual environment looks like:
no lab
no testing
no Git standards
no code reviews
no rollback process
no change validation
and one overworked engineer trying to build automation between ticket escalations.
Infrastructure as Code is not a side quest.
It is an operational model.
That means:
process changes
team buy-in
maintenance standards
testing standards
documentation discipline
and leadership support
Without those things, the tooling eventually becomes shelfware.
Or worse:
a collection of half-working scripts everyone is scared to run.
The “One Automation Person” Problem
Every company eventually creates:
“The Automation Guy.”
You know the one.
The engineer who:
writes all the scripts
understands the APIs
maintains the pipelines
fixes the broken jobs
knows where the tokens are stored
and becomes the only human capable of safely touching the automation platform.
Congratulations.
You accidentally created a new single point of failure.
Real Infrastructure as Code maturity happens when:
workflows are documented
repos are shared
standards are consistent
and automation survives employee turnover
Because if your automation platform dies the moment one engineer takes PTO…
…you didn’t build automation.
You built dependency.
Why Testing Gets Ignored
Testing sounds great until someone asks:
“Can we just push it directly to prod?”
And suddenly everybody becomes very confident.
This is where IaC projects start getting dangerous.
Because automation without testing is basically:
“What if outages happened faster?”
Real IaC environments need:
development environments
QA validation
CI/CD pipelines
syntax checks
peer review
deployment approval
rollback plans
Otherwise your deployment process becomes:
Push code
Pray aggressively
Open incident bridge
Nothing Humbles an Engineer Faster Than Automation
Manual mistakes usually affect:
one device
one site
one VLAN
one firewall rule
Automation mistakes affect:
ALL OF THEM
Immediately.
Simultaneously.
At machine speed.
Nothing builds character like accidentally disabling BGP on 42 sites because of one variable typo.
That’s why mature IaC teams become obsessed with:
validation
testing
approvals
guardrails
and staged deployments
Not because they’re slow.
Because they’ve suffered before.
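A staged deployment can be as simple as this Python sketch: push to one site, verify, then widen the batch, and stop the moment anything fails. The staged_rollout function and the deploy and verify callables are hypothetical stand-ins for whatever your tooling provides.

```python
def staged_rollout(sites, deploy, verify, batch_sizes=(1, 5)):
    """Deploy in growing batches and stop at the first failed verification.

    Returns (deployed_sites, failed_site), where failed_site is None
    if every site passed.
    """
    deployed = []
    remaining = list(sites)
    sizes = list(batch_sizes)
    while remaining:
        # Use each configured batch size once, then take everything left.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for site in batch:
            deploy(site)
            deployed.append(site)
            if not verify(site):
                return deployed, site
    return deployed, None


if __name__ == "__main__":
    sites = [f"site-{n:02d}" for n in range(1, 43)]  # 42 made-up sites

    def deploy(site):
        print(f"deploying to {site}")

    def verify(site):
        # Pretend the change breaks BGP on the third site.
        return site != "site-03"

    deployed, failed = staged_rollout(sites, deploy, verify)
    print(f"touched {len(deployed)} of {len(sites)} sites, failed at: {failed}")
```

The point of the design is the blast radius: a bad variable gets caught after a handful of sites instead of all 42.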
The Real Mindset Shift
This is the part most people miss.
Infrastructure as Code is NOT:
Terraform
YAML
Python
GitHub
Ansible
pipelines
APIs
Those are just tools.
Infrastructure as Code is really about:
discipline
repeatability
version control
operational maturity
standardization
and removing human randomness from production.
That’s the actual transformation.
The engineers who succeed with IaC are usually not the “best programmers.”
They’re the engineers willing to:
standardize
document
test
collaborate
and stop treating production like a handwritten art project.
Where Most Teams SHOULD Start
Most teams try to jump straight into:
full Terraform deployments
self-healing infrastructure
dynamic provisioning
automated rollback systems
Meanwhile, their switch naming convention still changes from building to building.
Start smaller.
Much smaller.
Start with:
firewall rules
VLAN definitions
templates
DNS records
object groups
ACLs
NAT policies
Then:
put them in Git
review changes in pull requests
validate before deployment
deploy consistently
That alone puts you ahead of most enterprise environments.
Seriously.
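The “validate before deployment” step does not need a platform. Here is a minimal sketch in Python that checks a list of VLAN definitions before anything is pushed. The data layout, the validate_vlans function, and the rule that every VLAN needs a description are assumptions for illustration; the 1 to 4094 range is the usable 802.1Q VLAN ID range.

```python
def validate_vlans(vlans):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    seen_ids = set()
    for entry in vlans:
        vlan_id = entry.get("id")
        # VLAN IDs 0 and 4095 are reserved, so usable IDs are 1 to 4094.
        if not isinstance(vlan_id, int) or not 1 <= vlan_id <= 4094:
            errors.append(f"invalid VLAN id: {vlan_id!r}")
        elif vlan_id in seen_ids:
            errors.append(f"duplicate VLAN id: {vlan_id}")
        else:
            seen_ids.add(vlan_id)
        if not entry.get("name"):
            errors.append(f"VLAN {vlan_id!r} has no name")
        # Hypothetical house rule: no undocumented VLANs.
        if not entry.get("description"):
            errors.append(f"VLAN {vlan_id!r} has no description")
    return errors


if __name__ == "__main__":
    # Made-up definitions, as they might be loaded from a file in Git.
    vlans = [
        {"id": 10, "name": "users", "description": "User access"},
        {"id": 10, "name": "voice", "description": "IP phones"},
        {"id": 317, "name": "mystery"},
    ]
    problems = validate_vlans(vlans)
    for problem in problems:
        print("ERROR:", problem)
    # A non-zero exit code lets a pipeline or pre-merge check block the change.
    raise SystemExit(1 if problems else 0)
```

Hooked into a pull request check, a script like this turns “validate before deployment” from a good intention into something that actually blocks a bad change.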
The Best IaC Advice Nobody Wants to Hear
You do NOT need:
Kubernetes
a giant DevOps team
expensive tooling
AI-generated YAML
or twelve cloud certifications
You need:
consistency
process
source control
documentation
and leadership willing to stop bypassing procedures “just this once.”
Because “temporary exceptions” are how configuration drift becomes company culture.
Final Thoughts
Infrastructure as Code is not about becoming a programmer.
It’s about building infrastructure that survives:
outages
audits
growth
acquisitions
team turnover
and your future self at 2:13 AM trying to remember why a production firewall rule says:
TEMP-DO-NOT-REMOVE-FINAL
The engineers who succeed with IaC are not the ones with the fanciest tools.
They’re the ones willing to change how infrastructure is operated.
That’s the real challenge.
And honestly?
That’s why most projects fail.
Paid Subscriber Section
What Mature IaC Teams Actually Do Differently
Everything above explains why projects fail.
This section is about what successful teams actually do in the real world.