Rethinking VPNs for Web3 Infrastructure: Lessons from Migrating to Zero Trust

Posted by marvinxtech@reddit | sysadmin

I’ve spent the past year working on migrating a Web3 exchange’s internal access layer away from traditional VPNs toward a Zero Trust / SDP model.

This wasn’t a “rip and replace for security buzzwords” project — it was driven by very practical issues that started to hurt at scale.

What broke at scale

1. Infrastructure sprawl
We were operating across AWS, GCP, and some bare metal — multiple regions, hundreds of nodes.
Maintaining VPN routing and access rules across that surface became increasingly fragile.

2. Lateral movement risk
Once an engineer connected to the VPN, the network was relatively flat.
In theory, a compromised laptop could pivot toward sensitive services (e.g. wallet signing infra).

3. Latency overhead
During high-volatility periods, we consistently saw ~100 ms or more of added latency due to VPN routing.
For SRE workflows, that’s not trivial.

What we moved to (high-level)

We ended up implementing a Software-Defined Perimeter model with a few core components:

• Single Packet Authorization (SPA)
Management endpoints are not exposed at all until a valid, cryptographically authenticated packet is received.
This effectively removed the internet-facing attack surface for SSH and the K8s API.

• Identity-aware access (OIDC-based)
We stopped distributing long-lived kubeconfigs.
Access is now tied to identity — revoke the user, access disappears immediately across clusters.

• Edge-level micro-segmentation
Access is scoped tightly per role.
Being “on the network” no longer implies reachability — most engineers can’t even see infra outside their domain.
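
For anyone unfamiliar with the SPA bullet above, here is a minimal sketch of the idea in Python. It is not our production implementation and not the wire format of any real tool such as fwknop: the packet layout, the 30-second freshness window, and the names `build_packet` and `SpaVerifier` are all assumptions made for illustration. The point it shows is that the listener stays silent unless a single packet carries a valid MAC, a fresh timestamp, and a nonce it has not seen before.

```python
import hashlib
import hmac
import os
import struct
import time
from typing import Dict, Optional

# Hypothetical packet layout (illustrative only, not a real SPA wire format):
#   8-byte big-endian Unix timestamp | 16-byte random nonce | 32-byte HMAC-SHA256 tag
HEADER = struct.Struct(">Q16s")
TAG_LEN = hashlib.sha256().digest_size
MAX_SKEW_SECONDS = 30  # assumed freshness window


def build_packet(key: bytes, now: Optional[float] = None) -> bytes:
    """Client side: build one authorization packet signed with a shared key."""
    ts = int(time.time() if now is None else now)
    body = HEADER.pack(ts, os.urandom(16))
    return body + hmac.new(key, body, hashlib.sha256).digest()


class SpaVerifier:
    """Server side: decide whether a packet should open access."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._seen: Dict[bytes, int] = {}  # nonce -> timestamp, for replay protection

    def verify(self, packet: bytes, now: Optional[float] = None) -> bool:
        current = int(time.time() if now is None else now)
        if len(packet) != HEADER.size + TAG_LEN:
            return False
        body, tag = packet[:HEADER.size], packet[HEADER.size:]
        expected = hmac.new(self._key, body, hashlib.sha256).digest()
        # Constant-time comparison so the check does not leak tag bytes via timing.
        if not hmac.compare_digest(tag, expected):
            return False
        ts, nonce = HEADER.unpack(body)
        if abs(current - ts) > MAX_SKEW_SECONDS:
            return False
        # Forget nonces that are too old to pass the freshness check anyway.
        self._seen = {
            n: t for n, t in self._seen.items()
            if abs(current - t) <= MAX_SKEW_SECONDS
        }
        if nonce in self._seen:
            return False  # replayed packet
        self._seen[nonce] = ts
        return True
```

In a real deployment, a successful `verify` would be followed by something like a short-lived firewall rule for the sender's source IP, and the payload would normally be encrypted as well as authenticated. Both are left out here to keep the sketch small.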

Results we actually measured

Lessons learned (the non-obvious parts)

MFA fatigue is real
If you require MFA on every action, people will work around it.
We reduced friction by relying on device posture checks (disk encryption, endpoint security) and only triggering step-up MFA when risk changes.
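
A rough sketch of what "posture first, step-up only on risk change" can look like as a policy function. This is illustrative Python, not our actual policy engine: the risk signals (`new_network`, `sensitive_target`), the two MFA age limits, and the choice to deny outright on failed posture are assumptions picked for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    STEP_UP_MFA = "step_up_mfa"
    DENY = "deny"


@dataclass(frozen=True)
class DevicePosture:
    disk_encrypted: bool
    endpoint_security_running: bool


@dataclass(frozen=True)
class AccessContext:
    posture: DevicePosture
    seconds_since_mfa: int
    new_network: bool       # hypothetical risk signal; caller clears it after a fresh MFA
    sensitive_target: bool  # hypothetical flag, e.g. wallet signing infra


# Assumed limits, chosen only for illustration.
MFA_MAX_AGE_SECONDS = 8 * 60 * 60
SENSITIVE_MFA_MAX_AGE_SECONDS = 15 * 60


def decide(ctx: AccessContext) -> Decision:
    """Return an access decision without prompting for MFA on every action."""
    # Assumption: a device that fails posture is denied rather than prompted.
    if not (ctx.posture.disk_encrypted and ctx.posture.endpoint_security_running):
        return Decision.DENY
    # A changed risk signal always triggers a step-up prompt.
    if ctx.new_network:
        return Decision.STEP_UP_MFA
    # Sensitive targets tolerate a much older MFA less than routine ones.
    max_age = (
        SENSITIVE_MFA_MAX_AGE_SECONDS if ctx.sensitive_target else MFA_MAX_AGE_SECONDS
    )
    if ctx.seconds_since_mfa > max_age:
        return Decision.STEP_UP_MFA
    return Decision.ALLOW
```

With these assumed limits, an engineer on a healthy laptop who authenticated an hour ago gets `ALLOW` for routine targets but `STEP_UP_MFA` for a sensitive one, which is the kind of asymmetry that keeps prompts rare without making them meaningless.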

Legacy tooling doesn’t cooperate
Some internal tools simply don’t support modern auth flows.
We had to introduce local agents / tunnels as a compatibility layer.

Zero Trust ≠ zero complexity
You’re trading network simplicity (VPN) for identity + policy complexity.
Operational maturity matters a lot here.

Open question to others here

For teams running multi-cloud or high-risk infra:

Happy to share more implementation details if useful.