Rethinking VPNs for Web3 Infrastructure: Lessons from Migrating to Zero Trust

Posted by marvinxtech@reddit | sysadmin | 6 comments

I’ve spent the past year working on migrating a Web3 exchange’s internal access layer away from traditional VPNs toward a Zero Trust / SDP model.

This wasn’t a “rip and replace for security buzzwords” project — it was driven by very practical issues that started to hurt at scale.

What broke at scale

1. Infrastructure sprawl
We were operating across AWS, GCP, and some bare metal — multiple regions, hundreds of nodes.
Maintaining VPN routing and access rules across that surface became increasingly fragile.

2. Lateral movement risk
Once an engineer connected to the VPN, the network was relatively flat.
In theory, a compromised laptop could pivot toward sensitive services (e.g. wallet signing infra).

3. Latency overhead
During high-volatility periods, we consistently saw ~100ms+ added latency due to VPN routing.
For SRE workflows, that’s not trivial.

What we moved to (high-level)

We ended up implementing a Software-Defined Perimeter model with a few core components:

• Single Packet Authorization (SPA)
Management endpoints are not exposed at all unless a valid cryptographic packet is received.
Effectively removed internet-facing attack surface for SSH / K8s API.

• Identity-aware access (OIDC-based)
We stopped distributing long-lived kubeconfigs.
Access is now tied to identity — revoke the user, access disappears immediately across clusters.

• Edge-level micro-segmentation
Access is scoped tightly per role.
Being “on the network” no longer implies reachability — most engineers can’t even see infra outside their domain.
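To make the SPA idea concrete, here is a minimal Python sketch of the verification step only. It is not our production implementation: the shared key, the packet layout (timestamp, nonce, client id, HMAC tag), and the function names are all assumptions chosen for illustration. A real deployment would typically use per-client keys and have a gateway open a firewall rule for the source IP after a packet verifies.

```python
import hashlib
import hmac
import os
import struct
import time
from typing import Optional

# Hypothetical shared secret; a real setup would use per-client keys.
SHARED_KEY = b"demo-key-not-for-production"
MAX_SKEW_SECONDS = 30   # assumed freshness window
TAG_LEN = 32            # SHA-256 HMAC length
_seen_nonces = set()    # in-memory replay cache, for the sketch only


def build_spa_packet(client_id: str, key: bytes = SHARED_KEY,
                     now: Optional[float] = None) -> bytes:
    """Build one authorization packet: timestamp | nonce | client id | HMAC."""
    ts = int(time.time() if now is None else now)
    body = struct.pack("!Q", ts) + os.urandom(16) + client_id.encode("utf-8")
    return body + hmac.new(key, body, hashlib.sha256).digest()


def verify_spa_packet(packet: bytes, key: bytes = SHARED_KEY,
                      now: Optional[float] = None) -> Optional[str]:
    """Return the client id if the packet is authentic, fresh, and not replayed."""
    if len(packet) < 8 + 16 + TAG_LEN:
        return None  # too short to be valid; drop silently
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted
    (ts,) = struct.unpack("!Q", body[:8])
    nonce = body[8:24]
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return None  # stale packet
    if nonce in _seen_nonces:
        return None  # replayed packet
    _seen_nonces.add(nonce)
    return body[24:].decode("utf-8")
```

The point of the design is that every failure path returns nothing, so from the outside the management port looks closed unless the sender holds a valid key.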

Results we actually measured

No public-facing management ports (SSH / RDP / K8s API)
~30% reduction in access latency vs previous OpenVPN setup (mainly due to edge PoPs)
Full session-level auditability (user identity instead of shared credentials)

Lessons learned (the non-obvious parts)

MFA fatigue is real
If you require MFA on every action, people will work around it.
We reduced friction by using device posture checks (disk encryption, endpoint security) and requiring step-up MFA only when risk changes.
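A rough Python sketch of that kind of policy decision, to show the shape of it. The field names, the "low"/"high" sensitivity labels, and the 15-minute and 8-hour freshness windows are assumptions made up for this example, not the values we run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    disk_encrypted: bool
    endpoint_agent_healthy: bool
    known_device: bool
    target_sensitivity: str   # "low" or "high" (hypothetical labels)
    minutes_since_mfa: int


def decide(ctx: AccessContext,
           normal_mfa_max_age: int = 480,
           elevated_mfa_max_age: int = 15) -> str:
    """Return "deny", "step_up_mfa", or "allow" for one access request."""
    # Failing posture is a hard stop; MFA cannot fix an unhealthy device.
    if not (ctx.disk_encrypted and ctx.endpoint_agent_healthy):
        return "deny"
    # Elevated risk (new device or sensitive target) needs a fresher MFA.
    elevated = (not ctx.known_device) or ctx.target_sensitivity == "high"
    max_age = elevated_mfa_max_age if elevated else normal_mfa_max_age
    if ctx.minutes_since_mfa > max_age:
        return "step_up_mfa"
    return "allow"
```

With these assumed thresholds, routine work on a healthy, known device goes a full working day without a prompt, while touching something sensitive asks for a recent MFA.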

Legacy tooling doesn’t cooperate
Some internal tools simply don’t support modern auth flows.
We had to introduce local agents / tunnels as a compatibility layer.

Zero Trust ≠ zero complexity
You’re trading network simplicity (VPN) for identity + policy complexity.
Operational maturity matters a lot here.

Open question to others here

For teams running multi-cloud or high-risk infra:

Are you still on VPNs, or have you moved to ZTNA/SDP?
How are you handling identity + access for K8s at scale?
Any good patterns for dealing with legacy tooling in a Zero Trust model?

Happy to share more implementation details if useful.

Ambitious_Doctor_957@reddit

The lateral movement point is the one that does not get enough attention in these migration discussions. Flat network access being the default in VPN setups means the blast radius of a single compromised credential is the entire internal surface. The SPA approach of removing management ports from the internet-facing layer entirely is the right call for any infrastructure handling wallet signing or custody-adjacent workloads.

The legacy tooling problem you flagged is genuinely the hardest part of Zero Trust migrations, and the local agent compatibility layer is the honest solution most write-ups skip over. Most real environments have at least a few services that simply cannot speak OIDC natively, and the choice is either a compatibility shim or a multi-year refactor. The shim is the pragmatic answer.

On your open question about K8s identity at scale, the pattern that holds up well is short-lived certificates issued per session rather than distributed kubeconfigs with long TTLs. Tools like SPIFFE and SPIRE handle workload identity at the Kubernetes layer and integrate reasonably well with OIDC at the human identity layer, so you get a consistent identity model across both planes.
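As a toy illustration of the short-lived credential idea, here is a Python sketch that issues and checks a signed session credential with a built-in expiry. It uses an HMAC-signed token as a stand-in so it stays stdlib-only; it is not an X.509 certificate or a SPIFFE SVID, and the key, claim names, and 15-minute TTL are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-signing-key"  # hypothetical; never hardcode real keys


def _sign(payload: bytes, key: bytes) -> bytes:
    return base64.urlsafe_b64encode(hmac.new(key, payload, hashlib.sha256).digest())


def issue_session_credential(subject: str, ttl_seconds: int = 900,
                             now: Optional[float] = None,
                             key: bytes = SIGNING_KEY) -> str:
    """Issue a signed credential that expires ttl_seconds after issuance."""
    issued = int(time.time() if now is None else now)
    claims = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return (payload + b"." + _sign(payload, key)).decode("ascii")


def verify_session_credential(token: str, now: Optional[float] = None,
                              key: bytes = SIGNING_KEY) -> Optional[str]:
    """Return the subject if the signature is valid and the credential is unexpired."""
    try:
        payload, sig = token.encode("ascii").split(b".")
    except (ValueError, UnicodeEncodeError):
        return None  # malformed token
    if not hmac.compare_digest(sig, _sign(payload, key)):
        return None  # signature mismatch
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None  # expired; the holder must re-authenticate
    return claims["sub"]
```

Because expiry is part of the signed payload, a leaked credential stops working on its own after the TTL, which is the property that long-lived kubeconfigs lack.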

The data layer underneath all of this is worth flagging as a separate concern. Zero Trust on the access path does not solve the question of where your data physically lives and who has custody of it at rest. For a Web3 exchange with multi-cloud sprawl the sovereignty question at the storage layer is as important as the access control question at the network layer. IOMETE (https://iomete.com) runs an Iceberg native lakehouse inside your own infrastructure specifically so the data governance boundary and the network security boundary are the same perimeter rather than two separate things to audit.

Solid write-up. The measured latency reduction is the kind of concrete outcome that actually moves internal conversations forward.

PhilipLGriffiths88@reddit

We use NetFoundry / open source OpenZiti. It uses PKI, but is interoperable with X.509/OIDC, which crucially means that while it works for humans, it also handles non-human and legacy workloads. I think this is the key part; heck, even Siemens use the tech in OT (and that's tons of legacy).

Jaki_Shell@reddit

Brraaap@reddit

What product are you selling?

marvinxtech@reddit (OP)

Not planning to sell any products

MeatPiston@reddit

Stop using an llm to create your posts.