Main-Drag-4975@reddit
Pets, not cattle. Recycle your clusters regularly and this stops being an issue.
aamfk@reddit
I name ALL my users like this. That way when I end up having 10 different environments, I can take a backup from 9 and restore it into 2 without changing usernames and shit
every rule is MEANT to be broken
SilverTroop@reddit
That sounds great on the surface but there's a non-negligible risk and complexity to what you lightly mention as "swap out individual clusters". To truly achieve that kind of flexibility you'd eventually need something to manage your cluster of clusters, and then you'll face the same problem because your K8s^2 needs an upgrade every other month. Using your analogy, at some point in your system stack, you need a pet.
Main-Drag-4975@reddit
The humans are the pets
SittingWave@reddit
executives consider them as cattle.
SilverTroop@reddit
But the reason why we have k8s is that manually shuffling containers around is tedious, repetitive and error prone. My feeling is that the same would apply to cluster management at the level you suggest.
Main-Drag-4975@reddit
Sounds like you’ve chosen the production Kubernetes cluster as your waterline in the sand below which you will not practice multiple redundancy or automated provisioning.
The large companies that build Kubernetes do not operate this way. Hopefully you can find a RedHat-style vendor to charge you for a Kubernetes distribution with longer paid support windows for their releases.
orthoxerox@reddit
Why go for RedHat-style if you can just pay RedHat for that?
SilverTroop@reddit
We have (regularly tested) disaster recovery procedures where we create a cluster from scratch. We just don’t treat it as cattle in the sense that we refine and estimate our work with the assumption that no time will be spent on cluster management. I think this works well and strikes a good balance for our system, but I can imagine how it could become difficult to manage on a larger scale
WhoLetThatSinkIn@reddit
Why would it be difficult at a larger scale? Seems like it would be easier when set up with some best practices in mind.
How much human intervention is involved in "creating a cluster from scratch"? If it's more than defining the type of cluster (if you have multiple) and which environment tier it should be provisioned in, then you're probably doing too much manual work.
We're in the process of migrating away from eksctl to pulumi so there are bits and pieces all over, but it's all pipelined after the initial variables are set. We have a hard rule that everything we codify must be idempotent, so that makes life much much easier.
Pass variables into an AzDO pipeline that executes a series of bash scripts and Python files:
1. Create a new branch in our K8s GitOps repo (the branch name is derived from the cluster's id).
2. Reach out to the Grafana Cloud API to create a new API token and generate the values.env.yaml, which is held as an artifact and stored in AWS Secrets Manager; we pull this and other secrets via external-secrets. (The cluster variable is a GUID plus an optional human-readable name, because that helps identify a cluster in everyday communication.)
3. Use a base scaffold folder to generate a series of folders for the new cluster's initial manifests (updates to our own stuff are handled in another pipeline; third-party Helm charts like cert-manager, external-dns, kong-ingress, Grafana Cloud, etc. are treated as immutable). This uses most of the variables, because we control the VPC, subnets, K8s version, cluster resources, the tags that eksctl applies, etc. all from the scaffolded folder. We're working on migrating this to Pulumi, but eksctl handles quite a bit, especially the more complex EKS plugins.
4. Stand up only the base EKS cluster (we typically use a 3-node managed nodegroup for this part) and namespaces in a bash script, because we need the OIDC provider and some of the ARNs for future steps.
5. Use an API that wraps Pulumi Automation to generate all necessary IAM roles, policies, service accounts, etc. by passing the cluster GUID and type as config variables to the appropriate environment's stack on the project (think Terraform workspaces if you're unfamiliar with Pulumi). We use this same API to deprovision those resources when a cluster is destroyed. This one is cool because it's just adding or subtracting from an array of objects [{ cluster: <guid>, type: <type>, kubernetesVersion: <version> }, ...], and we built custom ComponentResources to iterate over each resource we need to create (a rough sketch follows below). We look up the cluster info and OIDC provider and consume the type, so every cluster gets unique policies, roles, policy attachments, etc., and splash damage from cross-cluster contamination is basically impossible. It's the solution all the other steps are headed towards.
6. Back in bash, apply the appropriate tags to VPCs, subnets, etc. for Karpenter and the AWS Load Balancer Controller, based on the variables and cluster name. We also add tags for our Resource Management Groups, cost reporting, and so on.
7. A separate Pulumi API adds ingress and egress rules to security groups.
8. Another Pulumi API adds the required transit gateway route table associations, etc., if we've created a new VPC for the cluster for some reason (usually capacity).
9. Back to the cluster! Deploy AWS VPC-CNI, the AWS EBS and EFS CSI drivers and storage classes, Karpenter and its provisioners, AWS LBC, metrics-server, and KEDA.
10. Deploy Kong x2 (our ingress controller of choice, one internal, one external), Kuma (service mesh of choice), external-secrets (which fetches a lot of the credentials our apps use, API keys, etc.), and Redis. Also install all of our custom Kong and Kuma plugins.
11. Deploy external-dns, cert-manager, and issuers.

At this point we've got a fully functioning 'base workload' cluster that can generate routes, certificates, etc., the service mesh is connected to a global control plane cluster that operates separately (and is its own type), and we're ready to roll out all our goodies.
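To make the array-driven pattern in step 5 concrete, here is a minimal sketch of what such a ComponentResource could look like. Every name in it (ClusterIam, the example role, the managed policy, the placeholder GUID) is hypothetical rather than their actual code, and the trust policy uses the simplified EKS Pod Identity form instead of a full IRSA setup:

```typescript
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Hypothetical shape of one entry in the per-cluster array described above.
interface ClusterSpec {
    cluster: string;            // cluster guid
    type: string;               // e.g. "workload" or "control-plane"
    kubernetesVersion: string;
}

// One ComponentResource instance per cluster keeps every role, policy, and
// attachment scoped to that cluster's guid, so nothing bleeds across clusters.
class ClusterIam extends pulumi.ComponentResource {
    constructor(spec: ClusterSpec, opts?: pulumi.ComponentResourceOptions) {
        super("example:iam:ClusterIam", spec.cluster, {}, opts);

        // Illustrative: a per-cluster role for external-secrets, using the
        // (simplified) EKS Pod Identity trust policy rather than full IRSA.
        const role = new aws.iam.Role(`${spec.cluster}-external-secrets`, {
            assumeRolePolicy: JSON.stringify({
                Version: "2012-10-17",
                Statement: [{
                    Effect: "Allow",
                    Principal: { Service: "pods.eks.amazonaws.com" },
                    Action: ["sts:AssumeRole", "sts:TagSession"],
                }],
            }),
            tags: { cluster: spec.cluster, type: spec.type },
        }, { parent: this });

        new aws.iam.RolePolicyAttachment(`${spec.cluster}-secrets-read`, {
            role: role.name,
            policyArn: "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
        }, { parent: this });
    }
}

// Provisioning or deprovisioning a cluster is just editing this array
// (in practice it would come from the stack's config, not a literal).
const clusterSpecs: ClusterSpec[] = [
    { cluster: "3f9c2a7e-0000-4000-8000-000000000000", type: "workload", kubernetesVersion: "1.27" },
];
clusterSpecs.map(spec => new ClusterIam(spec));
```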
12. Install ArgoCD, Workloads, and Rollouts; generate a random password and store it away in AWS.
13. Add the cluster to the appropriate agent pools and deployment environments in AzDO. We also add the cluster's info to a variable group that gets consumed by our services during builds and deploys, so that they know where to build and deploy. (We use AzDO as the primary operator for our CI/CD. Been using it since before GHA came out and haven't had a compelling reason to migrate.)
14. Now we're ready to install our stuff. We configure our ArgoCD projects and deploy two "apps of apps": one that is all of the above (we call it backbone) and the other that is every single one of our workloads that will run on this cluster type (we call it apps. Yes, we're great with names). At this point these all target the new branch we've created, as do their sub-apps. (A sketch of one such Application follows this list.)
15. Sync 'em, boys! Syncing the apps of apps doesn't actually create workloads, only the ArgoCD "Apps". We can then scrape through the API and make sure that none of the apps have any errors before deploying actual workloads. For backbone we can check whether there's any unknown drift and alert/pause for manual intervention; for apps it'll throw an error if any manifest the Argo app expects is missing. Very handy.
16. If everything is copacetic, we first sync backbone and then apps.

All of our in-house services that have an externally available ingress start with a weight of 0 in route53 and health checks that will fail thanks to feature flags. We use route53 to distribute across clusters right now but are working on using the service mesh to do so. Each service also has an ingress on the internal Kong that we use for testing. This lets us generate the certs so that they're ready whenever we are, instead of accidentally turning on traffic to clusters that are still waiting to generate certs. Learning from mistakes is awesome, guys. Too bad customers sometimes have to deal with it.
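And a hedged sketch of what one of those "app of apps" Application objects might look like, expressed as a Pulumi CustomResource to stay in one language; the repo URL, branch name, and paths are invented for illustration:

```typescript
import * as k8s from "@pulumi/kubernetes";

// One parent Application whose source path contains the child Application
// manifests for everything above ("backbone"). Syncing it only creates the
// child Application objects, which can be inspected before a real sync.
const backbone = new k8s.apiextensions.CustomResource("backbone", {
    apiVersion: "argoproj.io/v1alpha1",
    kind: "Application",
    metadata: { name: "backbone", namespace: "argocd" },
    spec: {
        project: "default",
        source: {
            repoURL: "https://example.com/k8s-gitops.git",
            // Each new cluster targets its own branch, per the workflow above.
            targetRevision: "cluster/3f9c2a7e",
            path: "clusters/3f9c2a7e/backbone",
        },
        destination: {
            server: "https://kubernetes.default.svc",
            namespace: "argocd",
        },
        // Deliberately no automated syncPolicy: the manual sync happens only
        // after the drift/error checks described in the steps above.
    },
});
```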
When all of this is done, we commit and push from the agent to the K8s gitops repository on the newly created branch and open a PR. One of the PR checks is the automated testing of the new cluster's services.
All of the above is 100% automated once the pipeline is kicked off. There's a lot more to it, where we add information into a database that we use as source-of-truth for DR, send slack alerts, update external monitors, etc., but that's the medium-detailed gist of it. The entire process takes a little over an hour, with the vast majority of it being just waiting for the EKS cluster to come up.
Complete_Guitar6746@reddit
Interesting!
What about databases? Do you have any large datasets that are migrated as part of this, or are they outside the clusters?
WhoLetThatSinkIn@reddit
We've stood up some test databases and mucked around with some complex pipelines that involve a lot of ebs snapshotting and copying, but we still use RDS and managed elasticsearch so migrations just happen during app pipelines.
nekokattt@reddit
Why would it? This is literally what the blue-green deployment model is designed to address.
youngviking@reddit
If your k8s deployments aren't easily replicable, then I fear you have done something wrong. Also, cluster management tools exist (e.g. rancher), but most people are probably just using something like EKS anyway.
SittingWave@reddit
yes but do we get hamburgers out of the dead server?
Malforus@reddit
This is why when my team was forced to deploy a shiny new app on eks the POC included the terraform to recreate the instance.
You can swap between clusters by just turning on the second cluster, syncing the database, and swapping the load balancer.
Or, if you can eat 3 min of downtime, taint the cluster and recreate.
theophilius@reddit
How do you handle swapping traffic between new/old clusters in an automated, repeatable way?
Main-Drag-4975@reddit
DNS and load balancing tooling appropriately sized and tuned for your use case, same as it’s been for decades.
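For illustration, a minimal sketch of that pattern with Route 53 weighted records, which also matches the weight-0 cutover trick described upthread; the zone id, hostnames, and load balancer targets are invented:

```typescript
import * as aws from "@pulumi/aws";

// Each cluster gets its own weighted record for the same name; shifting
// traffic between clusters is just editing the weights.
function clusterRecord(cluster: string, lbDnsName: string, weight: number) {
    return new aws.route53.Record(`api-${cluster}`, {
        zoneId: "Z0000000EXAMPLE",
        name: "api.example.com",
        type: "CNAME",
        ttl: 60,
        records: [lbDnsName],
        setIdentifier: cluster,              // distinguishes the weighted variants
        weightedRoutingPolicies: [{ weight }],
    });
}

// The new cluster starts at weight 0 (no traffic) until it passes checks.
clusterRecord("blue", "blue-lb.example.elb.amazonaws.com", 100);
clusterRecord("green", "green-lb.example.elb.amazonaws.com", 0);
```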
theophilius@reddit
I assume also something like Argo? Otherwise redeploying applications to the new cluster would be a pain
syklemil@reddit
Yeah, you need some variant of gitops. Doesn't necessarily have to be argo. But you do want a pretty complete picture in revision control; anything not in it is basically cobwebs left with the old house. If those cobwebs contain stuff like image tags you're gonna have a bad time.
raze4daze@reddit
Corporations are more than welcome to fund an entity that maintains LTS for Kubernetes.
Most corporations "just" want stable, secure, MIT-licensed (or similarly licensed) software for anything they depend on. How convenient.
Reverent@reddit
Realistically, businesses need to factor upgrade cycles into their total cost of ownership.
You know what's expensive? Ransomware. What else? Ransomware insurance. You know how you keep insurance costs down? Compliance with security frameworks. Most of which have vulnerability management front and center.
What this boils down to is injecting maintenance cycles into Business as Usual process. Which means keeping updates small before they become big and especially before they become migratory. Migrations are for birds, not for regularly cycled maintenance.
lucid00000@reddit
Java devs recoil in fear at this statement
tanner_0333@reddit
The push for LTS shows the fight between needing stability and keeping up with Kubernetes' fast changes. It's a hard balance for many IT teams.
dimitriettr@reddit
People who ask for longer LTS are kindly asking for trouble.
Keep your dependencies up to date; that's how your code does not become legacy on day one, and how you keep the technical debt low.
WhoLetThatSinkIn@reddit
Kubernetes itself has literally nothing to do with application code aside from what might be tying into the K8s APIs themselves.
That said, treating your clusters as immutable as your containers is the best way to do it. Want to upgrade? Do it in a new cluster.
We stopped upgrading ours about two years ago and just deploy everything but the production apps immutably; it makes life so much easier.
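A rough sketch of that "upgrade = new cluster" idea, using Pulumi's EKS package with invented names and versions; a real setup would layer on all the bootstrap steps described upthread:

```typescript
import * as eks from "@pulumi/eks";

// Old and new clusters live side by side; "upgrading" means standing up
// green on the newer version, migrating workloads, then deleting blue.
const blue = new eks.Cluster("blue", {
    version: "1.27",
    desiredCapacity: 3,
    minSize: 3,
    maxSize: 3,
});

const green = new eks.Cluster("green", {
    version: "1.28",   // the upgrade target
    desiredCapacity: 3,
    minSize: 3,
    maxSize: 3,
});

// Kubeconfigs for whatever deploys the workloads (ArgoCD, pipelines, ...).
export const blueKubeconfig = blue.kubeconfig;
export const greenKubeconfig = green.kubeconfig;
```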
Main-Drag-4975@reddit
This is it. The more frequently you spin up a new cluster the less any of this matters. If it hurts, do it more often.
Uristqwerty@reddit
If it hurts you, and you're empowered to fix the pain points, do it more often. If it hurts your users, or you're blocked for implementing the necessary changes, don't.
syklemil@reddit
Yeah, it's basically the same story as with any other pet deployment vs continuous delivery.
The main downside is that running two clusters until you're convinced shutting down the old one is safe is pretty resource-intensive. I've experienced both bad upgrades and bad blue-green deployments, but I think I prefer the bad blue-green deployment over sitting with one dysfunctional cluster that doesn't really want to move forwards or back.
Damacustas@reddit
Does that also mean you don’t use persistent storage?
WhoLetThatSinkIn@reddit
We use it for redis, superset, etc., but nothing that has to be truly durable.
A requirement for devs to get into our clusters is statelessness in the sense that any state that requires durability must live outside of the cluster. It costs a little more dollar-wise, but saves a shit load of work we don't really have time for right now.
Also I still have a deeply ingrained fear of databases and state in K8s, even if it is irrational at this point. I've worked with K8s in production since 1.9 in 2017 and was the sole resource for my current company when I started building this process out in 2019.
We went live in production on EKS with mixed OS clusters the week before windows support was GA, it was pretty crazy stuff.
All that to say: outside of dev dbs that are loaded during instantiation of an application and currently very early POC, we use RDS.
Uristqwerty@reddit
Updating dependencies without understanding the full implications of the changes will also end in legacy code. Imagine a library release has the minor feature "Now understands paths containing /../", and as a result it opens up a security hole that lets users access endpoints they don't have the authorization for. Therefore, you need to mentally context-switch through every part of the project that interacts with the dependency, confirming that all assumptions are upheld, at least if you want to be certain nothing will break. My gut feel is that you end up with a quadratic equation, balancing the context size of the dependency's update versus the context size of the code it interacts with, where solving for optimal efficiency would end in skipping past small runs of feature releases, unless the changelog for a given update happens to contain a specific improvement you've been waiting for. Read every changelog, pay attention to future development milestones, but don't rush to update if there's no immediate benefit.
gingimli@reddit
What kind of trouble has an LTS caused with other technology?
baaaap_nz@reddit
We've dropped K8s for exactly this reason. We have a relatively small team, and keeping K8s up to date was almost a full time job for someone.
We moved our workload to ECS (Fargate) and we now have more time to spend on our product rather than the system that manages our product.
Reverent@reddit
Realistically this is the curve all product development should follow. Do what's most convenient until it stops being convenient. The only exception to this is making sure the talent understands the difference between "convenient" and "actually a paper-thin wall between functional and organisation-ending compromise".
thockin@reddit
This is, IMO, an intended effect of some implementations of managed Kubernetes. Some such products are relatively low overhead, and upgrades are a regular non-event. Others leave you to DIY, with the hope that it will hurt enough that you will abandon the industry-standard solution and instead use their proprietary, locked-in product.
I am sad to see that it works.
Not all managed Kubernetes products are created equal, even if analysts lump them together.
baaaap_nz@reddit
But it wasn't the managed K8s platform that was the problem... it was things like the nginx-ingress still being 0.x and requiring major rework every month.
It gave me PTSD from back when AngularJS first came out, when every release was a major rewrite/reimplementation because the project had a change of direction in how to do things
thockin@reddit
You don't have to use DIY nginx; some providers have nice built-in support for LBs.
And if you are judging k8s itself based on ecosystem components which were also pre-1.0, I don't think that's exactly a fair basis :)
baaaap_nz@reddit
I'm just saying that k8s "as a whole ecosystem", wasn't right for us, and caused us a lot of problems/frustrations/stress.
ingress-nginx was just an example of one of many components that were forever changing and requiring major changes.
I'm sure there are plenty of valid use cases that it's fantastic for, but I imagine one of the pre-reqs for all of them is to have a sizeable team of engineers dedicated to ensuring K8s runs smoothly, which we didn't have the luxury of.
thockin@reddit
I don't mean to imply it's needed in every situation, and I don't really mean to convince you it's for you.
That said, it doesn't have to be as hard as you think it is. :)
pickledplumber@reddit
I agree. I despise upgrading k8s.
tikkabhuna@reddit
How does an LTS help? Aren’t you just kicking the can down the road? Whenever I’ve had to do a long overdue, major major upgrade, it’s been far more painful than if I just regularly upgraded.
pickledplumber@reddit
Some small teams, especially if it's just 1 or 2 people, don't have the time to do upgrades every few months. Especially if you're barely using the majority of the features that are being changed. The upgrades just become toil in themselves.
All that asking for an LTS is saying is: hey, give me something a bit more stable, so that I don't need to do the repetitive stuff so frequently.
We last upgraded K8s in the spring. My Q3 project is writing a database proxy for a DB migration. I have to work with other teams. We know in Q4 we need to upgrade k8s again. But the main app dev who was working with me got sick and the proxy project got delayed. Now we have to upgrade k8s and I have to do my project. But because there's so little flexibility I'm up against a wall. I essentially have to stop my proxy project and go into bringing up new clusters and testing all the charts. Then with my lost focus I go back to the proxy project and try to finish it.
Sure, this isn't the k8s project's fault. But an LTS would give some flexibility in situations like these, where the tasks an upgrade represents can wait a while longer.
sylvester_0@reddit
LTS is not easy or cheap to do. Sure, it's nice for consumers but think of the amount of work for maintainers.
pickledplumber@reddit
Of course it's a lot of work. But I don't think people are asking for so much.
Right now, Kubernetes versions have features you can enable that aren't on by default. One of the main problems with upgrading Kubernetes is the removal of APIs.
Example: consider some service that you're running that works fine. It's causing you no issues and has been fine for years. You had your engineers create its helm chart and it worked for your product splendidly. Kubernetes isn't forcing you to upgrade, but your cloud provider is, unless you pay them a lot of money. You know your senior management isn't going to pay those fees for extended support, so you're forced to upgrade Kubernetes. But as you investigate, you realize they are removing an API endpoint that you rely on, and the v1 endpoint replacing it has a completely different format than the prior one. Now that most things have reached v1 this isn't the biggest problem anymore, but it has been in the past.
One thing the developers of Kubernetes could do is allow those older APIs to be enabled and disabled for a few more versions. This way you give people some flexibility, so that they're not completely forced to upgrade their apps just because you want to remove an API. The mechanism is already there (the apiserver's --runtime-config flag can already turn API groups and versions on and off).
Of course, situations like this might not happen for everybody. But I've dealt with it quite a few times.
Of course I understand the mindset of the developers who are making this thing. But when you want enterprises to take advantage of the software, remember their main focus is their product and making money. So there needs to be forgiveness in the process, so that you can leave something alone for a while, finish something else like a database migration or redoing your search clusters, and then cycle back to Kubernetes. Most people who work on Kubernetes aren't only doing Kubernetes; at least I usually don't see people that focused.
Maybe the cloud providers should be the ones focused on providing those long-term releases of kubernetes instead of the kubernetes developers themselves
thockin@reddit
There's some confusion here. The only APIs that go away are alpha and beta, and at least beta has a mandatory grace period of at least 2 releases.
If you are building production systems on beta APIs, you sort of signed up for a little discomfort. That said, I acknowledge that IN THE PAST we have made it far too easy to use beta APIs, and have left important functionality beta for too long. We have tried to stop doing that.
pickledplumber@reddit
I understand what you're saying but I really don't think it's realistic. If people weren't supposed to be using beta APIs, then the majority of Kubernetes users would have had to start using Kubernetes within the last year, having waited nearly a decade for it to reach v1 all around.
You say the beta APIs had two release cycles of grace. But considering some people have been on beta APIs for 5+ years, don't you think two short release cycles is a bit curt? That's just a few months. It doesn't even allow a large team to refocus quickly enough; instead it preempts them and forces their hand. And that's what I'm talking about.
All people are saying is: hey, can we make this thing a little more user-friendly? Instead of removing APIs outright, let us just disable them, and leave removal for a time when it causes the least trouble for users; that would be a positive move. Keep the old documentation up, so that users don't have to rely on shady archived blog posts just to get documentation for older Kubernetes versions. Things like that kinda suck.
thockin@reddit
Leaving those APIs has a real cost (on the project) and there really is no "time when it causes the least trouble". If there's no reason for users to change, they will not change. In the meantime, those extra APIs have to be carried in the code, with all the conversions, tests, etc. Every new API change has to be ported to the old APIs (because we guarantee round-trip).
Beta APIs are for testing things. Please don't use them if you are not a) happy to file bugs; and b) willing to adapt when they go GA.
As for your first point, I just don't know what you mean. The vast majority of APIs have been GA for years. The one biggggg flub was Ingress, but few others were nearly as bad.
spicypixel@reddit
Yeah, with bigger jumps, I'd rather bounce down the release tree 3 times a year than jump a whole LTS (6 jumps if it's two years) in one go; a lot more things would go from deprecated to just gone over 6 releases.
Possibly-Functional@reddit
I disagree. This may be purist thinking, but to me that sounds like building around the problem rather than actually solving it. The real problem is that your infra is set up in a way where updates are difficult. Pets where it should be cattle.
robvdl@reddit
And Docker doesn't? At times it updates several times a week, and every time such an update is pushed down it's "a Docker update, let's restart all the containers again".
zam0th@reddit
Y'all need to stop updating key production infrastructure from the internet.
Jmc_da_boss@reddit
Kubernetes doesn't need an LTS, the enterprises that have staked their infrastructure on an open source project do.
That's their fault. We are in the same boat; the upgrades come fast and take up time. But that's what you sign up for.
pwab@reddit
“That’s their fault” [for using this software]. Interesting take.
Jmc_da_boss@reddit
Yes? When you as an enterprise evaluate and decide to use an open source tool that you are not paying for you literally are accepting all responsibility for managing it. That's kind of the entire point... it's why you do due diligence on such things like "upgrade cycles"
pwab@reddit
It’s based; Accurate and user hostile. I approve 👍🏼💪🏽
daedalus_structure@reddit
Most of the pain is around APIs that aren't stable and the poor state of CRD lifecycle management that makes core platform components a nightmare to update.
fubes2000@reddit
Use a distro like Kops that intelligently handles upgrades for you.