Containerization never made any sense to me; I don't see much of a difference from virtualization. [Long Post Ahead]
Posted by tastuwa@reddit | linuxadmin | 20 comments
I’ve been working with Docker, k3s (command line), and Rancher (GUI) for a while now, but there’s one thing that’s haunted me forever: I never really understood what I was doing or why it made sense.
To me, virtualization and containerization have always felt the same. For example: with virtualization, I can clone a VM to build a new VM (in VirtualBox or Hyper-V, for example; I have not yet used big daddies like VMware). With Kubernetes, I can create replicas of pods or deployments.
But when people say things like “there’s an OS in a virtual machine but no host OS in Kubernetes,” it just doesn’t click. How can Kubernetes run without an OS? Every pod or deployment needs an OS underneath, right? That Alpine Linux thing or whatever it was. In fact, I see a bigger problem with Kubernetes: instead of having a single OS like in a VM, now we have many OS instances (one per container or pod). You could argue that the OS footprint is small in containers, but that alone doesn't seem like a reason to pick containerization over virtualization.
I recently interviewed with a DevOps team (I have 2 years of experience as a Linux IT support engineer), and they asked questions like “What’s the difference between virtualization and containerization?” and “What is Traefik?” I said API gateway, because that's what I had read on the intro page of an Apress book, and blabbered that it was something for SSL termination, reverse proxy, API gateway, etc.
I can't get real clarity on the things I'm working with, even though I can do the job as a Linux support person (I hate calling myself an engineer lol). I want to improve and understand these concepts deeply. I’ve started investing all my time (I quit my job) in learning computer science foundations like networking and operating systems, but I’m unsure whether I’m studying the right materials to finally grasp DevOps concepts or just reading irrelevant stuff.
TLDR: What are the founding principles of microservices and containerization, especially regarding Docker and Kubernetes?
People say learn Linux first, but I consider myself pretty intermediate with Linux. Maybe I'm measuring against the wrong tape. Please enlighten me, folks.
red_flock@reddit
First of all, containerisation is about squeezing out the last bits of memory, storage and CPU. Not everyone needs to worry about this: many can afford to overprovision plenty of hardware/storage/memory up front and let the project grow into the slack until a new hardware budget is allocated. This is the old-school way of thinking, and it works. Virtualisation is sufficient.
But this is not true of an entity like Google. They need to squeeze every little bit out because at their scale, even if you save a little, you are saving by the millions. So they are happy to deal with the complexity of containerisation.
Not everyone is Google, though, and many use containers like virtual machines: a full-fat filesystem, overprovisioned memory, and an application that cannot be horizontally scaled. That is a poor fit, and virtualisation is probably the easier way to go from the Linux admin point of view.
But Kubernetes is more than just docker/containers. It is now a full-blown ecosystem that integrates nicely with everything from GitHub to CI/CD. Trying to do the same with VMs... well, I don't know how to. It's just easier to go with the flow, and developers expect Kubernetes now.
And then there is newish stuff like Confidential Containers, which are containers inside VMs... that's a different use case altogether: you probably need it because you want a VM's security, but developers expect containers.
Klukogan@reddit
To me it's mostly two things: cost and management. Sure, a container needs a host. But cloud providers like AWS have developed services like ECS Fargate where you only manage the tasks (one or more containers) and the host is managed by AWS, and you only pay for the containers' usage, not the host. You can turn costly servers into cheap ECS services. Not everything, of course. It's also easier to manage: you don't have to care about host updates, you just rebuild your containers when you need to.
sur6e@reddit
Maybe it helps to say it like this: the containers share one set of OS files outside of them, while the VMs each have a full OS installed inside.
haloweenek@reddit
I was against that because of overhead but now as I work with multiple services, containerization is 🥰
With a decent Makefile + docker compose you can start up new services in seconds, and it's 100% reproducible anywhere.
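Rough shape of it; the images and ports below are just placeholders, not a real project:

```bash
# compose.yaml written inline for brevity; images and ports are placeholders
cat > compose.yaml <<'EOF'
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"
  cache:
    image: redis:7
EOF

docker compose up -d   # start everything in the background
docker compose ps      # check it's running
docker compose down    # tear it all down again
```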
Afterwards you throw this into ci/cd and it takes care of everything.
I was running ansible before - not doing this again.
CaptainZippi@reddit
https://www.reddit.com/r/ProgrammerHumor/comments/cw58z7/it_works_on_my_machine/?rdt=44692
MindStalker@reddit
When running containers, go onto the host and look at your processes. All the container processes will be visible; they are running on your parent host.
A container is a process with blinders on. It can only see the files and network that are given to it.
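You can check this yourself with something like (nginx here is just an arbitrary example):

```bash
# start a throwaway container
docker run -d --name demo nginx:1.25

# from the HOST, its processes are just ordinary processes
ps -ef | grep 'nginx: master'

# from INSIDE, the blinders: it only sees its own root filesystem
docker exec demo ls /

docker rm -f demo
```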
One huge advantage is that if containers share the same base layer or upper layers (an Ubuntu base, for example), those files only exist once and are presented to all the containers equally. Writes happen in a unique layer on top of that shared base.
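For example, two images built from the same base report identical lower-layer digests (the images below are throwaway examples, just to show the idea):

```bash
# two trivial images on the same base
printf 'FROM python:3.12-slim\nRUN pip install requests\n' | docker build -t app-a -
printf 'FROM python:3.12-slim\nRUN pip install flask\n'    | docker build -t app-b -

# the base layers have the same digests: stored once, shared by both images
docker image inspect -f '{{json .RootFS.Layers}}' app-a app-b
```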
I agree that K8s does add a lot of overhead, but it starts to make sense when you are dealing at the cloud scale.
Max-P@reddit
With virtualization, you run an entire operating system, including its own kernel and all of its system services.
With a container, you're typically running a single application in the environment of whatever base distro it was built on. Yes, there's Alpine or Debian or Ubuntu or whatever in there, but it's only there for its libraries, to run the application. It's not really running the whole of Alpine, just your application; none of the other services run, even if they're technically installed. There are ways to run a full system in a container (systemd-nspawn, LXC), but with Docker/k8s you only run one app.
So you can have, say, something that only works on Ubuntu 18.04, and run it as a container, then run something that only works on RHEL 7 in another container, put them both in the same pod and let them talk over 127.0.0.1 as if they're on the same machine, because they are, but they still won't see each other's files.
It's all still running under your host OS's kernel and relies on services from the host for the normal housekeeping. It's all about setting up the right execution environment for the benefit of the app you're trying to run.
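Easy to see for yourself (old images, and you may need to pull them first, but they make the point):

```bash
# two very different userlands...
docker run --rm ubuntu:18.04 cat /etc/os-release | head -2
docker run --rm centos:7 cat /etc/os-release | head -2

# ...but they all report the host's kernel, because there is only one kernel
uname -r
docker run --rm ubuntu:18.04 uname -r
docker run --rm centos:7 uname -r
```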
Unlucky-Shop3386@reddit
With a container it's really just a rootfs without the kernel. It can be whatever you like; a single musl binary, sure. All of this happens inside namespaces, a feature built into the Linux kernel. Think of a namespace as a layer on top of the host Linux kernel: that layer is your container.
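You can poke at the raw kernel feature directly, no docker involved, with something like:

```bash
# new PID + mount namespaces: the shell inside thinks it is PID 1
sudo unshare --pid --fork --mount-proc bash

# inside that shell:
ps -ef   # only bash and ps are visible
exit     # back to the host's full view
```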
diito_ditto@reddit
If you want to advance in your career at all, you really need to understand containers; they are the standard these days and have been for a while.
Some simplified basics: a container is just a normal process on the host, isolated with kernel namespaces and cgroups. It shares the host's kernel instead of booting its own, and its filesystem comes from image layers that can be shared between containers.
If you have a personal server or homelab, I'd suggest containerizing all the services you run on it. You will use fewer system resources, OS updates will never break your services, and moving your services to a new server becomes trivial: you just need a copy of whatever directory of persistent data the container might need (if any) and a definition file for the container.
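A minimal sketch of what that looks like; the image name and paths are placeholders for whatever you actually run:

```bash
# the "definition": a single docker run (or a compose file) plus a data dir
docker run -d --name myapp \
  --restart unless-stopped \
  -p 8080:8080 \
  -v /srv/myapp:/data \
  example/myapp:latest

# moving to a new server is then roughly:
#   1. copy /srv/myapp (the persistent data)
#   2. copy this run command / compose file (the definition)
#   3. run it on the new box
```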
yottabit42@reddit
There are more detailed answers here, but simply: a VM emulates the full hardware stack, like it's a real computer, just virtualized. This means your VM runs its own OS kernel and has drivers to interact with the (virtualized) hardware. These days we try to use "paravirtualized" drivers for heavy I/O like storage and network; these are specialized drivers written specifically for virtualized hardware that avoid expensive emulation and CPU overhead.
Containers are a stripped-down userland that contains (or at least should contain) only the minimum software/packages required to run the specific service. This limits maintenance and security exposure. A container does not run its own OS kernel; it uses the host OS kernel, which saves a whole layer of hardware emulation and expensive I/O emulation.
Containers will typically be much smaller, lighter, and somewhat faster than a VM.
There is theoretically a higher security risk since the kernel is shared, but these days I would say that risk is quite minimal, though non-zero. There is a non-zero risk with VMs too, but with the extra layer of abstraction it's theoretically less of a security risk than a container.
DaylightAdmin@reddit
You are mixing up a few things. English is my second language, but I will try my best:
With a VM you create a virtual machine, that is, a whole PC that is virtualized: the CPU, storage, GPU, network and other I/O. So you can run different kernels and operating systems, and you have absolute separation; everything is its own and nothing is aware of anything else running on the host. That is what you get with a hypervisor: Proxmox, VMware, VirtualBox, QEMU.
Now with Linux you have two other ways to separate your software. The first is Linux Containers, LXC for short: here you split everything but use the same kernel. You can have your own networking, and share your storage or not; it is really flexible. The idea is that you save memory by running everything on the same kernel while still splitting the software up so the pieces don't interfere with each other. The separation is good enough that you can give someone root inside the LXC container and they should not be able to escape from it. It is great for providers who want to split the hardware across as many customers as possible. An LXC container is created nearly instantly.
The other great way to split software is docker-style containers (the Open Container Initiative, if I remember right). The idea here is to solve the age-old problem of "it runs on my machine": a docker image bundles everything that the software needs to run, mainly the libraries. That is the reason you can run Debian, Ubuntu, Alpine or Fedora userlands on the same kernel. It doesn't split the networking as strictly, which is why in a Kubernetes pod you can't open the same port in each container. With the layered storage underneath, you save storage space for the images. But you can't start a docker container, give someone root inside it, and expect them not to be able to influence other containers; that kind of isolation is not a focus of docker. Here you choose between docker, podman and containerd (Kubernetes).
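A minimal sketch of that bundling; the app and the pinned version are made up purely for illustration:

```bash
echo 'print("hello from inside the image")' > app.py

cat > Dockerfile <<'EOF'
# everything the app needs is pinned inside the image
FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir requests==2.31.0
COPY app.py .
CMD ["python", "app.py"]
EOF

docker build -t myapp:1.0 .
docker run --rm myapp:1.0   # same result on any host with a container runtime
```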
That is all for historical reasons: first we believed we had to virtualize a whole server to sell the same machine to multiple people, then we realised that if everyone spins up the same kernel, we can share the kernel. So better interfaces (GUI, web interface, CLI) were created, and since most people just spun up the same containers, we took the direct access away and gave them a nice interface where they can start pre-approved containers and be happy.
I hope I didn't make any major mistakes.
NightOfTheLivingHam@reddit
The goal is to have sandboxed versions of software that can be quickly deployed from templates.
It's as if you're installing a piece of software from an MSI file with a definition file attached, but it's sandboxed and isolated from the host system.
It allows for rapid and cheap deployment of apps and services versus installing a fully virtualized OS just to run one thing.
For example, if you just need to run a UniFi controller, instead of deploying an entire Linux OS template just to run one piece of software, you run a container that does the same thing on one host. It uses only the resources needed for that piece of software, contained in its own pseudo-environment that can be isolated from other apps and services.
Now, with one Kubernetes template, you can suddenly deploy multiple controllers rapidly, each with its own IP and config, using only what is needed to run those applications. You can automate this. It uses fewer resources and can be done cheaply. You can run more containers on one host than you can run VMs without over-committing.
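Roughly like this; the image name is a placeholder, point it at whatever controller image you actually use:

```bash
kubectl create deployment controller --image=example/unifi-controller:latest
kubectl scale deployment controller --replicas=3   # three copies, near-instantly
kubectl expose deployment controller --port=8443   # one stable service in front
kubectl get pods -o wide                           # each pod gets its own IP
```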
Taledo@reddit
I'll add something from my experience at work. We have around 200-ish VMs for different software, and it's a pain in the ass to keep everything up to date (some have automatic security updates, but try doing that with software written 10 years ago and hard dependencies...).
Point is, if you run docker, k8s or whatever, you can update your VMs more easily without breaking prod, and only care about updating the containers when needed.
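In practice the "update the containers" part is usually just something like:

```bash
docker compose pull    # fetch newer images
docker compose up -d   # recreate only the containers whose image changed
```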
NightOfTheLivingHam@reddit
Exactly. It creates more flexibility. The more modular and flexible you can make a system, the better.
I feel you on the vm management.
cneakysunt@reddit
It's easier to manage containers at scale, using pipelines and orchestration to handle deployment and lifecycle, than it is to do the same with VMs.
I can't speak to the cost of one vs. the other. Someone may know (or be arsed to figure it out).
akindofuser@reddit
I have a different take than others here and a bit more simplistic.
Containers are quite wonderful. In fact, they are exactly like your beloved VMs, just smaller, more efficient and more packageable.
Where we deviated into hell is our diehard nosedive into orchestration productization and unnecessary network virtualization, which begot even more products, and the whole stack quickly turned into a tech-conference vendor exhibition floor. What was once git, Salt or Ansible, docker or KVM turned into Argo, GitLab, rounded, Terraform, jinjaform, managed k8s, managed security products, etc. etc. We pretend we're smart because we bought a bunch of trash to try to do our jobs for us, all of which works poorly together.
Some of the best solutions I've seen were just basic containers with a simple orchestration controller. Avi Networks' original LB was a fantastic example: no k8s, no docker swarm, just containers and a simple controller.
Independent-Mail1493@reddit
The difference between containerization and virtualization is that with virtualization such as VMware, KVM or HyperV you're creating a virtual machine that is fully compliant with the Popek and Goldberg virtualization requirements. You're creating an entire hardware stack with your virtualization software from the CPU on up and can run whatever operating system you like as long as it is supported by the architecture of the virtual machine. Virtualization doesn't limit you to running the same architecture on the virtual machine as you are on the host. You can use the QEMU/KVM/libvirt stack to create a virtual machine emulating a SPARC CPU running Solaris on x86-64 hardware.
With containerization you're creating a limited environment that virtualizes an operating system view for an application. Processes inside a container make system calls directly to the host kernel, just with a namespaced view of its resources, whereas virtual machines translate the hardware instructions inside the virtual machine into hardware instructions that run on the native hardware.
Try this as an aid to understanding the difference between virtual machines and containers. Create a virtual machine on a Linux system using KVM and then run ps to list the processes running on the host system. You'll see one PID for each virtual machine that's running, and that's it; as far as the host system is concerned, the virtual machine is a black box and you have no visibility inside it. Now create some containers on a machine running Docker or Rancher. If you run ps on the container host, you'll see PIDs (and, with user namespaces enabled, remapped UIDs) for every process running inside the containers.
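Concretely, the experiment looks something like this (nginx is just an arbitrary example container):

```bash
# host running KVM guests: each whole VM shows up as a single qemu process
ps -ef | grep qemu

# host running containers: every process inside them is visible individually
docker run -d --name web nginx:1.25
ps -ef | grep nginx
docker top web   # the same processes, grouped by container
docker rm -f web
```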
kobumaister@reddit
Looks like you don't understand the difference between containers and VMs. There is a huge difference between them.
Both_Lawfulness_9748@reddit
Nor did I, until I had orchestration (HashiCorp Nomad).
I just throw a job file at it and let it worry about where the services run. Container tags sort out ingress routing and SSL via Traefik. Service discovery lets containers find each other.
Auto scaling multiple instances for load balancing is just configuration and the orchestration system manages it for you.
Deployments directly from CI/CD chains for revision control.
The ROI is huge.
Ontological_Gap@reddit
You pay for less hardware if you only run one kernel. Yes, there are security trade-offs.