Containerization never made any sense to me; I don't see much of a difference from virtualization. [Long Post Ahead]
Posted by tastuwa@reddit | linuxadmin | 20 comments
I’ve been working with Docker, k3s (command line), and Rancher (GUI) for a while now, but there’s one thing that’s haunted me forever: I never really understood what I was doing or why it made sense.
To me, virtualization and containerization have always felt the same. For example: with virtualization, I can clone a VM to build a new VM (in VirtualBox or Hyper-V, for example; I have not yet used big daddies like VMware). With Kubernetes, I can create replicas of pods or deployments.
But when people say things like “there’s an OS in a virtual machine but no host OS in Kubernetes,” it just doesn’t click. How can Kubernetes run without an OS? Every pod or deployment needs an OS underneath, right? That Alpine Linux thing or whatever it was. In fact, I see a bigger problem with Kubernetes: instead of having a single OS like in a VM, now we have many OS instances (one per container or pod). You could argue that the OS footprint is small in containers, but that alone doesn't seem like a reason to pick containerization over virtualization.
I recently interviewed with a DevOps team (I have 2 years of experience as a Linux IT support engineer), and they asked questions like “What’s the difference between virtualization and containerization?” and “What is Traefik?” I said API gateway, because that's what I had read on the intro page of an Apress book, and blabbered that it was something for SSL termination, reverse proxy, API gateway, etc.
I can't get real clarity on the things I'm working with, even though I can do the job as a Linux support person (I hate calling myself an engineer lol). I want to improve and understand these concepts deeply. I’ve started investing all my time (I quit my job) in learning computer science foundations like networking and operating systems, but I’m unsure whether I’m studying the right materials to finally grasp DevOps concepts or just reading irrelevant stuff.
TLDR: What are the founding principles of microservices and containerization, especially regarding Docker and Kubernetes?
People say learn Linux first, but I consider myself pretty intermediate with Linux. Maybe I'm measuring against the wrong tape. Please enlighten me, folks.
red_flock@reddit
First of all, containerisation is about squeezing out the last bits of memory, storage and CPU. Not everyone needs to worry about this: many can afford to overprovision plenty of hardware/storage/memory up front and let the project grow into the slack until a new hardware budget is allocated. This is the old-school way of thinking, and it works. Virtualisation is sufficient.
But this is not true of an entity like Google. They need to squeeze every little bit out because at their scale, even if you save a little, you are saving by the millions. So they are happy to deal with the complexity of containerisation.
Not everyone is Google, though, and many use containers like virtual machines: a full-fat filesystem, overprovisioned memory, and an application that cannot be horizontally scaled. That is a poor fit, and virtualisation is probably the easier way to go from the Linux admin point of view.
But Kubernetes is more than just docker/containers. It is now a full-blown ecosystem that integrates nicely with everything from GitHub to CI/CD. Trying to do the same with VMs... well, I don't know how to. It's just easier to go with the flow, and developers expect Kubernetes now.
And then there is newish stuff like Confidential Containers, which are containers inside VMs... that's a different use case altogether: you probably need it because you want a VM's security, but developers expect containers.
Klukogan@reddit
To me it's mostly two things: cost and management. Sure, a container needs a host. But cloud providers like AWS have developed services like ECS Fargate where you only manage the tasks (one or more containers) and the host is managed by AWS, and you only pay for the containers' usage, not the host. You can turn costly servers into cheap ECS services. Not everything, of course. It's also easier to manage: you don't have to care about host updates, you just rebuild your containers when you need to.
sur6e@reddit
Maybe it helps to say it like this: the containers share one set of OS files outside of them, while the VMs each have a full OS installed inside.
haloweenek@reddit
I was against that because of overhead but now as I work with multiple services, containerization is 🥰
With a decent Makefile + docker compose you can start up new services in seconds, and it's 100% reproducible anywhere.
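Rough shape of it; the images and ports below are just placeholders, not a real project:

```bash
# compose.yaml written inline for brevity; images and ports are placeholders
cat > compose.yaml <<'EOF'
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"
  cache:
    image: redis:7
EOF

docker compose up -d   # start everything in the background
docker compose ps      # check it's running
docker compose down    # tear it all down again
```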
Afterwards you throw this into ci/cd and it takes care of everything.
I was running ansible before - not doing this again.
CaptainZippi@reddit
https://www.reddit.com/r/ProgrammerHumor/comments/cw58z7/it_works_on_my_machine/?rdt=44692
MindStalker@reddit
When running containers, go onto the host and look at your processes. All the container processes will be visible; they are running on your parent host.
A container is a process with blinders on. It can only see the files and network that are given to it.
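You can check this yourself with something like (nginx here is just an arbitrary example):

```bash
# start a throwaway container
docker run -d --name demo nginx:1.25

# from the HOST, its processes are just ordinary processes
ps -ef | grep 'nginx: master'

# from INSIDE, the blinders: it only sees its own root filesystem
docker exec demo ls /

docker rm -f demo
```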
One huge advantage is that if containers share the same base layer or upper layers (an Ubuntu base, for example), those files only exist once and are presented to all the containers equally. Writes happen in a unique layer on top of that shared base.
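For example, two images built from the same base report identical lower-layer digests (the images below are throwaway examples, just to show the idea):

```bash
# two trivial images on the same base
printf 'FROM python:3.12-slim\nRUN pip install requests\n' | docker build -t app-a -
printf 'FROM python:3.12-slim\nRUN pip install flask\n'    | docker build -t app-b -

# the base layers have the same digests: stored once, shared by both images
docker image inspect -f '{{json .RootFS.Layers}}' app-a app-b
```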
I agree that K8s does add a lot of overhead, but it starts to make sense when you are dealing at the cloud scale.
Max-P@reddit
With virtualization, you run an entire operating system, including its own kernel and all of its system services.
With a container, you're typically running a single application in the environment of whatever base distro it was built on. Yes, there's Alpine or Debian or Ubuntu or whatever in there, but it's only there for its libraries, to run the application. It's not really running the whole of Alpine, just your application; none of the other services run, even if they're technically installed. There are ways to run a full system in a container (systemd-nspawn, LXC), but with Docker/k8s you only run one app.
So you can have, say, something that only works on Ubuntu 18.04, and run it as a container, then run something that only works on RHEL 7 in another container, put them both in the same pod and let them talk over 127.0.0.1 as if they're on the same machine, because they are, but they still won't see each other's files.
It's all still running under your host OS's kernel and relies on services from the host for the normal housekeeping. It's all about setting up the right execution environment for the benefit of the app you're trying to run.
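Easy to see for yourself (old images, and you may need to pull them first, but they make the point):

```bash
# two very different userlands...
docker run --rm ubuntu:18.04 cat /etc/os-release | head -2
docker run --rm centos:7 cat /etc/os-release | head -2

# ...but they all report the host's kernel, because there is only one kernel
uname -r
docker run --rm ubuntu:18.04 uname -r
docker run --rm centos:7 uname -r
```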
Unlucky-Shop3386@reddit
With a container it's really just a rootfs without the kernel. It can be whatever you like; a single musl binary, sure. All of this happens inside namespaces, a feature built into the Linux kernel. Think of a namespace as a layer on top of the host Linux kernel: that layer is your container.
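You can poke at the raw kernel feature directly, no docker involved, with something like:

```bash
# new PID + mount namespaces: the shell inside thinks it is PID 1
sudo unshare --pid --fork --mount-proc bash

# inside that shell:
ps -ef   # only bash and ps are visible
exit     # back to the host's full view
```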
diito_ditto@reddit
If you want to advance in your career at all, you really need to understand containers; they are the standard these days and have been for a while.
Some simplified basics: a container is just a normal process on the host, isolated with kernel namespaces and cgroups. It shares the host's kernel instead of booting its own, and its filesystem comes from image layers that can be shared between containers.
If you have a personal server or homelab, I'd suggest containerizing all the services you run on it. You will use fewer system resources, OS updates will never break your services, and moving your services to a new server becomes trivial: you just need a copy of whatever directory of persistent data the container might need (if any) and a definition file for the container.
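A minimal sketch of what that looks like; the image name and paths are placeholders for whatever you actually run:

```bash
# the "definition": a single docker run (or a compose file) plus a data dir
docker run -d --name myapp \
  --restart unless-stopped \
  -p 8080:8080 \
  -v /srv/myapp:/data \
  example/myapp:latest

# moving to a new server is then roughly:
#   1. copy /srv/myapp (the persistent data)
#   2. copy this run command / compose file (the definition)
#   3. run it on the new box
```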
yottabit42@reddit
There are more detailed answers here, but simply: a VM emulates the full hardware stack, like it's a real computer, just virtualized. This means your VM runs its own OS kernel and has drivers to interact with the (virtualized) hardware. These days we try to use "paravirtualized" drivers for heavy I/O like storage and network; these are specialized drivers written specifically for virtualized hardware that avoid expensive emulation and CPU overhead.
Containers are a stripped-down userland that contains (or at least should contain) only the minimum software/packages required to run the specific service. This limits maintenance and security exposure. A container does not run its own OS kernel; it uses the host OS kernel, which saves a whole layer of hardware emulation and expensive I/O emulation.
Containers will typically be much smaller, lighter, and somewhat faster than a VM.
There is theoretically a higher security risk since the kernel is shared, but these days I would say that risk is quite minimal, though non-zero. There is a non-zero risk with VMs too, but with the extra layer of abstraction it's theoretically less of a security risk than a container.
DaylightAdmin@reddit
You are mixing up a few things. English is my second language, but I will try my best:
With a VM you create a virtual machine, that is, a whole PC that is virtualized: the CPU, storage, GPU, network and other I/O. So you can run different kernels and operating systems, and you have absolute separation; everything is its own and nothing is aware of anything else running on the host. That is what you get with a hypervisor: Proxmox, VMware, VirtualBox, QEMU.
Now with Linux you have two other ways to separate your software. The first is Linux Containers, LXC for short: here you split everything but use the same kernel. You can have your own networking, and share your storage or not; it is really flexible. The idea is that you save memory by running everything on the same kernel while still splitting the software up so the pieces don't interfere with each other. The separation is good enough that you can give someone root inside the LXC container and they should not be able to escape from it. It is great for providers who want to split the hardware across as many customers as possible. An LXC container is created nearly instantly.
The other great way to split software is docker-style containers (the Open Container Initiative, if I remember right). The idea here is to solve the age-old problem of "it runs on my machine": a docker image bundles everything that the software needs to run, mainly the libraries. That is the reason you can run Debian, Ubuntu, Alpine or Fedora userlands on the same kernel. It doesn't split the networking as strictly, which is why in a Kubernetes pod you can't open the same port in each container. With the layered storage underneath, you save storage space for the images. But you can't start a docker container, give someone root inside it, and expect them not to be able to influence other containers; that kind of isolation is not a focus of docker. Here you choose between docker, podman and containerd (Kubernetes).
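A minimal sketch of that bundling; the app and the pinned version are made up purely for illustration:

```bash
echo 'print("hello from inside the image")' > app.py

cat > Dockerfile <<'EOF'
# everything the app needs is pinned inside the image
FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir requests==2.31.0
COPY app.py .
CMD ["python", "app.py"]
EOF

docker build -t myapp:1.0 .
docker run --rm myapp:1.0   # same result on any host with a container runtime
```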
That is all for historical reasons: first we believed we had to virtualize a whole server to sell the same machine to multiple people, then we realised that if everyone spins up the same kernel, we can share the kernel. So better interfaces (GUI, web interface, CLI) were created, and since most people just spun up the same containers, we took the direct access away and gave them a nice interface where they can start pre-approved containers and be happy.
I hope I didn't make any major mistakes.
NightOfTheLivingHam@reddit
The goal is to have sandboxed versions of software that can be quickly deployed from templates.
It's as if you're installing a piece of software from an MSI file with a definition file attached, but it's sandboxed and isolated from the host system.
It allows for rapid and cheap deployment of apps and services versus installing a fully virtualized OS just to run one thing.
For example, if you just need to run a UniFi controller, instead of deploying an entire Linux OS template just to run one piece of software, you run a container that does the same thing on one host. It uses only the resources needed for that piece of software, contained in its own pseudo-environment that can be isolated from other apps and services.
Now, with one Kubernetes template, you can suddenly deploy multiple controllers rapidly, each with its own IP and config, using only what is needed to run those applications. You can automate this. It uses fewer resources and can be done cheaply. You can run more containers on one host than you can run VMs without over-committing.
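Roughly like this; the image name is a placeholder, point it at whatever controller image you actually use:

```bash
kubectl create deployment controller --image=example/unifi-controller:latest
kubectl scale deployment controller --replicas=3   # three copies, near-instantly
kubectl expose deployment controller --port=8443   # one stable service in front
kubectl get pods -o wide                           # each pod gets its own IP
```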
Taledo@reddit
I'll add something from my experience at work. We have around 200-ish VMs for different software, and it's a pain in the ass to keep everything up to date (some have automatic security updates, but try doing that with software written 10 years ago and hard dependencies...).
Point is, if you run docker, k8s or whatever, you can update your VMs more easily without breaking prod, and only care about updating the containers when needed.
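In practice the "update the containers" part is usually just something like:

```bash
docker compose pull    # fetch newer images
docker compose up -d   # recreate only the containers whose image changed
```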
NightOfTheLivingHam@reddit
Exactly. It creates more flexibility. The more modular and flexible you can make a system, the better.
I feel you on the vm management.
cneakysunt@reddit
It's easier to manage containers at scale, using pipelines and orchestration to handle deployment and lifecycle, than it is to do the same with VMs.
I can't speak to the cost of one vs. the other. Someone may know (or be arsed to figure it out).
akindofuser@reddit
I have a different take than others here and a bit more simplistic.
Containers are quite wonderful. In fact, they are exactly like your beloved VMs, just smaller, more efficient and more packageable.
Where we deviated into hell is our diehard nosedive into orchestration productization and unnecessary network virtualization, which begot even more products, and the whole stack quickly turned into a tech-conference vendor exhibition floor. What was once git, Salt or Ansible, docker or KVM turned into Argo, GitLab, rounded, Terraform, jinjaform, managed k8s, managed security products, etc. etc. We pretend we're smart because we bought a bunch of trash to try to do our jobs for us, all of which works poorly together.
Some of the best solutions I've seen were just basic containers with a simple orchestration controller. Avi Networks' original LB was a fantastic example: no k8s, no docker swarm, just containers and a simple controller.
Independent-Mail1493@reddit
The difference between containerization and virtualization is that with virtualization such as VMware, KVM or HyperV you're creating a virtual machine that is fully compliant with the Popek and Goldberg virtualization requirements. You're creating an entire hardware stack with your virtualization software from the CPU on up and can run whatever operating system you like as long as it is supported by the architecture of the virtual machine. Virtualization doesn't limit you to running the same architecture on the virtual machine as you are on the host. You can use the QEMU/KVM/libvirt stack to create a virtual machine emulating a SPARC CPU running Solaris on x86-64 hardware.
With containerization you're creating a limited environment that virtualizes an operating system view for an application. Processes inside a container make system calls directly to the host kernel, just with a namespaced view of its resources, whereas virtual machines translate the hardware instructions inside the virtual machine into hardware instructions that run on the native hardware.
Try this as an aid to understanding the difference between virtual machines and containers. Create a virtual machine on a Linux system using KVM and then run ps to list the processes running on the host system. You'll see one PID for each virtual machine that's running, and that's it; as far as the host system is concerned, the virtual machine is a black box and you have no visibility inside it. Now create some containers on a machine running Docker or Rancher. If you run ps on the container host, you'll see PIDs (and, with user namespaces enabled, remapped UIDs) for every process running inside the containers.
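Concretely, the experiment looks something like this (nginx is just an arbitrary example container):

```bash
# host running KVM guests: each whole VM shows up as a single qemu process
ps -ef | grep qemu

# host running containers: every process inside them is visible individually
docker run -d --name web nginx:1.25
ps -ef | grep nginx
docker top web   # the same processes, grouped by container
docker rm -f web
```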
kobumaister@reddit
Looks like you don't understand the difference between containers and VMs. There is a huge difference between them.
Both_Lawfulness_9748@reddit
Nor did I, until I had orchestration (HashiCorp Nomad).
I just throw a job file at it and let it worry about where the services run. Container tags sort out ingress routing and SSL via Traefik. Service discovery lets containers find each other.
Auto scaling multiple instances for load balancing is just configuration and the orchestration system manages it for you.
Deployments directly from CI/CD chains for revision control.
The ROI is huge.
Ontological_Gap@reddit
You pay for less hardware if you only run one kernel. Yes, there are security trade-offs.