No mention of user namespaces whatsoever, which is the primary security isolation mechanism for containers on linux. This is what enables "rootless" mode. Of course, this is from 2017, but user namespaces were released with linux 3.8 in February 2013.
Docker particularly has always required extra work to run in rootless mode because it was released soon after in March 2013, and for whatever reason it hasn't been a priority to rework the codebase to make that the default. I switched to podman for exactly this reason as my go-to oci implementation and haven't looked back.
The Linux kernel features that enable the various forms of isolation all require root privileges (CAP_SYS_ADMIN). Once user namespaces were a thing, you could use them to get around the root requirement for all the other isolation namespaces (a minimal sketch follows the list below).
All of the below still require CAP_SYS_ADMIN:
CLONE_NEWCGROUP: cgroup namespace, for resource control (mem/cpu/block io/devices/network bandwidth)
CLONE_NEWIPC: ipc namespace, for SysV IPC objects and message queues
CLONE_NEWNET: network namespace, for isolated virtual networking
CLONE_NEWNS: mount namespace, for isolated mounting (filesystems, etc.)
CLONE_NEWPID: pid namespace, for an isolated view of running processes
CLONE_NEWUTS: UNIX Time-Sharing System namespace, for isolation of hostname and domain name
see: https://man7.org/linux/man-pages/man2/clone.2.html
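To make the user-namespace trick concrete, here's a minimal sketch in C (an illustration only, assuming a kernel with unprivileged user namespaces enabled; the hostname and the particular flag combination are arbitrary choices): creating the user namespace first gives the process a full capability set inside it, which is what lets it create the other namespace types without real root.

    /* Minimal sketch, assuming unprivileged user namespaces are enabled.
     * "sandbox" and the flag choice are arbitrary for illustration.
     * Creating the user namespace first gives this process a full capability
     * set inside it, so the other namespace types no longer need real root. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* As a plain user this fails with EPERM if CLONE_NEWUSER is dropped. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWPID) == -1) {
            perror("unshare");
            exit(EXIT_FAILURE);
        }

        /* Allowed now: we hold CAP_SYS_ADMIN in the new user namespace,
         * which owns the new UTS namespace. Only our own view changes. */
        if (sethostname("sandbox", strlen("sandbox")) == -1)
            perror("sethostname");

        /* Until uid_map is written we appear as the overflow uid (65534). */
        printf("uid inside the new user namespace: %d\n", (int)getuid());
        return 0;
    }

Compile with gcc and run it as a normal user; on distros that disable unprivileged user namespaces (the Debian/Ubuntu situation mentioned elsewhere in this thread) the unshare() call will fail with EPERM.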
Exactly, "set up". Many people (not all) don't want to fiddle with things, they just want it to work out of the box. The importance of secure defaults can't be overstated, especially when there are virtually no downsides.
Docker has a big community, lots of guides, and ready-to-use containers. It became pretty much the de facto standard for self-hosting things. You also have a very high chance of getting a piece of software to work out of the box as intended with Docker. The only way this or some other way of running stuff will overtake Docker is if it matches Docker in these respects.
As much as I'd love to try this lightweight VM idea, I don't have the time or energy to convert 20+ projects I'm self-hosting into this and then keep everything updated. I'd rather invest this time into learning Docker more and making my existing setup more secure and robust.
Maybe try out kraft.cloud: we take Dockerfiles as input and automatically convert to lightweight VMs/unikernels when deploying (disclaimer: I'm one of the paper's authors and one of the people behind KraftCloud).
I recently built a similar thing for learning purposes using firecracker + the firecracker go api.
I wrote a small init system in rust and combined that with filesystem images derived from the Debian, Ubuntu etc. container images (that can be extended with more layers).
What really surprised me the most is how quick and simple it is to compile the linux kernel. Cloned a tag with --depth 1, configured it, and then it took ~5 minutes to build vmlinuz.bin. As someone who is too young to have had to regularly do that, I had heard multiple stories of how long that's supposed to take, but it really doesn't take that long.
I then tried to move from firecracker to qemu microvms but didn't get that far yet since I didn't have more time.
All in all a great learning experience and if I wasn't an undergrad student with no time, I'd love to build a service/business around it.
I came across Unikraft a while ago and went “wow that’s cool, but I have no idea how to use this”, cloud offering and docs you have up there now look amazing! Will 100% be giving this a go first thing tomorrow!!
Fly: takes your docker image, converts it into a Firecracker VM and runs that: kernel boundaries etc are all the same as before (and the same as running your container locally).
Kraft Cloud: takes your docker image, and turns it into a “unikernel”, and runs that. In a unikernel, your application _is_ the kernel. There’s no process boundary, no kernel-space/userspace split there’s a single address-space etc.
I believe the idea is that you get a perf benefit: as your application is often the only one running in the container, security is provided by the hypervisor anyway, so you may as well cut out all the middle layers that aren't getting you much. Seems some of the authors/founders of Unikraft are in the comments, they can explain much better than I.
Hey, author/founder here, thanks for providing that answer, all correct there :) . I would also add that KraftCloud unikernels are built using Unikraft, and that its modularity allows us to tailor/specialize those images to obtain great perf.
Finally, we also had to design and implement a controller from scratch -- nothing out there provided the millisecond semantics and scalability we needed (plus we also did tweaks to network interface creation and a few other things to get the end to end experience to be fast).
My work had a product that was doing builds and hosting for arbitrary client code; you're doing all that, plus more. I've got massive respect for that, because there were some hard problems to solve, even in our pretty vanilla environment. Looks like you guys have done a far better job than we did, plus more!
It sounds like consequences of bugs like memory corruption are far more challenging to deal with in the Kraft cloud situation. Sometimes isolation has other benefits.
Isn't that better isolation, though? A memory corruption will at worst break the OS, which is the app, and nothing else. Push the model further and you can have one unikernel per user and reduce the consequences of bugs even further.
Ahem, this is a research paper. You should look at this stuff as "Innovation" and someone may just consider building a tool or product on the idea... or not.
Author here, we did this, first by continuing the research alongside the creation of the Unikraft LF OSS project -- the result of which was the Eurosys 2021 best paper award (https://dl.acm.org/doi/10.1145/3447786.3456248).
Commercially, we leverage Unikraft on kraft.cloud to provide a cloud platform with millisecond semantics.
OS: I provide isolation where needed, handle safely interacting with outside world, and abstract away all the pesky stuff so programmers can just get stuff done.
Container / VM: I provide isolation where needed, handle safely interacting with outside world, and abstract away all the pesky stuff so programmers can just get stuff done.
I get that a dev machine (OS) isn't usually suitable for deployment or shared development (Container/VM). But it seems to me the promise of the operating system has fallen short, if we are striving to meet so many of the same goals of the OS with something on top of the OS that tries to abstract away the OS.
I guess this came to be due to the poor original security model of classic OSs, which led to a proliferation of viruses and complex management of shared resources. Users, groups and access flags are not enough to manage the security of a system.
Linux tried to fix that with namespaces and it turned out to be more or less successful, but Linux is not an OS, it's just a kernel, and it's up to real OSs built atop Linux to use namespaces as an implementation detail for real application isolation.
One way to do that is OCI containers, the other way is Flatpak. Neither of those is a proper OS yet, but you could call Kubernetes an operating system which uses containers as a means for application and resource isolation. Naturally that means Kubernetes is a complex beast, but that's what it takes to provide what users expect from an OS.
Android also comes to mind, they managed to isolate applications between each other quite safely.
I say this with great care as I do not want to launch a flamewar.
If you do not consider Linux with namespaces an OS (because of the fragmented userland): would you then consider FreeBSD with jails or Solaris with zones as fully fledged?
If you still consider those flawed (maybe because they do not force you into jails/zones), should we at least not consider OS/390 or z/OS as proper operating systems to that/your (not meant to be inflammatory!) standard?
Yes. Though you do not mention them directly, DOS and Windows have ruled the world for years and they opened the door for the nasties. But they were not all there was - only the popular/easy choice. Everything is a trade off.
Isolation mechanisms is not what makes an OS. It's the stable ABI that application developers can depend on and which provides a way to use shared resources: disk, CPU, RAM, GPU, network, screen space, push notifications, GUI integrations, your favorite LLM integration, so on, so forth... Yes, it might have an imperfect security model, but nothing's perfect under the sun.
Raw Linux without userspace could be considered an OS, but it has the ABI only in form of syscalls and the minimal standard FS. That's barely enough for anything other than, say, a statically linked Go binary, which is why it's seldom used by app developers as a target.
To most of your examples I say – yes, that's an OS, and jails or zones have nothing to do with it. Although I'm not familiar with them other than FreeBSD, so I'm relying on your short description and your implied criteria for selecting these examples.
I don't really see how rootless containers change anything at all. You're still "just" one kernel privilege escalation away from breaking out. The level of isolation is much better in virtual machines, and the performance penalty is comparable these days.
The virtual machine images are a bit heavier, since you need a kernel and whatnot, but the difference is negligible. The memory footprint of virtual machines with memory deduplication and such means that you get very close to the footprint of containers. You have the cold start issue with microvms, but these days they generally start in less than a couple of hundred milliseconds, not that far off your typical container.
Memory de-dup is computationally expensive, and KSM hit rate is generally much worse than people tend to expect - not to mention that it comes with its own security issues. I agree that the security tradeoffs need to be taken seriously, but the real-world performance/efficiency considerations are definitely not negligible at scale.
There are also significant operational concerns. With containers you can just have your CI/CD system spit out a new signed image every N days and do fairly seamless A/B rollouts. With VMs that's a lot harder. You may be able to emulate some of this by building some sort of static microvm, but there's a LOT of complexity you'll need to handle (e.g. networking config, OS updates, debugging access) that is going to be some combination of flaky and hard to manage.
I by no means disagree with the security points but people are overstating the case for replacing containers with VMs in these replies.
And these overheads are even smaller if you use unikernels as per the paper. Eg, cold starts of a few milliseconds depending on the app/size of the image.
I'm struggling a little bit to grasp all the concepts when we start talking about unikernels, wasm and so on. Hopefully that's just a sign of the maturity of it, and not a sign of my mental decline. But on paper (as I understand it) it looks /so cool/.
Unikernels aren't too complicated conceptually. They're more or less a kernel stripped down to the bare minimum required by a single application. The complete bundle of the minimal kernel and application together is called a unikernel. The uni- prefix means one, as in the kernel only supports one userspace application, instead of something like linux, which supports many. The benefits, as mentioned in the paper and in this thread, are that you can run that as a VM, since it contains its own operating system, unlike a container, which is dependent on the host operating system. Also, they boot very quickly.
Agree with epr's definition of a unikernel (and no, no mental decline on your part, this isn't always well defined).
First off, a unikernel is a virtual machine, albeit a pretty specialized one. They're often based on modular operating systems (e.g., Unikraft), in order to be able to easily pick the OS modules needed for each application, at compile time. You can think of it as a VM that has a, say, NGINX-specific distro, all the way down to the OS kernel modules.
VMs provide what's called hardware-level isolation, running on top of a hypervisor like KVM, Xen or Hyper-V. Wasm runs higher up the stack, in user-space, and provides what's called language-level isolation (you could even create a wasm unikernel, that is, a specialized VM that runs wasm inside; e.g., see https://docs.kraft.cloud/guides/wazero/). Generally speaking, the higher you go up the stack, the more code you're running and the higher the chances of a vulnerability.
Why weren't containers rootless from the start anyway? What did they need that user space doesn't provide? Wine, emulators and VMs didn't require it either (with the exception of some VMs needing a kernel module for performance reasons like memory management, which I also find stupid; the OS should provide all the performance in user space).
As I mentioned in another comment, the linux kernel feature (user namespaces) that enables "rootless" containers was released in February 2013, and Docker was released soon after in March of that year. For whatever reason, they haven't made it a priority to make rootless the default, although it is technically doable. If you are annoyed by this, I'd suggest checking out podman, which has done a lot of work to be basically a drop-in replacement with a similar workflow to docker.
Because the docker developers hate security. The idea of the docker group is insane, for example. You can mount any directory into a container so being in the docker group is like having a root account.
People were running containers for a decade before rootless podman came around.
There have been a lot of sharp corners around userns and related tech that needed to get resolved. Notably, Debian & Ubuntu disabled unprivileged userns for some legitimate security concerns.
Funny, the original commit message for that suggests it was simply a precaution. It's not out of the ordinary to avoid newer kernel features just in case.
> This is a short-term patch. Unprivileged use of CLONE_NEWUSER
> is certainly an intended feature of user namespaces. However
> for at least saucy we want to make sure that, if any security
> issues are found, we have a fail-safe.
I really don't get that: having to run something substantial as root seems a much bigger security concern than what it is shielding from user space (example: hosting a web server at port 80).
There is a lot of discussion on here about the different isolation levels available, but these micro-VMs aren't playing in the same field and can't be compared apples-to-apples.
If you go read the paper this requires a specialized Xen kernel, which in turn requires processor virtualization extensions directly available where you're running these containers. Those extensions aren't generally available if you're already running inside of a VM.
This is a solution that only works on bare metal. I would bet money that the vast majority of people using containers, outside of development environments at least, are not running their containers on bare metal but in an existing VM, such as on AWS or GCP, where this solution is simply a non-starter.
Neat, niche, and doesn't operate in the same world as containers.
>On the downside, containers offer weaker isolation than VMs, to the point where people run containers in virtual machines to achieve proper isolation.
That's not really why containers are deployed in VMs, especially in the context of on-prem enterprise software. I think that's more of a legacy issue. For example, for on-prem enterprise software, the enterprise already invested millions into their VM infrastructure so deploying a containerized stack means deploying into their VM infrastructure.
I think when centralized container orchestrators get enough market penetration with properly trained IT, you'll probably see that change.
Also, very few people choose containers for security and isolation. Typically it's for flexibility in deployment, and control of the environment (no more dependency hell).
high level, a vm is an entire virtual machine with its own kernel/operating system/filesystem/etc. a container is a process (and associated files/archived filesystem) with a (more or less) isolated view of the world (network/filesystem/etc.) running on top of the same kernel/os as other processes on the same machine.
examples:
a) vm - an entire windows install running in a window on my linux workstation so i can use tax software once a year. two kernels running at the same time. (N+1 for N VMs)
b) container - a small python service, its dependencies, and various filesystem bits from alpine-minimal packaged into a file that docker/containerd/whatever can turn into the service running in a little isolated portion of my machine. no matter how many i run, one kernel. the various processes just don't see the host or other procs' files/memory/etc. via namespace trickery (unless there's a security problem, lol). (sketch below)
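as a rough illustration of that "isolated view" (a hypothetical sketch, assuming unprivileged user namespaces are enabled): the child below runs on the exact same kernel as everything else on the machine, yet sees itself as pid 1.

    /* hypothetical sketch: one kernel, but the child's view of the process
     * table starts at pid 1. a user namespace is created first so no root
     * is needed to create the pid namespace. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1) {
            perror("unshare");
            exit(EXIT_FAILURE);
        }

        pid_t child = fork();               /* first child becomes pid 1 inside */
        if (child == 0) {
            printf("inside:  pid = %d\n", (int)getpid());   /* prints 1 */
            _exit(0);
        }
        printf("outside: child pid = %d\n", (int)child);    /* ordinary host pid */
        waitpid(child, NULL, 0);
        return 0;
    }

mount a fresh /proc inside and tools like ps only show the namespaced processes; skip that and you still see the host's /proc, which is about all the "trickery" amounts to.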
A VM is a virtualized instance with virtual hardware and can therefore run its own operating system with its own kernel to interface with the virtual hardware.
A container is basically a process restricted by multiple kernel namespace isolation mechanisms. It shares the same kernel with the host and does not present any “virtual hardware”.
Technically, and simplifying enormously, the VM emulates the whole machine while the Container scopes the OS process. I prefer the analogy of an office building.
Your VM is your whole office building: overnight maybe a whole new company can move in, but it's still using the whole building. Your container is a set of rules somebody told you when you arrived at the reception desk, about which single office in the building you can use, plus maybe some common access to shared areas once in a while, like the WC and kitchen. :-)
Containers seem light and cheap, but they have subtle problems: they lack the solid guarantees, prioritization, and limits on compute, network, and storage resources that type-1 v12n provides.
Isolated, but are they isolated enough? The article states that containers offer weaker isolation than VMs. (it doesn't quantify it though and I don't know this kind of thing offhand)
Who is complaining? And if containers do not offer enough of an isolation, why would you think VMs do? There are use cases where you have to have host-level isolation - for example, if you want to build a HIPAA-compliant cloud service, your customer data has to be isolated at the host level and VMs are not enough.
The Linux kernel has far too large of an attack surface to be trusted as a hard security boundary. It is good enough to prevent mostly trusted software from accidentally interfering with each other but I would not trust it to protect me from an untrusted workload.
For example GCP and AWS both have container running services. They both use hardware VMs to isolate different tenants. You will never share a kernel with another customer (I don't even think you will share one with yourself by default).
I agree with the other comments. On the cloud, the VM is still the gold standard for strong (hardware-level) isolation: if you deploy a container in the cloud, you can almost be sure there's a VM underneath. Given this, what we tried to do in that paper, in the LF Unikraft project (www.unikraft.org), and on kraft.cloud, is ensure that each VM only has the thinnest possible layer between the application and the hypervisor underneath -- strong isolation and hopefully max efficiency. We do use Dockerfiles to have users specify the app/filesystem, but then we transparently convert them to unikernels (specialized VMs) at deploy time.
Correct -- and you can run multiple kernels on the machine with virtualization extensions. Even Docker Desktop does this. You'd do this for _real_ isolation purposes.
It depends on the type of v12n. For paravirtualization and similar, the answer is sort-of, while for hard emulation it is definitely yes. There are efficiencies in memory usage because they will often share the same kernel code and userland code, which are memory pages that can be deduplicated at the hypervisor level. Read more about type-1 v12n.
> All processes in a proper OS are already isolated and there is no need for VM.
No. This is not how things work in reality. (Ideally, yes, because hypervisors are OS "duct tape", but there is no such readily-available OS with strict resource limits and hard enforced VFS and network isolation.) Isolation, sharing, and hard limits on RAM, CPU, networking, and storage (bandwidth, block devices, and IOPS) are beyond the capabilities of every major OS. This is why VMware and similar type-1 hypervisors exist.
I'm wondering though what value will Kubernetes add beside integrating with existing (presumably Kubernetes-based) infrastructure? At least, this is my understanding of the rationale for Kata containers. Other than that, it seems like it'd be just getting in the way...
I believe this work originated at Intel as "clear containers" (which I believe started life from an acquisition, but I could be mixing this up... my memory isn't what it used to be). Either way it's great they are being used like this and at Nvidia (I know Alibaba Cloud also uses this tech).
Yes, Kata started as clear containers. And yes, the main purpose is compatibility with containers -- though generally speaking, adding layers to the cloud stack never helps to make a deployment more efficient. On kraft.cloud we use Dockerfiles to specify app/filesystem, but then at deploy time automatically and transparently convert that to a specialized VM/unikernel for best performance.
Back when we did the paper, Firecracker wasn't mainstream so we ended up doing a (much hackier) version of a fast VMM by modifying's Xen's VMM; but yeah, a few millis was totally feasible back then, and still now (the evolution of that paper is Unikraft, a LF OSS project at www.unikraft.org).
(Cold) boot times are determined by a chain of components, including (1) the controller (eg, k8s/Borg), (2) the VMM (Firecracker, QEMU, Cloud Hypervisor), (3) the VM's OS (e.g., Linux, Windows, etc), (4) any initialization of processes, libs, etc and finally (5) the app itself.
With Unikraft we build extremely specialized VMs (unikernels) in order to minimize the overhead of (3) and (4). On KraftCloud, which leverages Unikraft/unikernels, we additionally use a custom controller to optimize (1) and Firecracker to optimize (2). What's left is (5), the app, which hopefully the developers can optimize if needed.
LightVM is stating a VM creation time of 2.3ms while Firecracker states 125ms from VM creation to a working user space. So this is comparing apples and oranges.
I know it's cool to talk about these insane numbers, but from what I can tell people have AWS lambdas that boot slower than this to the point where people send warmup calls just to be sure. What exactly warrants the ability to start a VM this quickly?
The 125ms is using Linux. Using a unikernel and tweaking Firecracker a bit (on KraftCloud) we can get, for example, 20 millis cold starts for NGINX, and have features on the way to reduce this further.
But if you can get isolation, security AND reproducible environments using a VM, especially one that's nearly as fast as an OS process, the case for using containers instead pretty much disappears.
I don't know this LightVM thing but I will definitely investigate it, especially given that on my Mac I need to use a VM anyway to run containers!
Check out kraft.cloud and the accompanying LF OSS project www.unikraft.org :) (disclaimer: I'm one of the authors of the paper and one of the people behind that cloud offering). On KraftCloud we use Dockerfiles so users can conveniently specify the app/filesystem, and then at deploy time transparently convert that to a unikernel (a specialized VM). With this in place, NGINX cold starts in 20 millis, and even heavier apps/frameworks like Spring Boot in < 300 millis (and we have a number of techniques to bring these numbers even further down).
For anyone else wondering how heavy this is on macOS, I ran the install script and it just delegated to brew... brew listed the following packages being installed:
Most should already exist on your Mac if you do development... it seems to rely on qemu, unsurprisingly... openjdk as well (probably to support Java out-of-the-box?), imagemagick etc.
Took a few minutes to finish installing... the CLI seems to be based on the Docker commands (build, clean, run, 'net create', inspect etc.), some package-manager like commands ('pkg info', 'pkg pull', 'pkg list' etc.), a bunch of "cloud" commands (I suppose that's the non-free part) and "compose" commands just like docker-compose. Interesting stuff.
I tried to run the C hello world example... I get an error, it wants to run Docker?!?! I thought the whole point was to avoid Docker (and containers)??
Here's the log:
i creating ephemeral buildkit container
W could not connect to BuildKit client '' is BuildKit running?
W
W By default, KraftKit will look for a native install which
W is located at /run/buildkit/buildkit.sock. Alternatively, you
W can run BuildKit in a container (recommended for macOS users)
W which you can do by running:
W
W docker run --rm -d --name buildkit --privileged moby/buildkit:latest
W export KRAFTKIT_BUILDKIT_HOST=docker-container://buildkit
W
W For more usage instructions visit: https://unikraft.org/buildkit
W
E creating buildkit container: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?: failed to create container
PS. running the hello-world pre-built "image" worked:
> kraft run unikraft.org/helloworld:latest
EDIT:
A lot of stuff looks broken on MacOS.
For example, `kraft menu` doesn't work (error "no choices provided", even though the docs show it working fine without "choices"?)...
`kraft run --elfloader loaders.unikraft.org/strace:latest ./my_binary` also doesn't work (the docs show it working).
Error: "unknown flag: --elfloader".
i don't think devs care if they use containers or VMs, as long as it's easy and they don't have to worry about which version of Python the host is running
This. It's why vagrant was popular before the container revolution.
The killer app of Docker isn't the container, it's the depth and uniformity of the UX surrounding the container system. When that is broken by something on the host (non-x86 CPUs were a major pain for a while before popular images were cross-built) and emulation gets in the way and is not as easy, or just mildly different (Windows behind corporate firewalls that assign IPs used by the Docker engine, for example), the ease of use falls away for non-power users and it's all painful again.
Tech like Docker for windows and Rancher Desktop and lima has largely matured at this point, but somebody could make a new machine and then the process of gradual improvement starts all over again.
Certainly depends a lot on what the term "VM" actually means in the context. If it's something as specialized as the JVM, or native virtualization with an extremely trimmed down guest, then at some point you'll find yourself in need of something more heterogeneous, e.g. running a tool on the side that does not fit the VM. Then you're back at square one, only this time with containers (or back in some irreproducible ad-hoc setup). Going with containers from the start, containers that may or may not contain a VM, and that may or may not actually do more than what a VM could supply, that's much less hassle than changing horses at a later point.
VMs in general use more CPU power, as you have two OSes each doing things like updating their real-time clock... There are VM-aware OSes that will not do this, but it needs special code and CPU support, which means you are often lagging behind the latest (to be fair, this is rarely important). A container will normally be slightly faster than a VM, never slower (assuming a reasonable OS; I could write an exception if I were malicious), and so there is a lot of interest in whether they are good enough.
You don't need a VM on Mac to run containers, check out OrbStack, they provide a Docker compatible engine that is using native MacOS capabilities for running containers without the hidden Linux VM.
I don't know where you got that idea. OrbStack absolutely runs a Linux VM. That Linux VM then uses Linux containerization technologies (namely LXD) for each separate OrbStack 'machine' you set up, which is how you get such fast startup times for your OrbStack 'machines'.
For Docker, OrbStack does the same thing as Docker Desktop, Podman Desktop, Rancher Desktop, etc., which is set up a Linux VM running Docker and then present a native socket interface on macOS which relays everything it receives to the Docker socket inside the VM.
macOS doesn't have native capabilities for running containers, which is why the nearest thing you can get to containerd on it requires you to disable SIP so it can use a custom filesystem to emulate bind mounts/null mounts: https://darwin-containers.github.io/
If you read the PRs where the principal author of the Darwin Containers implementation is trying to upstream bits of his work, you'll see containerd comparing his approaches to others and complimenting them by calling them 'the most containerish' because real capabilities aren't there.
(I believe I've read rumors here on HN that Apple has those features internally, fwiw. But they've evidently never released them in a public copy of macOS.)
Another clue in all this is to just run uname in any of your Docker containers in OrbStack; you'll see they're Linux machines. Some operating systems have Linux syscall emulation layers (WSL1, FreeBSD's Linux emulation, Illumos' LX Zones) that could perhaps be used to run Linux containers without hardware emulation or paravirtualization in combination with some native containerization capabilities. Afaik Illumos' LX Zones is the only implementation where that's a supported, intended use case but maybe FreeBSD can do it. At any rate, macOS has never had that kind of syscall compatibility layer for Linux, either. So when you run `uname` in a 'macOS container' and see 'Linux', you can be certain that there's a VM in that stack.
PS: Aside from the fact that it's proprietary, I really do quite like OrbStack. It's the nicest-to-use implementation of something like this that I've tried, including WSL2 and Lima. The fact that it makes the VM machinery so invisible is very much to its credit from a UX perspective!
Interesting! I could swear that in the early days of OrbStack, somewhere on their website, I read that they were using native macOS frameworks without the need for a Linux VM, but I can't find that anymore (they don't mention a Linux VM either, but the language still differs from what I remember).
They do use native GUI frameworks rather than something like Electron, which they still mention. And maybe they also used to have something about relying on Apple's Virtualization Framework or something like that, rather than qemu as Lima used for a long time. (I think it may still be Lima's default, but not for long.)
Where is the tooling to build and distribute lightweight VMs like containers? How can I copy one HTML file into an nginx VM, build this VM image for multiple architectures (I have ARM and x64 servers), publish it, pull it, and run it multiple times?
Once again: Containers are not about isolation or security, they are a package format for shipping applications. The packages are easy to build, distribute, multiarch, ...
And requiring a Linux VM on macOS to run Linux containers is not particularly surprising.
At the time of publication of the article, the tool used to create the minimalistic VM (Tinyx) had not been released, and as far as I can see it never was.
Correct, we never did release Tinyx, mostly because it was in a very unclean/researchy state = not ready for public consumption. In retrospect, we probably should have either (a) made it available in whatever state it was in or (b) put more cycles into it.
Containers are perfect for build environments and for creating the root filesystem. The issue is that kernels these days are super bulky and are intended for multi-user, multi-process environments. Running a container runtime on top just makes it worse when you're looking for "isolation".
This paper argues that when you build an extremely minimal kernel (i.e. ditch Linux entirely) and link your application against necessary bits of code to execute _as_ a VM, then you'll get better performance than a container and you'll get that isolation.
I am looking at the examples. They all have a Dockerfile. Is that just for local development on my laptop?
When using the deploy command-line tool, is the Dockerfile used to determine dependencies for the hosted VM? What if a developer is using an unusual programming language, like Common Lisp. Is that doable?
A Dockerfile is just a file with a bunch of commands to execute and get a working "computer". https://github.com/combust-labs/firebuild is fairly aged translation of the Dockerfile to a VM rootfs.
> build an extremely minimal kernel (i.e. ditch Linux entirely) and link your application against necessary bits of code
It would be nice, but this is really hard to do when modern software has so many layers of crud. Good luck getting say, a PyTorch app, to work doing this without some serious time investment.
But you don't need to write against all the layers of crud. You only have to write against the bottom layer, the kernel API. This sort of software would have no need to specifically support "libxml" or "TLS", because that is multiple layers above what this sort of software does.
The flip side is that if you want something like low-level access to your specific graphics card you may need to implement a lot of additional support. But of course nothing says you have to use this everywhere at the exclusion of everything else. There are plenty of systems in the world that from the kernel's point of view are basically "I need TCP" and a whole bunch of compute and nothing else terribly special (see the sketch below).
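To illustrate the point (not from the paper; a hedged sketch, with the port number an arbitrary choice): a service whose only real demand on the kernel is "I need TCP" looks like this, plain socket calls that a stripped-down kernel could satisfy, with no libxml/TLS-style layers anywhere in it.

    /* Illustrative sketch: the whole "bottom layer" this service needs is a
     * TCP socket API. Port 8080 is an arbitrary choice for the example. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);

        if (srv == -1 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
            listen(srv, 16) == -1) {
            perror("socket/bind/listen");
            return 1;
        }

        for (;;) {                          /* accept loop: TCP in, bytes out */
            int c = accept(srv, NULL, NULL);
            if (c == -1)
                continue;
            const char *msg = "hello from the kernel API\n";
            write(c, msg, strlen(msg));
            close(c);
        }
    }

Everything above libc here is the application itself, which is roughly the shape of workload where a kernel-API-only target makes sense.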
[Author of the paper here] You hit the nail on the head, this is precisely what we do (kernel API compatibility) with the LF Unikraft project (the evolution of the 2017 paper) at www.unikraft.org, and kraft.cloud, a cloud platform that leverages Unikraft.
Most of that effort should be sharable. If you know you will only have one python process you can get rid of a lot of cruft. If you know you will be running in a VM then you only need the driver for the network interface the VM provides, not every network interface ever designed (often including ones that your hardware doesn't even physically support). So while there is serious time investment, it isn't nearly as much as it would be to write a competitor to linux.
I'm not sure if I missed a bit here, but I have some colleagues doing research on unikernels for HPC and the point is that this unikernel is running directly on the hardware or hypervisor and not inside another VM. The unikernel is effectively a minimal VM and the network stack is one of the things they struggle the most with due to sheer effort.
[One of the authors of the paper] I wouldn't recommend writing a network stack from scratch, that is a lot of effort. Instead, with the Unikraft LF project (www.unikraft.org) we took the lwip network stack and turned it into a Unikraft lib/module. At KraftCloud we also have a port of the FreeBSD stack.
I tell people "An OCI container is a way to turn any random runtime into a statically linked binary."
It is very useful for managing dependency hell, or at least moving it into "API dependencies" not "Library dependencies", it is handy for pickling a CI/CD release engineering infrastructure.
It's not a security boundary.
(I'm 100% agreeing with parent, in case I sound contentious)
All security boundaries are "incidental" in that sense, though. Virtualization isn't a "purpose-designed" security boundary either, most of the time it's deployed for non-security reasons and the original motivation was software compatibility management.
The snobbery deployed in this "containers vs. VMs" argument really gets out of hand sometimes. Especially since it's almost never deployed symmetrically. Would you make the same argument against using a BSD jail? Do you refuse to run your services in a separate UID because it's not as secure as a container (or jail, or VM)? Of course not. Pick the tools that match the problem, don't be a zealot.
> All security boundaries are "incidental" in that sense, though
X86 protected mode, processor rings, user isolation in the multi user operating systems, secure execution environments in X86 and ARM ISAs, kernel and userspace isolation, etc. are purpose built security boundaries.
Virtualization is actually built to allow better utilization of servers, which is built as a "nested protected mode", but had great overhead in the beginning, which has been reduced over generations. Containers are just BSD jails, ported to Linux. This doesn't make containers bad, however. They're a cool tech, but held very wrong in some cases because of laziness.
The motivation for MMU hardware was reliability and not "security". Basically no one was thinking about computer crime in the 1970's. They were trying to keep timesharing systems running without constant operator intervention.
Yeah, but that's not an incidental property of *namespaces* (of which cgroups is only one isolation axis), that was the requirement when namespaces were designed.
Yeah, I know. Namespaces are pretty cool outside containers too.
My comment was more of a soft jab against using containers as the ultimate "thing" for anything and everything. I prefer to use them as "statically linked binaries" for short lived processes (like document building, etc.).
But, whenever someone abuses containers (like adding an HTTPS-fronting container in front of anything which can handle HTTPS on its own) I'm displeased.
There is no such thing as a reproducible build environment anymore. You can get a temporary reproducible build environment, but any sane security policy will have certificates that expire and that in turn means that in a couple years your build environment won't be reproducible anymore.
> but any sane security policy will have certificates that expire and that in turn means that in a couple years your build environment won't be reproducible anymore.
"Reproducible" is usually defined as "identical output except for the cryptographic signature at the end" (and that should be the only use for a certificate in your build environment, a high-quality build environment should be self-contained and have no network access). That is, once you remove the signature, the built artifacts should be bit-by-bit identical.
If you run multiple instances of a container image, you get a reproducible environment.
If you run a docker build multiple times, and copy a few files into the container, you get a reproducible container image. It is not a hash perfect duplicate, but functionally equivalent.
If builds of your favourite programming language are reproducible or not, is not really related to VM vs. Container.
The main advantage in my use case is in fact isolation (network and volumes) and a well defined API enabling management of those containers in production (not k8s, a tiny subnet of that perhaps).
The isolation could be achieved using namespaces directly. But the API, tooling and registry add a lot of value that would otherwise require a lot of development.
Also last time I looked hypervisors aren't possible on all cloud vendors, unless you have a bare metal server. This matters in my case. Maybe it has changed in the past 3 years.
When docker fits it's great. Same can be said of k8s, where there are a whole bunch of additional benefits.
If this were true, then wouldn't folks just need an application binary that statically links all of its required libraries and resources into a giant, say, ELF? Why even bother with a container?
Programmers discover the benefits of static linking, and then programmers discover the benefits of dynamic linking, and then programmers discover the benefits of static linking, and then...
Anyway containers go quite a bit further than just static linking, most people aren't out there linking all the binaries that their shell script uses together?
What if your application is not just one binary? What if it's a pipeline of complex tasks, calls some python scripts, uses a patched version of some obscure library, ...
It's not possible to package half a Linux distribution into a single binary. That's why we have containers.
First thing that comes to mind is the need to link against libraries across platforms. Imagine that my app depends on opencv: if I wanted to statically link everything on my Windows machine, I'd need to compile opencv for Linux on my Windows machine (or use pre-compiled binaries). Also, if you link against libraries dynamically, it's likely you can compile them on the host machine (or in a container) with more optimizations enabled. And the last thing is probably the ability to "freeze" the whole "system" environment (like folders, permissions, versions of system libraries).
Personally, I use containers to quickly spin-up different database servers for development or as an easy way of deployment to a cloud service...
Well yes, but try turning some random python, java or ruby service into a single binary .. now do that 12 times.
Or try with a native app that leverages both the GPU and libLLVM, and enjoy finding out the kind of precautions you have to take for LLVM to not blow up on a computer where your GPU driver was built with a different LLVM version.
That said, it makes sense from a developer POV; if, during development, you don't need the isolation you can run multiple containers (with on paper fast boot times and minimal overhead) on your development box.
There's plenty of cases to imagine where you need the containerization but not necessarily the isolation.
Because static libraries ain't a big thing anymore. Maybe they will become popular again. This would make it easier to have reproducible builds without a container. But I think containers are the new static libs now.
we arguably already had this with things like python venv.
the article's main point still remains, containers are a slow and bloated answer to this problem.
I concede you'll need containers for Kubernetes, and Kubernetes on the surface is a very good idea, but this level of infrastructure automation exists already in things like foreman and openstack. designs like shift-on-stack trade the simplicity of traditional hardware for ever more byzantine levels of brittle versioned complexity... so ultimately instead of fixing the problem we invoke the god of immutability, destroy and rebuild, and hope the problem fixes itself somehow... it's really quite comical.
baremetal rust/python/go with good architecture and CI will absolutely crush container workloads in a fraction of disk, CPU, RAM, and personal frustration.
Python venv is language specific, doesn't handle the interpreter version and doesn't handle C libraries.
I really don't understand why people do this: I get having a distaste for containers but some people, seeing the massive success of OCI images, mainly seem content on trying to figure out how to discredit its popularity, rather than trying to understand why it's popular. The former may be good for contrarian Internet forums, but the latter is more practically useful and interesting.
I say this with some level of understanding as I also have a distaste for containers and Docker is not my preferred way to do "hermetic" or "reproducible" (I am a huge Nix proponent.) I want to get past the "actually it was clearly useless from the start" because it wasn't...
All the younger engineers I talk to think you would need to be Albert Einstein to bootstrap a bare metal server.
As someone who made a living doing this at scale, where we would build a new datacenter every 2-4 weeks using 100% open source or off the shelf tools, I completely disagree.
I think PXE booting some servers and running a binary on them is 90% easier than most container orchestration engines, Kubernetes control planes, and all the other problems engineers seem to have invented for themselves. I also think it's almost always much more performant. Engineers don't have the intuition to realize that their XXLarge-SuperDuper instance is actually a 5 year old Xeon they're sharing with 4 other customers. Cloud providers obfuscate this as much as possible, and charge a king's ransom if you want modern, dedicated hardware.
NixOS and Guix System offer a far lighter and more reproducible approach, one that also doesn't push images built by who-knows-who on the internet straight into production, full of outdated deps, wasting storage and CPU resources in the meantime...
Yet it doesn't even come close to a fraction of the adoption scale of containers, no matter how good it is. Ecosystems matter more than individual quality.
That's because some interested parties have advertised containers: they are good to sell as pre-built stuff, nice for selling VPSes and the like, etc., while pure IaC is useful for anyone and invites you NOT to be dependent on third-party platforms.
It's not a technical matter, it's a human, economic matter, and actually... most people are poor; following the largest scale means following poverty, which is not a good thing.
[disclaimer: I'm one of the authors of the paper] I 100% agree, containers are an amazing dev env/reproducible env tool! In fact, we think they're the perfect marriage with the unikernels (specialized VMs) we used in the paper; on kraft.cloud, a cloud platform we built, we use Dockerfiles to specify apps/filesystems, and transparently convert them to unikernels for deployment. The end result is the convenience of containers with the power of unikernels (eg, millisecond cold starts, scale to zero and autoscale, reduced TCB, etc).
While reproducible build envs are a nice feature of using containers, they aren't the primary benefit.
The primary benefit is resource usage and orchestration.
Rather than duplicating entire aspects of an OS stack (which might be considered wasteful), they allow workloads to share aspects of the system they run on while maintaining a kind of logical isolation.
This allows for more densely packed workloads and more effective use of resources. This is a reason why the tech was developed and pushed by google and adopted by hyperscalers.
You are absolutely correct, and the creators of Docker did mention that was the core reason. Unfortunately your comment comes 10 years too late for many.
> If there is some additional isolation required, just run the container in a VM.
No. Running a container in a VM gets you no additional isolation. Containers share kernel space and as such have limited isolation compared to VMs, which have isolated kernels. In exchange for this Lack of additional isolation, you’ve added a Bunch of extra Complexity.
Pardon the extra caps I am using iOS voice dictation.
I think they mean run a VM with one container inside. So you do get strong isolation.
This is similar to how managed container IaaS works. They launch a VM and run your container in it.
It is extra complexity but has a few advantages. 1. People already have a convenient workflow for building container images. 2. The base OS can manage hardware, networking and whatever other low-level needs so that the container doesn't need to have these configurations. 3. If you want to trade off isolation for efficiency you can do this. For example running two instances of a container in the same VM. The container doesn't need any changes to support this setup.
The model of a single container within a VM just adds overhead. The ideal case would be to remove the container layer and have the application(s) within the container run directly in the VM (which hopefully only includes the libs and OS modules needed for the app to run, and nothing more).
This is the approach we take at kraft.cloud (based on the LF Unikraft project): use Dockerfiles to specify the app/filesystem, and at deploy time automatically convert to a lightweight VM (unikernel) without the container runtime/layer.
No, you don't. There is no benefit the container is providing, because the only feature of the container is isolating you from the zero other containers running on the VM.
The isolation I am referencing is from the VM, not the container. Containers don't provide strong isolation, that is why the VM is required in this model.