We Are Really Unhappy with Our Operating Systems, and Don't Know It

Linux has won. It is taking over everything, from tiny devices to the biggest supercomputers. Apple’s operating systems are all pretty much on the same model, and Microsoft always seems to be trotting along in roughly this direction, too.

The idea is pretty cool: Give each program a uniform view of the machine, and keep programs from interfering with each other. Not only can each program mostly pretend it owns the entire machine, the model is good enough to be extended to multiple users, all running on the same machine.

Yes, there will be resource limits with all this sharing going on, but that is a necessary limitation; the larger sharing model is great.

So why are we so unhappy with it? Why do we have this big virtualization fad? The operating system was supposed to let multiple users share the same physical machine, so why add an extra layer of multiple operating systems sharing the same hardware? If these multiple operating systems were different kinds of operating systems (needed for compatibility with different kinds of programs) that would make sense, but mostly we run multiple virtual copies of the same operating system. Frequently the same version of the same operating system. The popularity of hypervisors for providing multiple uniform views of the hardware, and keeping them from interfering with each other, seems a big indictment of the OS, because that is exactly what the OS was supposed to do. Something is wrong with the API offered by the OS if we prefer the API offered by the BIOS. Something is wrong.

And inside the OS, different programs were supposed to do different things. So why are we now inventing enormous container facilities like Docker and Kubernetes to supply the features we want? Isn’t that what the OS was supposed to orchestrate?

I don’t see much questioning of the role of the OS, but I see an awful lot of ad hoc reinventing of OS-like services.

Part of this is clearly a limitation of the OS model: Individual programs are isolated from each other, but it seems not isolated enough. We want more isolation, so we fire up new OS instances. Also, individual programs have complicated and conflicting dependencies on shared libraries that the old OS model isn’t good at mediating. Finally, individual programs are not where the action is. We run different programs in concert, with dependency confusions between them and contradictory desires: to be isolated from other programs (so they don’t interfere) but not isolated from other programs (so they can cooperate). It seems these are all issues the OS should handle, and it doesn’t. That’s why we have so many VMs, and these container facilities.

Recently I ran across the various namespaces that the Linux kernel offers. (Linus is very pedantic that the kernel just be the kernel, but that doesn’t mean it isn’t still freaking gigantic and bursting with features.) These namespaces provide a lot of granularity for controlling what is isolated and what is shared between different programs. It seems they make it possible to completely isolate software, as if you were running completely different operating systems. I say “it seems” because I don’t know that I am right; I don’t know that these different namespaces cover all the bases. And, even if I knew they did cover all the bases, how would anyone ever trust that they did it in a bug-free way? How would anyone ever know that there isn’t some unintended leaking between spaces, security holes hidden in the confusion?
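
To make that granularity a little more concrete, here is a minimal Python sketch, assuming a Linux system with /proc mounted. It relies on the kernel exposing each process's namespaces as symbolic links under /proc/<pid>/ns, with targets of the form "type:[inode]"; two processes whose links carry the same inode share that particular view of the system. The function names (parse_ns_link, current_namespaces, shared_namespaces) are hypothetical helpers invented for this illustration, not part of any library, and the sketch only inspects namespaces; it says nothing about whether they are leak-free.

```python
import os
import re

NS_DIR = "/proc/self/ns"  # Linux-specific location; an assumption of this sketch


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def current_namespaces(ns_dir=NS_DIR):
    """Return {entry_name: (ns_type, inode)}, or {} if the directory is unavailable."""
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or /proc is not mounted
    for name in entries:
        try:
            link = os.readlink(os.path.join(ns_dir, name))
            result[name] = parse_ns_link(link)
        except (OSError, ValueError):
            continue  # skip entries we cannot read or do not recognize
    return result


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common (same inode)."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    # Print one line per namespace the current process belongs to.
    for name, (ns_type, inode) in current_namespaces().items():
        print(f"{name:20s} {ns_type:10s} {inode}")
```

Passing "/proc/<pid>/ns" for two different processes and feeding the results to shared_namespaces shows, one namespace type at a time, what they share and what they do not. That per-type mix is the granularity in question, and also the source of the complexity.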

I think this gets to the point: The confusion. The old OS model was simple, and that was a virtue. The model implicit in Linux namespaces is so complicated that I almost don’t want to call it a model: if almost no one can understand the implications of all those features, can it be called a “model”? Does it instead become an “artifact”? Something to be studied, as opposed to a model, which is something clear enough to be understood?

Maybe I am just being overwhelmed and demonstrating my ignorance. But something tells me that the simplicity of hypervisors, presenting a near bare-metal model, isn’t about to lose its appeal as everybody starts to grok Linux namespaces.

I think we are choking on unmanaged complexity, that we are building systems that are more complicated than we know. Not only are they riddled with conventional bugs, but attackers are waltzing through our systems via the security holes made possible by that complexity. But that’s another topic.

My conclusion here is that we have run the old OS model out to the point of absurdity, and that we need to rethink what abstractions an OS should offer. The old OS model was both powerful and simple, but look at the layers of baroque filigree we are accumulating. It is time to revisit our assumptions about what an OS is.

-kb