by ykumar6 3900 days ago
Abstractions are important as they reduce complexity, and simplify operations.

The foundation of computing is abstraction - using layers and interfaces to hide complexity, so developers can focus on higher-order problems.

This article argues against abstractions, citing security, performance and cost. But time and again it has been shown that the most costly component in software is human time - and simplifying the underlying architecture is worth the trade-offs.


"Abstractions are important as they reduce complexity, and simplify operations."

...but only when used correctly. Abstraction is a means to an end, not the end itself. Unfortunately, years of CS education seem to have taught most people that it's the other way around, causing massive increases in design complexity that are only justified by dogmatic adherence to "more abstraction is better". Some programming language communities are more disposed to this effect than others (e.g. Java.)

Correct application of abstraction is not common in mainstream software, and rather difficult to describe, but the simplicity and clarity are unmistakable when one encounters it. The occasional articles on HN about seemingly impossibly tiny programs are good examples.

This article is a bit "X considered harmful" reactionary but I see their point - often, software today errs on the side of far too much abstraction.

"Unfortunately, years of CS education seem to have taught most people that it's the other way around, causing massive increases in design complexity that are only justified by dogmatic adherence to 'more abstraction is better'."

Abstractions only create design complexity when they are applied incorrectly. Abstractions should scale horizontally across a layer of the software stack (VMs, Storage - NFS, APIs, etc). If you create a single-use abstraction, it's not really an abstraction but a complexity.

NFS is pretty much a canonical example of a poor abstraction.
The real problem with abstractions is that they promote the lowest common denominator. If you have lots of different storage options, an abstraction will try to cover all of them with one standardized interface, which can only expose what they have in common.

But this is a trade-off. The real benefit comes in simplifying the programming model and not forcing developers to read through manuals figuring out how to flip a bit on a hard drive. Instead, they can leverage open-source software and libraries that rely on that standardization to deliver most of the value (with a small perf hit).

"It is important to note that we favor abstractions, but they should be implemented outside the operating system so that applications can select among a myriad of implementations or, if necessary, roll their own."
What is an operating system if not an abstraction of the hardware?
The idea is to multiplex, not abstract. To illustrate the difference, say you have an OS that runs only applications written in JVM bytecode. This is an abstraction: the OS is providing a different interface (bytecode) than the actual hardware interface (machine code).

Most OSes don't do anything like this. They allow applications to be written in raw machine code. The applications run as if they had full control over the CPU hardware. The OS then multiplexes the CPU by saving and restoring the program counter (and, typically, the rest of the registers and CPU state as well, though the exokernel design in this paper doesn't even do that). The idea is to (as much as possible) provide the same interface as the underlying hardware provides, then do a little extra work to make sure different applications aren't stepping on each other's toes.
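
To make the save/restore idea concrete, here is a toy sketch in Python. It is not how any real kernel or the exokernel in the paper does it; `Context`, `run_round_robin`, the `acc` register and the `quantum` parameter are all made-up names for illustration. Each "application" is just a list of instructions, and the "OS" only remembers where each one stopped.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Saved CPU state for one application (hypothetical, heavily simplified)."""
    name: str
    program: list                      # "machine code": callables that mutate registers
    pc: int = 0                        # saved program counter
    regs: dict = field(default_factory=lambda: {"acc": 0})


def run_round_robin(contexts, quantum=2):
    """Give each context up to `quantum` instructions per turn until all finish."""
    trace = []
    ready = list(contexts)
    while ready:
        ctx = ready.pop(0)
        # Restore: continue exactly where this application left off.
        pc, regs = ctx.pc, ctx.regs
        for _ in range(quantum):
            if pc >= len(ctx.program):
                break
            ctx.program[pc](regs)      # the application's own code, run unchanged
            trace.append((ctx.name, pc))
            pc += 1
        # Save: remember the state so the next turn can resume from it.
        ctx.pc, ctx.regs = pc, regs
        if pc < len(ctx.program):
            ready.append(ctx)
    return trace


def add(n):
    """Build a toy instruction that adds n to the accumulator register."""
    def instr(regs):
        regs["acc"] += n
    return instr


a = Context("A", [add(1), add(1), add(1)])
b = Context("B", [add(10), add(10)])
trace = run_round_robin([a, b], quantum=2)
print(trace)  # [('A', 0), ('A', 1), ('B', 0), ('B', 1), ('A', 2)]
```

Neither program knows it was interrupted, and the scheduler never looks at what the instructions mean. That is the sense in which this is multiplexing rather than abstraction.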

It's more like the CPU has a multiplexing interface and the OS is managing that. Applications are not run in the same environment as the OS is.
I think they're arguing that the OS should be the minimum software representation of the hardware necessary for "secure multiplexing", and any abstractions/layers after that should be per-application as needed (and only as needed).

Certainly you can call that minimal representation an abstraction as well, but I'm not sure it's helpful in this context, since it seems clear enough what they're arguing against.

I came to this thread to argue exactly this - pleased you did it for me! My issues are 1) they do not acknowledge the need for this "initial minimal abstraction" and 2) I'm not so sure that it would be so minimal.

The issue comes when you try to define "safe multiplexing". Take for instance a spinning disk drive. If we took this at face value, every application would know about things like sectors and seek times. Presumably this would permit some sort of domain-specific optimisation (that, say, a database engine might use). Perhaps we posit that programs that don't need such specialisation use a library for disk access. So far so good.

Now what is it the OS is trying to multiplex? No longer abstract, high-level concepts like "write this data to this file", which it can safely mess about with because it knows what they mean; no, it has to multiplex read head seeks. It cannot have any awareness of the meaning of these seeks (that was the point of the exercise!) so it can't really be more intelligent than a "dumb multiplexer". So your finely tuned database application has its clever read head optimizations all shot to hell whenever literally anything else touches the disk.

In order for an OS to multiplex hardware resources efficiently, it needs to have some idea of what the applications are trying to accomplish, so it stands the best chance of giving it to them.

For what it's worth, I also find the paper rather hot-headed and light on concrete examples.

The authors went on to implement a couple of systems that handle that problem rather nicely. Applications (really, the libraries they use) do know about disk sectors, but the kernel's disk driver sorts their requests to optimize seeks and exposes which sectors are loaded, to allow a kind of cooperative disk cache.
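
A rough illustration of why sorting in the driver helps, in Python. I don't know which ordering the authors' driver actually used; the one-sweep "elevator" ordering below is my assumption, and the sector numbers, head position and function names are invented for the example.

```python
def total_seek_distance(start, sectors):
    """Sum of head movements when sectors are served in the given order."""
    pos, total = start, 0
    for s in sectors:
        total += abs(s - pos)
        pos = s
    return total


def elevator_order(start, sectors):
    """Assumed policy: sweep upward from the head position, then back down."""
    up = sorted(s for s in sectors if s >= start)
    down = sorted((s for s in sectors if s < start), reverse=True)
    return up + down


# Requests from two applications arrive interleaved: one works near sector
# 90-100, the other near sector 5-15.
requests = [95, 10, 90, 15, 100, 5]
head = 50

fifo_cost = total_seek_distance(head, requests)
sorted_cost = total_seek_distance(head, elevator_order(head, requests))
print(fifo_cost, sorted_cost)  # 465 145
```

The driver needs no idea what the sectors mean to either application. It only needs the sector numbers, which is exactly the information a low-level interface exposes.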

Even more interesting, they let applications share file systems by taking a bytecode-based representation of FS metadata from userspace, and using that to enforce correct usage of the actual disk blocks. This lets applications control where on the disk to allocate, when to read which blocks, etc. without losing any of the security and cooperation of a typical file system.
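
Very roughly, the shape of that check might look like the Python sketch below. This is my own simplification, not the authors' code: a plain Python function stands in for the bytecode, and `owned_blocks`, `kernel_check_alloc` and the dict-based inode layout are hypothetical names. The kernel never parses the metadata format itself; it only asks the application-supplied interpreter which blocks the metadata points to, before and after a proposed change.

```python
def owned_blocks(inode):
    """Hypothetical application-supplied interpreter: metadata -> set of disk blocks."""
    return set(inode["blocks"])


def kernel_check_alloc(owns, old_meta, new_meta, requested, free_blocks):
    """Allow the update only if the metadata gains exactly the requested free block."""
    before, after = owns(old_meta), owns(new_meta)
    return (
        requested in free_blocks           # block is not owned by anyone else
        and before <= after                # nothing silently dropped
        and after - before == {requested}  # nothing gained except what was asked for
    )


free = {7, 8}
old = {"blocks": [1, 2]}

ok = kernel_check_alloc(owned_blocks, old, {"blocks": [1, 2, 7]}, 7, free)
sneaky = kernel_check_alloc(owned_blocks, old, {"blocks": [1, 2, 3]}, 7, free)
print(ok, sneaky)  # True False
```

The application still decides which block to allocate and how its metadata is laid out; the kernel only verifies that the claimed change matches the actual one.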

Secure multiplexing, VMs, and kernels were done repeatedly back in the 80s and 90s under the Computer Security Initiative. See p5 of this one for an example where trusted functions efficiently handled I/O multiplexing requests (syscalls) from untrusted drivers in guest OSes:

http://www.cse.psu.edu/~trj1/cse543-f06/papers/vax_vmm.pdf

You can ignore the security kernel and MLS stuff while imagining something simpler there. However, the design and assurance strategies for that one have yet to be topped by modern virtualization products.

Here's a modern approach to secure I/O with a nice list of others in Related Work:

http://repository.cmu.edu/cgi/viewcontent.cgi?article=1328&c...

Have fun with those.

I think most applications would use a shared file system, just as they do today, including all the same optimisations. But your high performance database would likely be given its own disk to work with. (Just as you would today, but the benefits of giving a whole disk to a process in exokernel land are theoretically greater.)
They are making two separate arguments then. What they are arguing against is standardization, which is just as important.
Addressed in Section 4, Question 3.
The question remains whether companies and developers value or care about the flexibility of creating "page table rules".

For highly scalable systems, the perf trade-off is just a matter of spinning up more VMs.

The higher-order benefit is that you can expect your operating system and VM to behave the same, no matter what.

Spinning up more VMs costs more. That was a bigger issue in 1996 than now, but still an issue for large deployments.