Hacker News
by mbreese 2580 days ago
I think it might be more helpful to think of what semantics could be supported by the "everything is a file" operations. A tape drive rewind() is just a specialized version of seek(), which any random-access object would need to support.

The file metaphor is soooo flexible, so it’s hard for me to think of examples where it breaks down. So, what are some good examples where the file metaphor breaks down? Maybe that’s helpful?

2 comments

The trouble is, a tape drive can support seek(). But can it support it performantly for all arguments? seek(0, SEEK_SET) is easy. seek(1024, SEEK_CUR) is easy - just read forward a little. But seeking to some arbitrary fixed offset? As far as I'm aware, tape drives are designed to use filemarks for 'searching', not precise offsets.

Of course seek(n, SEEK_SET) could be implemented anyway, in a very un-performant (and tape-wearing) manner: by rewinding, and then reading forward n bytes. The question is whether that is worth having, given how surprising it may be to people who don't realise just how bad the performance will be. After all, if a tape drive only supports seek(0, SEEK_SET), the general behaviour can easily be emulated on top in userspace by seek(0, SEEK_SET) followed by dummy reads, if you really want it.
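To make the userspace emulation concrete, here is a minimal sketch in C. It assumes a descriptor that supports only lseek(fd, 0, SEEK_SET) and sequential read(). The names emulate_absolute_seek and byte_after_emulated_seek are hypothetical, a temporary regular file stands in for the tape, and real tape concerns such as block sizes and filemarks are ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a real tape API): emulate seek(n, SEEK_SET) on a
 * descriptor assumed to support only rewinding and sequential reads.
 * Returns 0 on success, -1 on error or if the data ends before `target`. */
static int emulate_absolute_seek(int fd, off_t target)
{
    char scratch[4096];

    assert(target >= 0);

    /* Step 1: rewind, the one seek the device is assumed to handle. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;

    /* Step 2: dummy reads that discard data until the target is reached. */
    while (target > 0) {
        size_t want = sizeof scratch;
        if ((off_t)want > target)
            want = (size_t)target;

        ssize_t got = read(fd, scratch, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (got == 0)
            return -1;          /* end of data before reaching target */
        target -= got;
    }
    return 0;
}

/* Usage sketch on an ordinary temporary file standing in for the tape:
 * byte i of the file holds (i % 251). Returns the byte found at `offset`
 * after the emulated seek, or -1 on failure. */
static int byte_after_emulated_seek(off_t offset)
{
    char path[] = "/tmp/seek-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    for (int i = 0; i < 10000; i++) {
        unsigned char b = (unsigned char)(i % 251);
        if (write(fd, &b, 1) != 1) {
            close(fd);
            return -1;
        }
    }

    unsigned char out = 0;
    int result = -1;
    if (emulate_absolute_seek(fd, offset) == 0 && read(fd, &out, 1) == 1)
        result = out;
    close(fd);
    return result;
}
```

Every call costs time proportional to the target offset, which is exactly the surprise in question: the call looks like an ordinary seek but behaves like a full re-read from the start.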

read() and write() and seek() prove remarkably versatile, but the niggles come with the fact that different types of file/device on POSIX can have subtly different gotchas with these verbs which, on the face of it, appear to be the same verb. Essentially, I might argue they're not the same verb at all - they just seem similar.

For example, read() from a UDP socket and read() from a normal file have extremely different semantics. If you read() with a 64-byte buffer from a UDP socket and the incoming datagram is larger than that, the message is truncated and the remainder of it is lost. This is a very different semantic to reading from a file, where you can read in whatever chunks you like.
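The difference can be seen with a small C sketch. It assumes a Linux-like system with a working loopback interface, sends a single 100-byte datagram to itself over UDP, and reads it with a 64-byte buffer; the same 100 bytes are then pushed through a pipe, which is a byte stream and chunks the way a file does. The function names udp_case and pipe_case are hypothetical, and MSG_DONTWAIT is used only so the second receive returns instead of blocking.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { DGRAM_LEN = 100, BUF_LEN = 64 };

/* Send one DGRAM_LEN-byte UDP datagram to ourselves, then receive twice
 * with a BUF_LEN-byte buffer. Returns 0 if the experiment could be run. */
static int udp_case(ssize_t *first, ssize_t *second)
{
    char msg[DGRAM_LEN], buf[BUF_LEN];
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    int rc = -1;

    assert(first != NULL && second != NULL);
    memset(msg, 'x', sizeof msg);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Bind to loopback, learn the chosen port, and send to ourselves. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) == 0 &&
        sendto(fd, msg, sizeof msg, 0,
               (struct sockaddr *)&addr, sizeof addr) == (ssize_t)sizeof msg) {
        /* Only the first 64 bytes are delivered; the rest is discarded. */
        *first = read(fd, buf, sizeof buf);
        /* Nothing is left to receive, so this fails rather than
         * returning the remaining 36 bytes. */
        *second = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        rc = 0;
    }
    close(fd);
    return rc;
}

/* Push the same number of bytes through a pipe and read in two chunks. */
static int pipe_case(ssize_t *first, ssize_t *second)
{
    char msg[DGRAM_LEN], buf[BUF_LEN];
    int pv[2];
    int rc = -1;

    assert(first != NULL && second != NULL);
    memset(msg, 'x', sizeof msg);
    if (pipe(pv) != 0)
        return -1;

    if (write(pv[1], msg, sizeof msg) == (ssize_t)sizeof msg) {
        *first = read(pv[0], buf, sizeof buf);   /* first chunk */
        *second = read(pv[0], buf, sizeof buf);  /* the remaining bytes */
        rc = 0;
    }
    close(pv[0]);
    close(pv[1]);
    return rc;
}
```

On the UDP socket the two calls yield 64 and then -1, because the tail of the datagram is gone; on the pipe they yield 64 and then 36. Same verb, same buffer, different contract.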

I wrote the article after reflecting on precisely this attempt to force everything into the straitjacket of everything-is-a-file that we've had for decades with UNIX. How much code correctly deals with short write()s? "Everything can be expressed as an object on which you can perform read()/write()" can only be true if you ignore the details of a verb's precise semantics, but the precise semantics are important. I think it's fair to argue that write() isn't one verb at this point, but an overloaded verb referring to a set of verbs. Which verb in that set you're invoking depends on the type of "file".
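For reference, dealing with short write()s usually means looping until every byte has been accepted. The following is a minimal sketch in C, assuming a blocking descriptor; write_all is a hypothetical helper name, not a POSIX function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes have
 * been written. Returns 0 on success, -1 on error (errno is left as set
 * by write()). Assumes `fd` is a blocking descriptor. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything */
            return -1;
        }
        p += n;                 /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Code that calls write() once and ignores the return value works on regular files most of the time, which is why the bug tends to surface only once the descriptor turns out to be a pipe, socket, or terminal. Note that this loop is only appropriate for byte streams; on a datagram socket each write() is a separate message, so the "same" verb needs different handling again.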

The problem, IMO, is less that the file metaphor is not capable of expressing some things and rather that it expresses some things very poorly.

For example, a GPU device doesn't behave like a file. You cannot effectively control a GPU via read/write. read/write are excessively slow for anything you'd want to do, including a simple VGA buffer. Almost all operations on a GPU in Linux involve mmap'ing it and then applying ioctl() liberally.

You can do almost everything using the file metaphor; Plan 9 proves it. But at times it's going to be a very poor metaphor, one that is better at working at all than at working well.