Hacker News
by tsuraan 2559 days ago
OP is somewhat conflating two different things: non-strict function evaluation and lazy IO. With lazy IO you can get, for example, a String from a file. That string is actually a lazily constructed chain of cons cells, so if you're into following linked lists and processing files one character at a time, it's fun to use. The dangerous bit comes when you close the file after "reading" its contents as a string but before that string has actually been evaluated:

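    import System.IO

    -- Inside some IO action; processString is any pure String function.
    h <- openFile "/some/path" ReadMode
    s <- hGetContents h     -- lazy: nothing has been read yet
    hClose h                -- closes the handle before s is forced
    pure $ processString s  -- forcing s later hits the closed handle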
Now, processString is getting a string with the file's contents, right? Nope. You have a lazy list whose characters are only read from disk as they are demanded, so at this point little or nothing of the file has actually been read: maybe the first character, maybe the first chunk that got buffered. Eventually, as you process that string, you'll hit a point where your pure string processing actually tries to do IO on a file that isn't open anymore, and your perfectly sane, pure string-processing code will throw an exception. So, that's gross.
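
To see the failure concretely, here is a minimal sketch using GHC's System.IO, where hGetContents is the standard lazy read and hClose the standard close (the snippet's processString stays abstract). The helper names closeTooEarly and forceThenClose and the file name are made up for illustration, and forcing with evaluate (length s) is just one common workaround, not the only one.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Reproduces the hazard: the handle is closed before the string is forced.
closeTooEarly :: FilePath -> IO (Either IOException Int)
closeTooEarly path = do
  h <- openFile path ReadMode
  s <- hGetContents h        -- lazy: returns at once, reads on demand
  hClose h                   -- s still needs h, but h is now closed
  try (evaluate (length s))  -- forcing s raises an IOException here

-- One fix: force the whole string while the handle is still open.
forceThenClose :: FilePath -> IO Int
forceThenClose path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  n <- evaluate (length s)   -- walks the whole list, reading the file to EOF
  hClose h                   -- harmless: GHC already closed h at EOF
  pure n

main :: IO ()
main = do
  let path = "lazy-io-demo.txt"
  writeFile path "hello\nworld\n"
  closeTooEarly path >>= print   -- a Left describing the closed handle
  forceThenClose path >>= print
```

Forcing the spine with length is enough here because the elements of a String are plain Chars; it reads the entire file up front, which is exactly what lazy IO was trying to avoid.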

That's a real issue that will hit beginners. A lot of work has gone into ergonomic, performant libraries that handle this safely; right now I think pipes [0] and conduit [1] are the big ones, but it's a space people like to play with.

[0] https://hackage.haskell.org/package/pipes
[1] https://github.com/snoyberg/conduit
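
Without reaching for pipes or conduit, the core idea they package up (do all reading inside the scope that owns the handle, and consume the data incrementally) can be sketched with base alone. foldLines is a hypothetical helper for illustration, not part of either library's API.

```haskell
import System.IO

-- Hypothetical helper: a strict left fold over a file's lines.
-- Every read happens inside withFile, and hGetLine returns each line
-- fully read, so no pending reads can outlive the handle.
foldLines :: (acc -> String -> acc) -> acc -> FilePath -> IO acc
foldLines step z path = withFile path ReadMode (go z)
  where
    go acc h = do
      eof <- hIsEOF h
      if eof
        then pure acc
        else do
          l <- hGetLine h
          let acc' = step acc l
          acc' `seq` go acc' h  -- force the accumulator to avoid a thunk chain

main :: IO ()
main = do
  let path = "fold-lines-demo.txt"
  writeFile path "one\ntwo\nthree\n"
  total <- foldLines (\n l -> n + length l) (0 :: Int) path
  print total
```

The streaming libraries generalize this into composable producers and consumers with proper resource handling; the fold above only covers the single-consumer case.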


Seems like the problem is that file close is strict, whereas the file handle should be a locked resource that is auto-closed when the last reference is destroyed (with "fclose" just releasing the handle's own lock).

In other words the problem seems to be (in this example) that the standard library mixes lazy and strict semantics. A better library wouldn’t carry that flaw.
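
A minimal sketch of that reference-counting idea, assuming a hypothetical RcHandle wrapper (not part of any standard library). The explicit retain and release calls stand in for what a runtime would do automatically as references are created and destroyed; nothing here hooks into GHC's garbage collector.

```haskell
import Control.Monad (when)
import Data.IORef
import System.IO

-- Hypothetical wrapper: a handle plus a count of outstanding owners.
data RcHandle = RcHandle
  { rcHandle :: Handle
  , rcCount  :: IORef Int
  }

-- The opener holds the first reference.
newRc :: Handle -> IO RcHandle
newRc h = RcHandle h <$> newIORef 1

-- Take another reference, e.g. on behalf of a lazy reader.
retain :: RcHandle -> IO ()
retain rc = atomicModifyIORef' (rcCount rc) (\n -> (n + 1, ()))

-- Drop one reference; the real hClose runs only when the count hits zero.
release :: RcHandle -> IO ()
release rc = do
  n <- atomicModifyIORef' (rcCount rc) (\n -> (n - 1, n - 1))
  when (n == 0) (hClose (rcHandle rc))

-- Reports whether the handle is closed after each of two releases.
demo :: FilePath -> IO (Bool, Bool)
demo path = do
  rc <- newRc =<< openFile path ReadMode
  retain rc    -- a lazy consumer takes its own reference
  release rc   -- the user's "fclose": only drops the opener's reference
  afterUserClose <- hIsClosed (rcHandle rc)
  release rc   -- the consumer finishes: the count reaches zero
  afterLastRelease <- hIsClosed (rcHandle rc)
  pure (afterUserClose, afterLastRelease)

main :: IO ()
main = do
  let path = "rc-demo.txt"
  writeFile path "data\n"
  demo path >>= print
```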

So that's actually how it works if you just ignore hClose. The problem is that it sometimes matters when things get closed, so they do "need" to expose the ability to close things sooner.

Sort of. It eventually gets cleaned up by the garbage collector, yes, but that could happen after a nondeterministic amount of time with a tracing collector such as mark-and-sweep. My point is that in this circumstance reference counting could be used regardless, so that as soon as the last thunk is read, the file is closed. 'hClose' would then basically be a promise to close the file as soon as it is safe to do so.

> as soon as the last thunk is read, the file is closed.

That's probably doable. It's true that when the only reference to the handle in question is the one buried in the thunk pointed at by the lazy input, it should be safe to close it when a thunk evaluates to end-of-input (or an error, for that matter).

I'm not sure whether or not it'd be applicable enough to be worth doing. The immediate issues I spot are that a lot of input streams aren't consumed all the way to the end, and that you'd have to be careful not to capture a reference anywhere else (or you'll be waiting for GC to remove that reference before the count falls to zero).
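
GHC's lazy IO already covers part of this: the System.IO documentation says a semi-closed handle (one passed to hGetContents) becomes closed once its entire contents have been read. The sketch below, with made-up helper names, checks that, and also shows the first concern above: a stream that is only partly consumed keeps its handle open until something else closes it.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Force the whole lazy string, then ask whether the handle is closed.
closedAfterFullRead :: FilePath -> IO Bool
closedAfterFullRead path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  _ <- evaluate (length s)  -- demands the list all the way to end-of-input
  hIsClosed h               -- GHC closes a semi-closed handle at EOF

-- Force only the first line; the rest of the input is never demanded.
closedAfterPartialRead :: FilePath -> IO Bool
closedAfterPartialRead path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  _ <- evaluate (length (takeWhile (/= '\n') s))
  closed <- hIsClosed h     -- still semi-closed: EOF was never reached
  hClose h                  -- without this, only the GC would close it
  pure closed

main :: IO ()
main = do
  let path = "eof-close-demo.txt"
  writeFile path "first line\nsecond line\n"
  closedAfterFullRead path >>= print
  closedAfterPartialRead path >>= print
```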

Also things like Unix pipes or network sockets, where the "close" operation means something different because multiple parties are involved. Arguably the same is true of files, since you could be reading a file that others are simultaneously writing to.

Right. It's easy to handle the simple case, but honestly "let the GC close it" works fine in the simplest cases.

Why is it possible to "close" a file? You could have a function that mapped open files to closed files, but the open files would still be there... I think the reason this weird behavior is cropping up is that the entire language is designed around functions, and here you are reaching into internal data structures and mutating state.

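
The "function that maps open files to closed files" can be written today with a phantom type index. This hypothetical File wrapper (not a library API; it assumes GHC's DataKinds extension) shows the point above: the old open value is still there, and still typechecks as open. Rejecting that second use is what the linear types proposal mentioned later in the thread is about.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import Control.Exception (IOException, try)
import System.IO

data FileState = Open | Closed

-- Hypothetical wrapper: the phantom index records the handle's state.
newtype File (s :: FileState) = File Handle

openF :: FilePath -> IO (File 'Open)
openF path = File <$> openFile path ReadMode

-- Maps an open file to a closed file at the type level...
closeF :: File 'Open -> IO (File 'Closed)
closeF (File h) = hClose h >> pure (File h)

-- ...and reading is only offered on open files.
readLineF :: File 'Open -> IO String
readLineF (File h) = hGetLine h

-- The loophole: f is still in scope after closeF, so this typechecks
-- and only fails at runtime.
useAfterClose :: FilePath -> IO (Either IOException String)
useAfterClose path = do
  f <- openF path
  _ <- closeF f
  try (readLineF f)

main :: IO ()
main = do
  let path = "phantom-demo.txt"
  writeFile path "hello\n"
  useAfterClose path >>= print
```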
> Why is it possible to "close" a file?

Because the program we're compiling needs to work on actual computers, running under actual (usually at least vaguely POSIX) operating systems. In that context, it's unavoidable that the set of open file descriptors sometimes matters. It can matter because of resource limits. It can also change whether another process gets a SIGPIPE versus blocking forever. It can affect locking.

> Why is it possible to "close" a file?

i guess i would ask why it's possible to close a file that's going to be used after it's closed? will linear types[1] solve this?

[1] https://gitlab.haskell.org/ghc/ghc/wikis/linear-types

> i guess i would ask why it's possible to close a file that's going to be used after it's closed?

I don't think there's much reason to want to do it, but it's not obvious how to enforce that while still retaining the flexibility we'd want.

Linear types expand the solution space, to be sure. Whether they "solve this" depends a bit on exactly what we consider the problem to be.
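
A toy illustration of what linear types add, assuming GHC 9.0 or later with the LinearTypes extension. OpenFile and ClosedFile are hypothetical tokens, not a real file API; the point is only that a %1 -> arrow lets the type checker reject using the open token again after closing it.

```haskell
{-# LANGUAGE LinearTypes #-}

-- Hypothetical state tokens, not a real file API.
data OpenFile = OpenFile
data ClosedFile = ClosedFile

-- The %1 -> arrow says the OpenFile must be consumed exactly once.
closeFile :: OpenFile %1 -> ClosedFile
closeFile OpenFile = ClosedFile

-- Uncommenting this is a type error: f would be consumed twice.
-- useAfterClose :: OpenFile %1 -> (ClosedFile, OpenFile)
-- useAfterClose f = (closeFile f, f)

describe :: ClosedFile -> String
describe ClosedFile = "closed exactly once"

main :: IO ()
main = putStrLn (describe (closeFile OpenFile))
```

This only tracks the handle token itself; with lazy IO the pending reads live inside an ordinary String, which such a typing does not see.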

The file isn't going to be used after it's closed. The string from the file is going to be used after the file is closed. But with lazy IO, you don't have (all of) the string from the file yet, even though you've "read" it.
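
When you want "read" to mean read, a strict variant gives you all of the string before the handle goes away. A minimal sketch, assuming base 4.15 or later (GHC 9.0+), where System.IO provides hGetContents'; readStrict is a made-up name.

```haskell
import System.IO

-- With the strict hGetContents', returning the contents out of withFile is
-- safe: the whole file is read before withFile closes the handle. With the
-- lazy hGetContents, the same line would hand back a string that fails
-- when forced, because its reads would run after the close.
readStrict :: FilePath -> IO String
readStrict path = withFile path ReadMode hGetContents'

main :: IO ()
main = do
  let path = "strict-read-demo.txt"
  writeFile path "all of it\n"
  readStrict path >>= putStr
```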

That is, the abstractions don't do what non-Haskell abstractions would lead you to expect.

Right. The whole point of lazy IO is that you hide the actual IO behind values that don't appear to be IO. That means your use of the file isn't visible to the type system, so it's not really reasonable to expect the type system to catch the misuse. Unless I'm missing something, you can't write lazy IO without lying to the type system anyway (unsafeInterleaveIO or unsafePerformIO).
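
For reference, GHC's lazy hGetContents is built on unsafeInterleaveIO from System.IO.Unsafe, which defers an IO action until its result is demanded. A minimal sketch of the same trick, using a hypothetical lazyLines reader (not GHC's actual implementation), shows how the reads vanish from the type: the list it returns is an ordinary [String] with no trace of the handle.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical lazy line reader: each tail of the list is an IO action
-- disguised as a pure thunk.
lazyLines :: Handle -> IO [String]
lazyLines h = unsafeInterleaveIO $ do
  eof <- hIsEOF h
  if eof
    then pure []
    else do
      l <- hGetLine h
      rest <- lazyLines h  -- returns at once; reads happen when forced
      pure (l : rest)

-- Fine: the list is forced while the handle is still open.
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  ls <- lazyLines h
  evaluate (length ls)  -- a plain `pure (length ls)` would escape unforced

-- The type [String] hides the reads, so nothing stops this.
countAfterClose :: FilePath -> IO (Either IOException Int)
countAfterClose path = do
  h <- openFile path ReadMode
  ls <- lazyLines h
  hClose h
  try (evaluate (length ls))

main :: IO ()
main = do
  let path = "interleave-demo.txt"
  writeFile path "a\nb\nc\n"
  countLines path >>= print
  countAfterClose path >>= print
```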