Hacker News
by mbostock 4830 days ago
Not to focus too myopically on the given example, but I can’t help but wonder why it’s a requirement that the first file be handled specially? A less contrived example would make the argument more convincing.

If I wanted to compute the size of one file relative to a set, I’d probably do something like this:

  queue()
      .defer(fs.stat, "file1.txt")
      .defer(fs.stat, "file2.txt")
      .defer(fs.stat, "file3.txt")
      .awaitAll(function(error, stats) {
        if (error) throw error;
        console.log(stats[0].size / stats.reduce(function(p, v) { return p + v.size; }, 0));
      });
Or, if you prefer a list:

  var q = queue();
  files.forEach(function(f) { q.defer(fs.stat, f); });
  q.awaitAll(…); // as before
This uses my (shameless plug) queue-async module, 419 bytes minified and gzipped: https://github.com/mbostock/queue

A related question is whether you actually want to parallelize access to the file system. Stat'ing might be okay, but reading files in parallel would presumably be slower since you'd be jumping around on disk. (Although, with SSDs, YMMV.) A nice aspect of queue-async is that you can specify the parallelism in the queue constructor, so if you only want one task at a time, it’s as simple as queue(1) rather than queue(). This is not a data dependency, but an optimization based on the characteristics of the underlying system.
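
To make the parallelism knob concrete, here is a minimal sketch of a callback queue with a concurrency limit. It is not queue-async's source: `miniQueue`, `fakeStat`, and the fake file sizes are hypothetical names invented for illustration, and only the `defer`/`awaitAll` shape mirrors the calls shown above.

```javascript
// Hypothetical sketch of a concurrency-limited callback queue.
// Not queue-async's actual implementation; names are made up for illustration.
function miniQueue(parallelism) {
  var limit = parallelism || Infinity; // no argument: start every task at once
  var tasks = [];       // deferred tasks, in the order they were added
  var results = [];     // one slot per task, filled in as tasks finish
  var started = 0;      // how many tasks have been started
  var active = 0;       // how many tasks are currently running
  var remaining = 0;    // how many tasks have not finished yet
  var failure = null;   // first error reported by any task
  var notify = null;    // callback passed to awaitAll
  var notified = false; // ensures notify is called at most once

  // Call the awaitAll callback once, on the first error or when all are done.
  function finish() {
    if (!notify || notified) return;
    if (failure) { notified = true; notify(failure); }
    else if (remaining === 0) { notified = true; notify(null, results); }
  }

  // Start task `index` and record its result when its callback fires.
  function run(index) {
    active++;
    tasks[index](function (error, value) {
      active--;
      remaining--;
      if (error && !failure) failure = error;
      results[index] = value;
      pump();
    });
  }

  // Start as many pending tasks as the limit allows.
  function pump() {
    while (!failure && started < tasks.length && active < limit) run(started++);
    finish();
  }

  return {
    defer: function (fn) {
      var args = Array.prototype.slice.call(arguments, 1);
      tasks.push(function (done) { fn.apply(null, args.concat(done)); });
      remaining++;
      pump();
      return this;
    },
    awaitAll: function (callback) {
      notify = callback;
      finish();
      return this;
    }
  };
}

// Demo with a fake, timer-based stat so the sketch runs without real files.
var fakeSizes = { "file1.txt": 10, "file2.txt": 20, "file3.txt": 30 };
function fakeStat(name, callback) {
  setTimeout(function () { callback(null, { size: fakeSizes[name] }); }, 10);
}

miniQueue(1) // one task at a time; miniQueue() would start all three at once
  .defer(fakeStat, "file1.txt")
  .defer(fakeStat, "file2.txt")
  .defer(fakeStat, "file3.txt")
  .awaitAll(function (error, stats) {
    if (error) throw error;
    var total = stats.reduce(function (p, v) { return p + v.size; }, 0);
    console.log(stats[0].size / total);
  });
```

Results are stored by task index, so the ratio computation does not depend on the order in which tasks happen to finish.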

Anyway, I actually like promises in theory. I just feel like they might be a bit heavy-weight and a lot of API surface area to solve this particular problem. (For that matter, I created queue-async because I wanted something even more minimal than Caolan’s async, and to avoid code transpilation as with Tame.) Callbacks are surely the minimalist solution for serialized asynchronous tasks, and for managing parallelization, I like being able to exercise my preference.

5 comments

>> but reading files in parallel would presumably be slower since you'd be jumping around on disk

There is a lot of engineering that goes into making parallel reads go fast. Some combination of the file system and disk controller will probably be smart enough to recognize the opportunity for sequential reads and execute them as such if possible.

This is not always true, and it does not undermine the rest of what you have written. I just think it's interesting to keep in mind that operating systems implement a lot of helpful machinery that user-level programmers forget about.

Stat just reads metadata that is (hopefully) cached in memory. Linux does not have any async APIs for reading metadata anyway. Examples are always a bit contrived; it doesn't matter.

    do f1 <- fsStat "file1.txt"
       f2 <- fsStat "file2.txt"
       f3 <- fsStat "file3.txt"
       let ratio = (size f1) / (sum $ map size [f1, f2, f3])
       print ratio 
Or, if you prefer a list:

    do fs <- mapM fsStat files
       let ratio = (size . head $ fs) / (sum . map size $ fs)
       print ratio
And that seems to be one small example of why you may have already invented monads. I've been loving the immediacy of JavaScript (modify the code and immediately see the result in the browser), but every time I'm not using Haskell I miss it dearly.
Have to admit that has also been my reaction to some of these javascript async frameworks based on promises or deferreds. Congratulations, you've reimplemented a quirky ad-hoc variant on the continuation and error monads.

Perhaps people don't spot the link as easily because monads are usually explained in terms of a type system, and javascript is untyped? (Or perhaps just because Monad is a very abstract abstraction :)
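
A rough sketch of that correspondence, using the standard `Promise` built into modern JavaScript rather than any library discussed in this thread: `Promise.resolve` acts like `return`, `.then` acts like bind, and a rejection skips the remaining steps the way an error monad does. The helper names `parseNumber`, `half`, and `pipeline` are hypothetical, and the analogy is loose, since `.then` also accepts functions that return plain values.

```javascript
// Hypothetical helpers for illustration; only Promise itself is a real API.

// A step that can fail: parse text into a number.
function parseNumber(text) {
  var n = Number(text);
  return isNaN(n)
    ? Promise.reject(new Error("not a number: " + text))
    : Promise.resolve(n);
}

// Another step that can fail: halve an even number.
function half(n) {
  return n % 2 === 0
    ? Promise.resolve(n / 2)
    : Promise.reject(new Error(n + " is odd"));
}

// Sequencing with .then plays the role of monadic bind (>>=):
// each step runs only if the previous one succeeded.
function pipeline(text) {
  return parseNumber(text).then(half).then(half);
}

// 12 -> 6 -> 3, so this logs "ok: 3".
pipeline("12").then(function (v) { console.log("ok:", v); });

// 6 -> 3, then the second half rejects, so this logs "failed: 3 is odd".
pipeline("6").catch(function (e) { console.log("failed:", e.message); });
```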

Regarding your latter point, I don't think monads are even that abstract. "Monad" just happens to be something out of category theory so it has a mathematical weird-sounding name that you MIGHT guess has something to do with monoids -- if you know what monoids are -- so people think it has to be something complicated when in practice it's just a nice unified interface for glue code.
There's the async component as well, but I think `ErrorT e Par a` covers it.
Please use more readable names in your code. Use of names like 'fs' in key places makes it unreadable.
Generally this is of course good advice, but in Haskell it is common practice to use very short names (e.g. x, x', xs, ...) if the context is clear (which it usually is due to small scope, clear function names, type signature, etc.). This makes code much more concise and readable (it also makes it look very "mathematical").
> (it also makes it look very "mathematical")

Has that ever been an advantage?

For people that like math, sure, why not. When implementing mathematical concepts, if you squint at Haskell code you can see the original formulas, which should make it easier for people used to this way of thinking.

EDIT:

I'm not implying it's useful just for programming "math stuff", after all, everything can be reduced to a mathematical problem - including game engines[1], web application frameworks[2], etc.

[1] http://www.cse.unsw.edu.au/~pls/thesis/munc-thesis.pdf [2] https://github.com/yesodweb/yesod

And it's probably one of the most significant things limiting adoption of Haskell.
The convention for Haskell is to keep the active scope of variables very small. Any variable with an active scope of larger than maybe 3 lines, I make longer. Since these examples were hardly longer than that, I feel quite justified with short names.

In Haskell, if you see a short name, look up and down 3 lines for the definition. If you can't find it then complain.

I think you are right, and it is a convention in the functional world. People are using (and worse, reusing in close proximity!) meaningless names like that. And I think these people have zero regard for anyone who is reading their code.

Well... more power to Python, and a culture that embraces 'what you see is what you get' and super-readable code.

There is actually an interesting technical reason for having short names in generic Haskell functions. Because of parametricity, the behavior of the function doesn't depend on what the values actually are. The shortness of the names really is meant to convey "don't think about what this is doing, because it's not important for this function". In the traditional example for map,

  map f [] = []
  map f (x:xs) = f x : map f xs
You're supposed to infer from the short function names that f and x could be anything. The only important bit is that you can apply one argument to f (so, for example, f could take two parameters, and then map is just doing a single partial application). In that context, x and xs is actually a better convention than "first" and "rest", because they indicate the adherence to the type system. The naming here is saying that x is of the type of elements of xs, and that this is the only important information for map. This seriously helps in more complicated functions like zip, etc.
I'm not so sure that brevity and adherence to that convention improve readability. Of course f, x, xs is much better than 'first' and 'rest', or 'a', 'b', 'c', but something like 'func' and 'iterable' gives more context, and frees one's attention for more important things than looking up and down the code.

Compare:

    map f [] = []
    map f (x:xs) = f x : map f xs
With:

    map f xs = [f x | x <- xs]
Or even better, in Python:

    map = lambda func, iterable: [func(x) for x in iterable]
Which one is more readable?

First one requires looking up and down in order to understand what is going on. Second one is better, context is limited to one line. And the last one doesn't require you to remember context at all.

We'll have to agree to disagree. I think long variable names for short-lived variables decrease readability. Oftentimes these "points" are just used to glue functional pipelines together and have little-to-no intrinsic meaning. The true documentation comes from the types and is thus more trustworthy.
Oftentimes there is ML code in which types are implicit and the variable names follow the typical functional-programming style (a, b, c, d, e).

In C or C++ this was never the case, because type information was never implied (until recently, when auto was introduced). And in dynamic languages, like Python, this is also almost never the case, because good mainstream developers use object names consistently.

In the functional world however, mainstream (if there is any mainstream, as I often see each developer working in his/her own unique style) folks just say phrases like 'true documentation comes from the types' and write their recursions freely, and with no regard to the reader.

So yes. We'll have to agree to disagree.

Doesn't that have the problem that it won't get around to computing the ratio until it needs to be printed to the screen?
Depends on the semantics of the monad. If you want to control that kind of thing, you can use Strategies from Control.Parallel.Strategies. If you just want to force things, then abstract-par [1] and monad-par [2] have some pretty convenient semantics.

[1] http://hackage.haskell.org/package/abstract-par/ [2] http://hackage.haskell.org/package/monad-par/

> I just feel like they might be a bit heavy-weight and a lot of API surface area to solve this particular problem.

The surface area is `.then()`

Lack of standardization makes people go crazy. If another programming language had an API this simple, people would never think it is heavyweight. But because you're always rolling your own, people get obsessed with the smallest things in js-land.
These kinds of problems can easily be solved with promises too. It would be even simpler if `fs.stat` returned a promise, and there are promise libraries that do that. Promises is a small library I use, probably about the same number of bytes as your library, as I transition my code from callbacks to promises.

      var queue = new Promises;
      fs.stat("file1.txt", queue.cb());
      fs.stat("file2.txt", queue.cb());
      fs.stat("file3.txt", queue.cb());
      queue.all()
        .then()
        .fail();
But comparing how promises solve the same flow as callbacks misses the point. Here's an example where an action is taken when two events fire (promises shine here):

      pub.on('foo', function() {
        promise1.fulfill();
      });
      pub.on('bar', function() {
        promise2.fulfill();
      });
      Vow.all([promise1, promise2]).then(...).fail(...);
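For comparison, the same two-event flow can be sketched with the standard `Promise` and Node's `EventEmitter` instead of Vow. The `firstEvent` helper and the payload values are hypothetical, introduced only for this sketch.

```javascript
var EventEmitter = require("events");

// Hypothetical helper: a promise that resolves with the payload of the
// first `name` event emitted by `emitter`.
function firstEvent(emitter, name) {
  return new Promise(function (resolve) {
    emitter.once(name, resolve);
  });
}

var pub = new EventEmitter();

// Resolves once both events have fired, in whichever order they arrive.
var both = Promise.all([firstEvent(pub, "foo"), firstEvent(pub, "bar")]);

both.then(function (payloads) {
  // payloads follows the order of the array above, not the firing order.
  console.log("foo:", payloads[0], "bar:", payloads[1]);
});

// Fire the events in the "wrong" order to show that it does not matter.
pub.emit("bar", 2);
pub.emit("foo", 1);
```

Because `both` is an ordinary value, it can be returned from a function and consumed by code that knows nothing about the emitter.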
It looks like you're able to return one of those queues from a function and allow some other code to call .await(). Being able to return something is a useful feature of promises too, seems like there might be more overlap there.