Hacker News new | ask | show | jobs
by derdi 689 days ago
> Well, no, this is straight confusion between what’s expressed by the program and what’s compiled. The idiomatic code in Ocaml will end up generating machine code which is as performant than using mutable array.

This cannot be true in general. There are machine code patterns for which arrays are faster than linked lists. The OCaml compiler, great as it is, won't turn linked list source code into array machine code. Therefore, there is idiomatic code in OCaml that will not be as performant as arrays.

> It gets fairly obvious when you realise that most Ocaml developers switch to using array when they want to benefit from unboxed floats.

This is one example why your statement above is not true.

3 comments

> This is one example why your statement above is not true.

You are misreading my comment. I’m not intentionally contradicting myself in two paragraphs next to each other (I’m not always the brightest but still).

The point is that contrary to what the article states ML developers are not avoiding mutations because they are uneasy to use but because they trust their compiler when they know it will do good. Proof is that in other case they will use mutations when it makes sense to do so because the compiler does not do a good job.

The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.

> The point is that contrary to what the article states ML developers are not avoiding mutations because they are uneasy to use but because they trust their compiler when they know it will do good. Proof is that in other case they will use mutations when it makes sense to do so because the compiler does not do a good job.

It will do a good job, yes. Will it do the best possible job compared to some other algorithm or data structure? It can't. Not in general.

And maybe not in the specific case either:

> The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.

So, this: https://ocaml.org/play#code=bW9kdWxlIE15X2R5bmFycmF5ID0gc3Ry...

    $ for len in 5_000_000 10_000_000 25_000_000; do echo "-- ${len} elements --"; ./a.out list $len; ./a.out dynarray $len; ./a.out my_dynarray $len; echo; done
    -- 5_000_000 elements --
    list:        0.191551 sec
    list:        0.196947 sec
    list:        0.192806 sec
    dynarray:    0.301362 sec
    dynarray:    0.268592 sec
    dynarray:    0.266118 sec
    my dynarray: 0.163004 sec
    my dynarray: 0.142986 sec
    my dynarray: 0.143634 sec
    
    -- 10_000_000 elements --
    list:        0.377447 sec
    list:        0.367951 sec
    list:        0.312575 sec
    dynarray:    0.607158 sec
    dynarray:    0.582378 sec
    dynarray:    0.538621 sec
    my dynarray: 0.319705 sec
    my dynarray: 0.296607 sec
    my dynarray: 0.286634 sec
    
    -- 25_000_000 elements --
    list:        0.971244 sec
    list:        0.953493 sec
    list:        0.922049 sec
    dynarray:    1.515892 sec
    dynarray:    1.319543 sec
    dynarray:    1.328461 sec
    my dynarray: 1.119322 sec
    my dynarray: 0.971288 sec
    my dynarray: 0.973556 sec
    
    -- 50_000_000 elements --
    list:        1.852812 sec
    list:        1.848514 sec
    list:        1.505391 sec
    dynarray:    3.065143 sec
    dynarray:    2.941400 sec
    dynarray:    2.672760 sec
    my dynarray: 2.115499 sec
    my dynarray: 1.963535 sec
    my dynarray: 1.995470 sec
    
    -- 75_000_000 elements --
    list:        2.942536 sec
    list:        2.910063 sec
    list:        2.354291 sec
    dynarray:    4.567284 sec
    dynarray:    4.342670 sec
    dynarray:    3.979809 sec
    my dynarray: 2.528073 sec
    my dynarray: 2.225738 sec
    my dynarray: 2.226844 sec
A simple dynamic array implementation (my_dynarray) beats a list over a wide range of lengths. But not at all lengths! OCaml's built-in Dynarray is not competitive, but that's because it wants to make certain strong guarantees.

To be clear, I agree with your general point that we can do just fine writing nice clean pure functional OCaml code for most of our code and can hand-optimize where needed. But your very specific claims rub me the wrong way.

Oops, this had a performance bug. Instead of:

    if d.length = Array.length d.values then begin
      d.values <- Array.(append d.values (make (length d.values) x))
    end;
the array reallocation should actually be:

    if d.length = Array.length d.values then begin
      let new_array = Array.make (Array.length d.values * 2) x in
      Array.blit d.values 0 new_array 0 (Array.length d.values);
      d.values <- new_array
    end;
otherwise we allocate about a third more memory than needed. It's telling that even with this performance bug the dynamic array was broadly better than lists.

New results for the previously slowest cases:

    -- 25_000_000 elements --
    list:        0.977002 sec
    list:        0.963903 sec
    list:        0.950473 sec
    dynarray:    1.476165 sec
    dynarray:    1.281724 sec
    dynarray:    1.343222 sec
    my dynarray: 0.872558 sec
    my dynarray: 0.755902 sec
    my dynarray: 0.753746 sec
    
    -- 50_000_000 elements --
    list:        1.914777 sec
    list:        1.886989 sec
    list:        1.542614 sec
    dynarray:    2.922376 sec
    dynarray:    2.783559 sec
    dynarray:    2.537473 sec
    my dynarray: 1.725873 sec
    my dynarray: 1.545252 sec
    my dynarray: 1.515591 sec
    
    -- 75_000_000 elements --
    list:        2.827154 sec
    list:        2.835789 sec
    list:        2.318733 sec
    dynarray:    4.354404 sec
    dynarray:    4.150271 sec
    dynarray:    3.781488 sec
    my dynarray: 1.887360 sec
    my dynarray: 1.929286 sec
    my dynarray: 1.814873 sec
This turns an uneasy head-to-head into a clear win for dynamic arrays. Honestly, how could it be otherwise?
> Honestly, how could it be otherwise?

It can't be otherwise if you're just assuming a straightforward compilation model where your written roughly reflects the assembly code that's generated. This just isn't the case with better compilers though.

For instance, Haskell can often perform rewrites or fusion passes that entirely eliminates the loop and all intermediate data structures, effectively giving a near-infinite speedup compared to alternatives. You typically can't perform such optimizations when side-effects are in the middle of the computation, for numerous reasons.

You could have backed this statement up with a benchmark, and you chose not to.

(You are right on the general hand-waving level that such optimizations are sometimes possible. But there are no fuseable intermediate data structures in this particular problem.)

But they are still very much in the same order of magnitude... pretty impressive that these solutions are all in the same ballpark, I would've expected much bigger differences.
The dynamic array spends most of its time copying old elements as it grows exponentially. If you pre-size it to the right size, you eliminate this copying, and the difference becomes 5x. In practice you often have an idea of the size, at least as a rough estimation, so you would win by a larger margin. But that was not part of the spec.

Also, other ways of organizing dynamic not-quite-an-array-but-not-quite-a-linked-list data structures exist. It could be a dynamic array (or linked list) of dynamic arrays to eliminate repeated copying of the oldest elements.

No, not this. You are doing a map. A traversal of a data collection is not a map but a fold. The exemple in the article is specifically about collection which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail call optimised to a loop). See my other comment.

Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call. You have to use rev_map to get good perf.

> A traversal of a data collection is not a map but a fold. The exemple in the article is specifically about collection which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail call optimised to a loop).

The exact wording in the article is "let's say you're iterating over some structure and collecting your results in a sequence". You are interpreting a lot into this. Also, your description is of a map (reversed, and expressed as a fold). Anyway, where is your benchmark?

> Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call.

You are again contradicting yourself. Previously you were praising OCaml's optimization capabilities and now you are questioning them. Specifically in a case where there is a magic optimization that is explicitly motivated by List.map: https://ocaml.org/manual/5.2/tail_mod_cons.html . A magic optimization implemented using, guess what, mutation.

> The OCaml compiler, great as it is, won't turn linked list source code into array machine code.

Why not? If the compiler can see that you have a short-lived local linked list and are using it in a way for which an array would be faster, why would it not do the same thing that an array would do?

> Why not?

Because it doesn't. Doesn't mean it couldn't, if it tried hard enough. But it doesn't, as a statement of current fact.

Well, it doesn't make the substance wrong. This paragraph rightly summarizes it:

"The fact that most programming languages don’t give enough semantic information for their compiler to do a good job doesn’t mean it necessary has to be so. Functional programmers just trust that their compiler will properly optimize their code."

The substance of the statements I quoted was wrong. As I wrote elsewhere, I do agree with the broad statement that the combination of functional and imperative features in OCaml works just fine. But if the semantic information you give to the OCaml compiler is "linked list", it will use a linked list rather than a data structure that might be better for the task at hand.