Hacker News new | ask | show | jobs
by jiggawatts 1860 days ago
To me, this is infinitely more readable:

    let mut input = ["select", "*", "from", "potato"];
    for i in 1..input.len() {
        if ["select","*"] == input[i-1..=i] {
            input[i] = "_";
        }
    }
If I could be bothered to dig up a Haskell compiler, I'm sure it's possible to do a one-liner list comprehension that is both terse and readable.

If I was doing this in Rust, it's easy to create an iterator extension that does something like "map_lookback" which explains the intention without requiring comments.

I'm going to be blunt: Unreadable array languages are popular with quants because they work in a highly competitive, cut-throat industry where "write only" languages provide job security. I've come across developers purposefully obfuscating code by using only one-character identifiers and zero comments. One of them literally blackmailed their employer, demanding their salary be doubled on the grounds that there was no chance their code could be maintained by anyone else. He should have gone to jail for that, but he had his managers by the balls, got what he wanted, and bought a house with cash soon after.

Why are we applauding this?

2 comments

> To me, this is infinitely more readable:

Yes, to you. Just like Chinese would be infinitely less readable to you, if you don't know Chinese.

Do you think this is a deep observation?

The real challenge is knowing whether it is worth it to learn k so that it becomes readable.

- How many characters is it? This is a useful metric when you realise bug/defect rate is proportional to the physical size (in rows and columns of source code) of a program given equivalent processes (Moore 1992; McConnell 1993) but that process matters more than anything else. Not tooling, not "memory safety", and certainly not "readability" by people unfamiliar with the language.

However "readable" you think your rust code is, did you notice the bug in your rust code? (hint: input is not supposed to be mutable)

- How fast is it? On my 2014 i7 macbook air, the supplied us averages 232 msec for 100k cycles. My version

    us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
is faster: 126msec for 100k cycles.

- How quickly was it written? I can't speak for sa/atw on us since I didn't see them write it. I wrote mine in about 30 seconds including testing. Yes really. How long did your rust program take to write? Did you think simply because you thought you understood the requirements that you didn't need to test it?

- How quickly can someone familiar with the language read it? Again, I can't speak for anyone other than myself, but I'm not a k expert -- I program in it very infrequently, and the new amend-syntax in k9 I had not run across previously. And yet I read it as quickly as I said.

These four values (less code, fast run, fast write, fast read) are the biggest most important things to me. And anyone who shows me all four of things will get my attention.

Rust? Simply does not impress.

> Unreadable array languages are popular with quants because they work in a highly competitive, cut-throat industry where "write only" languages provide job security.

That's another interesting opinion. This one might even be true amongst some quants (Most of the ones I know that use k don't particularly like k). But I suggest you try not to expect the worst in people. Yes, some people are assholes, but most people aren't. And for what it's worth, I'm not a quant (I work in Advertising).

> 126msec for 100k cycles.

Or to put it another way: 1,200 nanoseconds. That's about 3,000-5,000 instructions on a modern CPU. Believe it or not, that's actually pretty bad.

After jumping through some hoops to ensure that rustc doesn't just compile the whole thing down to a constant, I benchmarked my version as taking 15-20 nanoseconds per iteration. About 45-80 instructions!

I actually couldn't quite believe it myself, so I jumped through more hoops to ensure that it wasn't being optimised away, wasn't getting inlined too aggressively, etc... No change.

Ran it through Godbolt to inspect the assembly, and then I realised that, yes, modern languages, compilers, and CPUs really are this good!

Think about it: For the specific input example with 4 strings the algorithm boils down to: compare 7 bytes with 7 bytes, replace a pointer with another pointer, then compare 2 bytes with 2 bytes three times. That's about a hundred assembly instructions, or thereabouts.

Godbolt output:

    push   rbp
    push   r15
    push   r14
    push   r13
    push   r12
    push   rbx
    push   rax
    mov    r12,rdi
    mov    ebx,0x8
    xor    ebp,ebp
    lea    r14,[rip+0x3cb3c]        # 444e8 <anon.6527da4acb4810bb73692fa85a2e25ef.0.llvm.5506334730328533235+0x20>
    mov    r15,QWORD PTR [rip+0x3f3b5]        # 46d68 <bcmp@GLIBC_2.2.5>
    cs nop WORD PTR [rax+rax*1+0x0]
    nop    DWORD PTR [rax]
    cmp    rbp,0x2
    je     79f2 <example::testabc+0x62>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1]
    cmp    rdx,QWORD PTR [r12+rbx*1]
    jne    79ec <example::testabc+0x5c>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1-0x8]
    mov    rdi,QWORD PTR [rbx+r14*1-0x8]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     79c0 <example::testabc+0x30>
    cmp    r13,0x2
    jb     7a07 <example::testabc+0x77>
    lea    rax,[rip+0x3060e]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x10],rax
    mov    QWORD PTR [r12+0x18],0x1
    xor    ebx,ebx
    xor    ebp,ebp
    nop    DWORD PTR [rax+rax*1+0x0]
    cmp    rbp,0x2
    je     7a43 <example::testabc+0xb3>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1+0x8]
    cmp    rdx,QWORD PTR [r12+rbx*1+0x18]
    jne    7a3d <example::testabc+0xad>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1+0x10]
    mov    rdi,QWORD PTR [rbx+r14*1]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     7a10 <example::testabc+0x80>
    cmp    r13,0x2
    jb     7a58 <example::testabc+0xc8>
    lea    rax,[rip+0x305bd]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x20],rax
    mov    QWORD PTR [r12+0x28],0x1
    mov    ebx,0x8
    xor    ebp,ebp
    nop
    cmp    rbp,0x2
    je     7a93 <example::testabc+0x103>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1]
    cmp    rdx,QWORD PTR [r12+rbx*1+0x20]
    jne    7a8d <example::testabc+0xfd>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1+0x18]
    mov    rdi,QWORD PTR [rbx+r14*1-0x8]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     7a60 <example::testabc+0xd0>
    cmp    r13,0x2
    jb     7aa8 <example::testabc+0x118>
    lea    rax,[rip+0x3056d]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x30],rax
    mov    QWORD PTR [r12+0x38],0x1
    add    rsp,0x8
    pop    rbx
    pop    r12
    pop    r13
    pop    r14
    pop    r15
    pop    rbp
    ret    
    nop    WORD PTR [rax+rax*1+0x0]
Rust? It simply impresses.
I'm not sure getting the wrong answer fast is something to be proud of.

Your rust code has a bug in it, is longer, and you spent more time on it.

Please elaborate on that bug. The rust function I provided replaces "select, *" correctly with "select, _". Is that not what it was supposed to do? I checked that it works with empty arrays, arrays with one entry, etc...

Thinking about it, this version boils down the logic to its barest essence:

    fn test(input: &mut [&str]) {
    
        let mut was_select = false;
        for i in input.iter_mut() {
            if was_select && *i == "*" { *i = "_"; was_select = false; }
            else { was_select = *i == "select"; }
        }
    }
It runs in 5.5 nanoseconds per iteration, which is just absurdly fast. That's about 200x faster than your K version!

What I like about this Rust version is that it reflects the approach that for a computer processor is optimal, yet it is high-level and readable. I could convert that "test" function easily enough to a generic "replace" function similar to a string search & replace, but one that can operate on any mutable iterator with the maximum possible efficiency, not just arrays.

Then, your K code that requires "careful reading" to slowly tease apart the meaning could be converted to a form that is practically prose:

    let mut input = ["select", "*", "bar", "potato"];
    replace( &mut input, &["select", "*"], &["select", "_"] );*
> The rust function I provided replaces "select, *" correctly with "select, _". Is that not what it was supposed to do?

No. It is not supposed to modify its input but return a copy.

Granted, if replacing arbitrary sequences with arbitrary sequences, then copying is necessary, as this could result in the output length increasing.

However, for replacing scalars only, the in-place mutable version is in some sense superior: it doesn't force a memory allocation. Moreover it can be trivially converted into a copying version by simply wrapping it in a function that first copies the input, and then mutates it in-place. The reverse is not true: the copying version cannot be wrapped to create a non-allocating version.

To be honest, I wish more languages put this kind of effort into their standard libraries, but most have about 10% of what I would like to see. For example, this kind of "string matching" is really "sequence matching" and ought to be fully generic for any underlying comparable and copyable type, not just arrays of characters. String search algorithms like Boyer-Moore ought to be directly applicable to arrays of integers or enums, e.g.: for recognition of code patterns in lists of parsed tokens.

At least one person has done something like this for Rust: https://github.com/peterjoel/rust-iter-replace/blob/6c575eeb...

Similarly, there's the new InPlaceIterable, which is interesting but not quite enough to suit my taste: https://doc.rust-lang.org/std/iter/trait.InPlaceIterable.htm...

Because you're wrong. Do spend six months doing hobby array programming. Your view will change.
It's on my bucket list. But it seems so niche that I suspect I'll never use it.

My only motivation is that I've been tempted to develop a toy programming language myself that is as high-level as Rust or C++ but explicitly designed for "wide" processing platforms such as GPUs or many-core processors with SIMD instruction sets.

All I'm saying is that the concept of array programming -- like pure functional programming -- may be valuable, but the syntax is not.

I just can't understand how even experienced programmers can fixate on syntax. In my - repeated tens of times by this point - experience the syntax is important for a few months (tops!) at the beginning of using a language, and then stops to matter almost completely. What experiences would make someone convinced otherwise? There's an argument against too high complexity of syntax, but most general-purpose languages out there are very similar in this regard...
There's empirical evidence, loads of it, that this simply isn't true.

Syntax absolutely does matter, in all walks of life, not just programming.

We're not infinite, error-free computers like some sort of mathematical abstraction. We have squishy meat brains evolved to tell stories around a campfire.

As a random example: I learned Rust just for fun, and something that repeatedly caused hours of frustration is that unlike every other modern language, it allows identifier shadowing. This compiles and runs just fine:

    let a = 5;
    let a = "wat?"
Practically no other language allows this, because it is a recipe for errors.

Internally, compilers typically use single static assignment, so the above would be processed something like this:

    let a_1 = 5;
    let a_2 = "wat"
For a compiler to track this is no problem. For a human? It's a problem. Maybe not for trivial examples like this, but if dozens of identifiers are littered across a huge function it can be a challenge to keep track of which one is which if the names are the same but they're... not the same. Especially if the change is subtle.

We're not computers, that's why we make them out of silicon and metal and sell them for money.

> There's empirical evidence, loads of it, that this simply isn't true.

Source?

> unlike every other modern language, it allows identifier shadowing

1. All dynamically typed languages allow this, all/most REPLs even for statically typed languages allow this, even if regular code doesn't.

2. This is not syntax, at all, this is semantics of immutable value declaration+initialization.

> but if dozens of identifiers are littered across a huge function it can be a challenge to keep track of which one is which

Littering dozens of identifiers across huge functions is going to be a problem, no matter the syntax. It's a programmer's job to manage complexity by utilizing various kinds of techniques, including syntactic sugar (true), but also factoring the code into manageable chunks, using higher-level or better suited for the task at hand abstractions, and so on. Syntax on its own is, in my experience, the least impactful technique for managing complexity, at least before going into DSL-land (but then it's syntax+semantics).

Another thing worth mentioning is that lots of general-purpose languages give you exactly the same constructs, just spelled differently. Simple examples:

    for (x : list) { ... }
    foreach (x; list) { ... }
    for x <- list do ... end
    for x in list: ...
    for my $x (@list) { ... }
    for x := list { ... }
    list each(x, ...)
    list each: [ :x | ...]
    (loop for x in list do ...)
    (for [x list] ...)
All the above denote exactly the same construct, with very little variation in the number of tokens. Can you tell me if you think one of them is better than the others - and if so, why?
Syntax can certainly be more or less suitable to a particular problem or even to humans in general, but (dis)allowing identifier shadowing is part of the semantics of the language (i.e. what the syntax means). One could say: "there should be syntactic cues if we are to allow variable shadowing". That would be a well-defined criticism of syntactic issues. BTW, most modern languages do allow identifier shadowing. The particularity of Rust and the object of your criticism is allowing it in the same scope as the definition.

But again, the only way you'll be able to apprehend array language notation is by writing enough of it to become somewhat fluent. It doesn't mean you have to like it, but remember it was the object of a Turing award and that's usually a good indicator of something relevant.

> the concept of array programming -- like pure functional programming -- may be valuable, but the syntax is not.

You would not be the first person tempted to try and separate the two, but I think it is impossible. Every attempt to do so I have ever seen has lost too much in the translation that the four-values I have for programming are no longer conserved.

https://futhark-lang.org/

Is the closest you'll get to that, I think. It took a whole team of big brains to make it, and the author stated it was highly non-trivial to make something like that.

Interesting, but not quite what I had in mind.

My idea was loosely based on a code search engine Google had for a while before they killed it off.

I'm not sure exactly how they did it, but I very strongly suspected that they implemented regular expressions over database indexes.

A simple b-tree index is just a set of sorted strings. If you have a lot of sorted strings and you squint at it, you can see how it picks out common prefixes, like a tree. (You can literally store it with the prefixes factored out, and then you have a trie.)

For a fairly wide range of regular expressions, and a tree of sorted strings, you can implement a search over the b-tree. E.g.: the search "[bx](a+)foo" can be implemented via finding the range starting with "b", then the range starting with "x". For each range, find the sub-range starting with "ba", "xa", etc...

Each step takes logarithmic time, and can be done in parallel. In theory, it takes only about a hundred times longer to find a ten matches in a million strings than it would take to match a single string.

Wouldn't it be nice if you could take an algorithm such as "match regex" that normally takes a 'scalar' string and have the compiler automatically create a version that can take an array of sorted values, like the database index?

The idea is that much like how array programming languages eschew scalars and pass arrays all over the place to gain efficiency, my language would pass sets all over the place, and gain a different type of efficiency.