by ispolin 4934 days ago
Using my complete lack of statistical knowledge, I multiplied the wrong-output rate (%) by the total lines of code in the examples from the original paper here http://www.spinellis.gr/pubs/conf/2012-PLATEAU-Fuzzer/pub/ht... to get a very bad approximation of fat-fingering adjusted for program length. You'd expect more typos in a longer program; the original experiment always introduced 1 typo per run regardless of program length.

You guys enjoy while I prepare for the lynch mob of Statisticians :-)

    Lang     Err %   LOC   LOC-adjusted Err %
    Ruby     0.17    159   27.03
    Python   0.15    161   24.15
    Perl     0.22    156   34.32
    PHP      0.36    224   80.64
    JS       0.18    102   18.36
    Java     0.1     331   33.1
    Haskell  0.15    114   17.1
    C#       0.095   389   36.955
    C++      0.08    461   36.88
    C        0.1     458   45.8
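
For anyone who wants to re-derive or re-sort the last column, here is a minimal Python sketch of the multiplication described above. The numbers are copied from the table; the names `ROWS` and `loc_adjusted` are made up for illustration, and rounding to 3 decimals is only there to hide floating-point noise.

```python
# (language, wrong-output rate as quoted in the table, lines of code)
ROWS = [
    ("Ruby", 0.17, 159),
    ("Python", 0.15, 161),
    ("Perl", 0.22, 156),
    ("PHP", 0.36, 224),
    ("JS", 0.18, 102),
    ("Java", 0.1, 331),
    ("Haskell", 0.15, 114),
    ("C#", 0.095, 389),
    ("C++", 0.08, 461),
    ("C", 0.1, 458),
]


def loc_adjusted(err, loc):
    """Error rate times LOC, rounded to hide floating-point noise."""
    return round(err * loc, 3)


# Languages ordered from lowest to highest adjusted value.
RANKED = sorted(ROWS, key=lambda row: loc_adjusted(row[1], row[2]))

for lang, err, loc in RANKED:
    print(f"{lang:<9}{err:<8}{loc:<6}{loc_adjusted(err, loc)}")
```

Sorting this way puts Haskell and JS at the low end and PHP far out at the high end, which matches the table.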

Looks like an improvement to my (completely unbiased, of course) eyes. Haskell moves away from C++/Java, C moves away from them in the opposite direction, and PHP moves into its own league.

The surprises, IMO, are JavaScript (I would place it close to PHP) and Perl (apparently, it is easy to come up with character sequences that are not valid Perl :-))

Thinking of ways to get a perfect language according to this metric: the way to get there is to introduce lots of redundancies in the grammar. For example, if one requires two exact copies of the same source before code compiles, any single change will give compilation errors. However, programmers would build tools to defeat such strategies.

Maybe one should scale for actual content, e.g. by weighting against the size of the gzipped source code?
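
A rough Python sketch of why gzipped size might work as a content measure: a source file concatenated with an exact copy of itself (the redundancy trick above) has twice the characters but compresses to nearly the same size. The helper `gzipped_size` and the sample program are invented for illustration; nothing here comes from the paper.

```python
import gzip


def gzipped_size(source: str) -> int:
    """Bytes in the gzip-compressed source, a crude proxy for its content."""
    return len(gzip.compress(source.encode("utf-8")))


# Hypothetical stand-in for a program; any non-trivial source text would do.
sample = "\n".join(f"print({i} * {i + 1})" for i in range(50))

# The "two exact copies of the same source" scheme.
doubled = sample + "\n" + sample

print("raw chars:    ", len(sample), "->", len(doubled))
print("gzipped bytes:", gzipped_size(sample), "->", gzipped_size(doubled))
```

One possible reading of the suggestion is to use `gzipped_size` where the table above uses LOC, so that padding a language with redundancy does not change its score much.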

Yeah, look at the JavaScript LOC! Who wrote the Rosetta Code examples for those, Brendan Eich?!

This hints at another way to optimize for this metric: make the language as expressive as possible. Fewer characters should translate into fewer typos. Paul Graham strikes again! (http://www.paulgraham.com/power.html)

As to your point about redundancy, I think the researchers are in agreement with you on that one if you consider unit tests to be a sort of redundancy, expressing the same concept in two different ways. They bring this up repeatedly in their report.

Obligatory Perl jab: It surprised me that any of the Perl solutions used more than one line. :-P

  > Thinking of ways to get a perfect language according to 
  > this metric: the way to get there is to introduce lots of 
  > redundancies in the grammar. For example, if one requires 
  > two exact copies of the same source before code compiles
C Header-files T__T