Hacker News new | ask | show | jobs
by jpittis 2560 days ago
> I think the smaller differences are also large enough to rule out extraordinary claims, like the ones I’ve read that say writing a compiler in Haskell takes less than half the code of C++ by virtue of the language

Specifically the "by virtue of the language" part:

Seems to me like it's unreasonable to claim the languages are on equal footing because fancy parser libraries aren't allowed to be used for the project. The fancy parser libraries exist for certain languages specifically because the languages enable them to be written. (For example in Haskell: monadic libaries, libraries that take advantage of GADTs, etc.)

4 comments

I don't think monadic parser libraries have a real claim to be that difference. All the languages listed have excellent parsing libraries that make things similarly easy, if not by language power than by grammar DSL with embeddable code snippets.

I think if any library could make a real difference for Haskell it's most likely to be http://hackage.haskell.org/package/lens, which a Haskeller friend of mine claims could likely make a lot of the AST traversal and rewriting much terser.

While I found your article informative and interesting I think it only works in the very specific context of this assignment. Disallowing powerful language features/libraries means it's not a level playing field and thus not a fair comparison. Some languages standard libraries are tiny some are huge. Some languages have lots of advanced features. Eg. GP mentioned GADTs with which one can write type safe/correct by construction ASTs. In other words programs passing specific tests in a specific context does not imply they are comparable in terms of general correctness/robustness/maintainability (as you noted this re caught edge cases).
Hoopl (data flow analysis) would also make a difference. I did a very similar project at my university in Haskell and Hoopl definitely saved us from writing quite a bit of code. We also used parser combinators in the frontend, which I think saved us time too.
I've found PEGs (Parsing Expression Grammars) to make things extremely easy and terse. E.g. OMeta, Parsley, etc.

My experience with using both PEGs and parser combinators is that there isn't a huge difference in the total number of lines of code. On the other hand though, the syntax of PEGs would be easier to understand for someone who is familiar with BNF style notation.

Recoding a viable subset of lens would have taken 50 locs in haskell. Likewise, rewriting parser combinators would not have taken long for experienced devs. The problem here is that requiring people to recode the libs on top of the compiler is disingenuous. And if you ban idiomatic libs, you also ban most online help, tutorials, etc.
(A suitable subset of) Parsec is about 100 lines of OCaml. Implementing a PEG syntax on top of it is about 150 lines of Haskell (or less, I'm a Haskell noob).

Building up the knowledge to get to this point however… nope, those students were better off going hand written recursive descent (or Lex/Yacc, since an equivalent was allowed).

https://github.com/LoupVaillant/Monokex/blob/master/src/pars...

http://loup-vaillant.fr/projects/metacompilers/nometa-haskel...

My understanding is that in production compilers, hand rolled parsers are the norm. Parsing libraries are cool, but just aren’t used for big projects.
Both OCaml and GHC use parser generators. It is incorrect to suggest production compilers hand roll parsers.
Two counterexamples does not disprove “a norm”. There are always exceptions!
Right, to prove or disprove the norm one would need much more information. Two counterexamples does, however, disprove that "Parsing libraries ... just aren’t used for big projects." GHC and the OCaml compiler are both big projects, and they use parsing libraries.
I think big here means impactful.
Describing two well-engineered compilers of two relatively used languages as not impactful is quite a statement. In particular, given the good performance results they achieve, for languages that are quite far away from the normal execution model of the machine they produce code for.
Another point is that Haskell and OCaml popularized features and styles that are making their way into mainstream languages (e.g. option types instead of null, immutability by default), and the compilers showed that they can be implemented efficiently.
Both GHC and ocaml have a pretty good claim at being among the most influential compilers of their generation.

If you look at Java for instance, generics were made by one of the Haskell creators, and it has been implementing functional features for years now.

Excluding the lens library (as per the article) is unusual, it provides natural getter/setter and row polymorphism type functionality.

More anecdotally, I’d argue parsing libraries are common, just look at the prevalence of attoparsec and others. But most parsing libraries in the ecosystem are parser combinator libraries which don’t support as performance and nice error messages that compilers need

That was where I stopped reading. If a library like lens—used by nearly every haskeller in every project—was disallowed, I don’t know what the purpose of this exercise was.
Lens is far from being used by everyone in the space. On a sample of 5-6 professional users I talked with at this zurihac, most didn't use it.
I mean, it seems like it was an university course, and writing a compiler from scratch is probably a decent exercise in a compiler course.
Restrict the students from using a parser library. I get that. But allowing nothing except that standard library? That’s stupid.

It also makes the language comparison useless. Python has a standard library that is continuously improved and people reach to that when writing programs. Haskell, like C, ossified it’s standard library when it was created and people use the external packages for equivalent up to date libraries.

Parsers are not the interesting parts of compilers.
I dunno, libraries like earley in Haskell, provide pretty nice error messages.
It depends entirely on whether the big project still has an elegant and complete formal grammar. Hand-rolled parsers are only common in industrial languages because many have grown to be far too complex and ad-hoc, requiring e.g. additional analysis and disambiguation during parsing. It is not a situation to aspire to.
A typical translation of C++ code into D reduces the line count by a substantial amount, simply because D doesn't require .h files.
What are relative compile times like?

Building Chromium atm, and to be honest I'd be happy if it were written in a trillion lines of BASIC if that would somehow achieve even a 10x build time speedup.

D is known for having an extremely fast compiler. In fact, Walter Bright wrote one of the fastest C++ compilers (the Digital Mars C++ compiler) before he wrote D.
For chromium, the most important concern will be runtime speed, more than compile time
Languages with native support for modules can have both.

Lets see how C++20 will improve the situation.

That, and most people who are compiling it are chromium developers, who will be using dirty builds and so have relatively quick compiles.
Think Delphi and .NET Native compile times.
I assume you are using ccache?
Using a fancy parser-library would mean we should also count the lines of code in it. It would basically mean adapting an existing solution. In practice that would make a lot of sense, but if the purpose is to compare the productivity of different languages then not so much.
Isn't it a facet of productivity of the language that it's easier to write certain types of libraries in one language than another? If you're writing a compiler, the fact that lots of people write parser libraries in Haskell is a point in favor of Haskell, whether if you intend to use those libraries (because they're available and production-tested) or you intend to write your own (because it's demonstrably a productive language for that sort of work).
Yes from the outset. But this study was about writing a parser and it seems Haskell didn't do much better than the others.