Hacker News
Language War - Scala versus Python (blog.zlemma.com)
29 points by zubinmehta 4863 days ago
11 comments

I think you shouldn't have let people talk you out of using Haskell. It's a very nice language and has some advantages for this sort of project (at least based on your cursory description).

However, its advantages are really not the important part. Rather, I just wish you wouldn't dismiss it immediately as a crazy choice. It's no more crazy than any other less-popular language. It has a reputation for being impractical, but this reputation is rather unfair, especially in light of recent very practical developments like simpler concurrency, an improved IO manager, very good web frameworks and a fair amount of strong libraries both for very specific domains and general productivity.

The main disadvantage is that many people find it hard to learn. However, this is a function (heh) of functional programming rather than the language itself. It's actually a simpler language than Scala in many ways because it tries to do one thing well--for example, it has no sub-typing, so you do not have to ever worry about covariance and contravariance.

Now, my point here is not that you should always use Haskell, just that you should seriously consider it. Too many people dismiss it out of hand almost as a joke when it is anything but.

Thanks for your support. Like I mentioned, I will circle back to Haskell as the business develops. If I was the only programmer in my startup, I would likely go with Haskell for important components of our software. But I have concerns about hiring a large team that would thrive in Haskell. One never knows - I'm keeping an open mind :)
Hmm, I'm not convinced that hiring a team of good developers for Scala is all that much harder than hiring good Haskell programmers. Haskell has a disproportionate number of very capable people interested in it, and from what I've heard acts as both a way to attract top developers and as something of a filter (at least in encouraging applicants to be more self-selected).

I think these effects cancel out Haskell's relative lack of popularity unless you expect to be hiring at the enterprise Java level--something like literally thousands of developers.

Anyhow, keeping an open mind is the most important thing, and you clearly have no problems there. I just wish others would follow suit in that regard.

"The core of ZLemma.com is a mathematical framework involving highly structured data representations and numerical algorithms."

You might also consider a Haskell core, surrounded by a Python website. (Or any other website language/framework you like.) It strikes me as likely you have some sort of relatively small interface you can define between those two things, and then you can use the strengths of everything.
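
To make the "small interface" idea concrete, here is a minimal Python-side sketch. Everything in it is an assumption for illustration: the `ScoringCore` interface, the `score` operation, the JSON-over-stdin/stdout protocol and the `FakeCore` stand-in are all hypothetical, not anything ZLemma has described.

```python
import json
import subprocess
from typing import Protocol, Sequence


class ScoringCore(Protocol):
    """Hypothetical interface: the only thing the website needs from the core."""

    def score(self, features: Sequence[float]) -> float: ...


class SubprocessCore:
    """Calls an external core (for example a compiled Haskell binary).

    Assumed protocol: one JSON request on stdin, one JSON reply on stdout.
    """

    def __init__(self, command: Sequence[str]) -> None:
        self.command = list(command)

    def score(self, features: Sequence[float]) -> float:
        request = json.dumps({"features": list(features)})
        result = subprocess.run(
            self.command, input=request, capture_output=True, text=True, check=True
        )
        return float(json.loads(result.stdout)["score"])


class FakeCore:
    """In-process stand-in so the website can be developed and tested alone."""

    def score(self, features: Sequence[float]) -> float:
        return sum(features) / len(features)


def render_score(core: ScoringCore, features: Sequence[float]) -> str:
    """Website-side code depends only on the interface, not on the language behind it."""
    return f"score: {core.score(features):.2f}"
```

The website code never learns which implementation it is talking to, so the core can be rewritten in another language without touching it.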

I am with you on this. More on this in Part 2 of the post.
Haskell's syntax is weird compared to more popular languages. This is the main reason it's hard.

That said, it's a better choice than Scala for math. However, if the math is numerical algorithms, Python is better than either Haskell or Scala.

I'm not sure why they need to settle on one language. Seems like painting the bike shed. For better or worse, every sufficiently large web system has a number of languages being used behind the scenes.

Syntax is by far the most superficial difference. If you're going to actually use functional programming, the trick will be learning the paradigm and not worrying about syntax. In fact, from that perspective, I think Haskell syntax actually wins out: it's extremely simple and very well-suited to functional programming. I found things like curried functions and recursion much easier to grasp and use in Haskell than in Scheme when I was learning both simultaneously.

I certainly do not think that syntax is anywhere near why some people find Haskell hard to pick up. Thanks to pattern matching, the syntax is very visual and thanks to having relatively few forms and keywords, it's much simpler than most imperative languages'.

Also, I'm not suggesting necessarily using Haskell exclusively, by any means; I just want more people to consider it at all!

Well, I'm still not so sure. I've been doing functional programming for 20 years and I thought Haskell syntax was weird.
I find Haskell's syntax very attractive. It is the same style in which one would express domains and operations on a whiteboard.
Novus Partners also does Scala. We've even got a few Python guys coding in Scala. It's just a language with the JVM ecosystem to back it up. Python has its own ecosystem and for numerics, access to LAPACK and UBLAS is pretty awesome, I must admit. That said, both can exist fairly comfortably in a company and can be used for different purposes.

I like Python for prototyping, algorithm validation, and just to hack on. I like Scala for damned near everything else. (Also, Adam is going to be open sourcing what is essentially a Scala clone of Pandas but statically typed and with comparable performance.)

The article's basis of the comparison for building an entire business seems to be very shallow.

Surely other considerations are more important, such as,

- encapsulation/domain partitioning

- messaging

- libraries and library maturity

- native code interoperability (likely critical for this application)

- concurrency

- performance

- JVM platform

- etc.

Maybe, as a result of this type of analysis, a Python-like language is more suitable in some domains and a JVM language in others.

In my opinion, for the requirements quoted, with a deep and performant mathematical framework involved, I cannot envisage how Scala/JVM could win any "war" for the core of the business.

This kind of comparison reminds me of a similar one about Git vs Mercurial: http://importantshock.wordpress.com/2008/08/07/git-vs-mercur...

It looks more like an emotional kind of story than one about practical decisions.

Indeed! At the early stages of a startup, most decisions (including technical ones) have a strong emotional/instinctive flavor. When I worked at large corporations, the decisions were almost always devoid of emotions.
A few more considerations will be posted in Part 2 of the post.

I appreciate your pointers - as the business grows and takes shape, a fresh evaluation will be required where some of the considerations you mention will be taken into account.

"And who the hell does Scala? (Actually, Twitter does!)"

Foursquare also uses Scala, as does LinkedIn. Here's a page of organizations using it:

http://www.scala-lang.org/node/1658

This was done a few years ago. I'd love to hear what people have to say about Scala now that code bases have grown. Anyone working with a couple hundred thousand lines of Scala?

For instance, I know compilation performance was always frustrating for developers. What's it like for teams of people dealing with a large code base?

We are working with Scala at that scale at Foursquare, and compilation times are definitely a headache. What helps is ensuring your dependencies are acyclic so you're able to do smaller incremental compiles when you make changes. To get us on the path of a DAG-ified codebase, we've been using a build tool developed by Twitter called "pants":

https://github.com/twitter/commons

which has some similarities to Google's Blaze build tool.
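
To illustrate why acyclic dependencies matter for incremental compiles, here is a small sketch using Python's standard `graphlib`. It is not how pants works internally; the module names and the `build_order` / `needs_rebuild` helpers are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return modules so that each one comes after everything it depends on.

    `deps` maps a module name to the set of modules it depends on.
    Raises CycleError if the dependency graph is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


def needs_rebuild(deps, changed):
    """Return `changed` plus every module that depends on it, directly or not."""
    # Invert the graph: for each module, who depends on it?
    dependents = {}
    for module, requirements in deps.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(module)

    dirty, stack = {changed}, [changed]
    while stack:
        for dependent in dependents.get(stack.pop(), ()):
            if dependent not in dirty:
                dirty.add(dependent)
                stack.append(dependent)
    return dirty


# Hypothetical module layout.
DEPS = {"web": {"api"}, "api": {"model"}, "model": set(), "util": set()}
```

With this layout, touching `util` dirties only `util`, while touching `model` dirties `model`, `api` and `web`. Once two modules depend on each other, a change to either always dirties both, which is the kind of cycle that keeps incremental compiles large.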

Good to see Scala's popularity. I mentioned only Twitter because I admire them :)

-Ashwin

No discussion of run-time characteristics? I realize for some start-ups this is not the most important metric, but these guys sound like they might be compute bound. And a faster language can mean cheaper/less hardware.

Scala vs Python: http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

Also, since he mentioned Haskell at first, Haskell vs Scala is also interesting: http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

Haskell vs Python is just for lolz: http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

Seriously, the computer language shootout is just harmful. Those benchmarks are bad (but it's unclear whether cross-language comparisons can get much better), and the implementations are worse. And on top of that it does not include optimized VMs like PyPy or LuaJIT (in fact, it does not include PyPy because we complained at some point).
All benchmarks come with the implicit disclaimer:

The best benchmark is always your application. All benchmarks are flawed; use your judgement to determine how flawed a benchmark is. Any flaws are relative to how similar your application is to what the benchmark tests. An imperfect tool is not a useless tool, so long as you are smart about how you use it.

This is probably relevant too: http://benchmarksgame.alioth.debian.org/dont-jump-to-conclus...

Those comparison pages come with an explicit wake-up call:

"These are not the only compilers and interpreters. These are not the only programs that could be written. These are not the only tasks that could be solved. These are just 10 tiny examples."

>> because we complained at some point <<

That's not true, and you already know that's not true.

http://news.ycombinator.com/item?id=4599431

Please stop making this pathetic accusation.

Oh, yes sorry, it was completely coincidental. It was correlated in time though.
>> It was correlated in time though. <<

Post hoc ergo propter hoc is a well known fallacy.

Lest anyone forget that your repeated complaints don't withstand scrutiny -- http://news.ycombinator.com/item?id=4598737

Thanks! Great articles. We are not compute-bound for the near-term, so it's not our most important consideration. As the business evolves, we will embark on a fresh language war where compute performance is likely to be a key factor.
Note that in each case, Python has the smallest code. Generally, but not always, this means that people can address tasks quicker in Python.
For doing anything serious in Python, you will need unit tests. In my experience, they double the amount of code.

Haskell and Scala have static typing instead (and therefore need far fewer unit tests).

Python has really good, battle-tested and actively developed libraries for math, scientific computing and analytics. Most of these are written in C and hence give great performance in addition to the flexibility of Python. Some great ones are: Numpy/Scipy, LAPACK, Pandas (lots more for machine learning as well). Also, I think the REPL is no match for IPython :). For your specific domain (math and analytics) Python could be a great fit. Of course the lack of a static type system and of a compile step can be scary, but for a startup, which needs to move fast, Python gives you the power of quick iteration. Just to round this off, from the Zen of Python:

"practicality beats purity."

+django
I haven't tried Scala, but since no one has mentioned it: Clojure is a pleasure to work with if you want a functional language on the JVM.

I have no idea which and how many startups use Clojure, besides Datomic :) It would be interesting to know.

I'd like to agree here. IMO Clojure is an up-and-coming competitor to Scala. As a Lisp variant for the JVM, I'm a major fan of it.

Unfortunately I can't share any personal experience, as the core of my work is in the Enterprise and the right opportunity hasn't presented itself. But if you're coming from a mathematical background you should check it out.

> Typecheck in Python

People actually use it? It seems abandoned and the link to the homepage is broken: https://pypi.python.org/pypi/typecheck. I'm asking because I really like the idea of starting a project the dynamic typing way and then bolting a (semi-)static type system on top once more programmers join the game, preferably being able to completely disable it on production machines for best performance.
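
For what it's worth, the "bolt on checks later, switch them off in production" idea can be sketched with nothing but the standard library. This is not the API of the `typecheck` package; the `checked` decorator and the `TYPECHECK` environment variable are hypothetical names, and only plain classes in annotations are checked (generics such as `list[int]` are skipped).

```python
import functools
import inspect
import os
import typing

# Hypothetical switch: run with TYPECHECK=0 in production to disable checks.
CHECKS_ENABLED = os.environ.get("TYPECHECK", "1") != "0"


def _is_plain_class(hint):
    """True for annotations like int or str, False for generics and missing hints."""
    return typing.get_origin(hint) is None and isinstance(hint, type)


def checked(func, enabled=None):
    """Check annotated arguments and the return value with isinstance at call time."""
    if enabled is None:
        enabled = CHECKS_ENABLED
    if not enabled:
        return func  # the original function is returned untouched: no overhead

    signature = inspect.signature(func)
    hints = typing.get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if _is_plain_class(expected) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if _is_plain_class(expected) and not isinstance(result, expected):
            raise TypeError(f"return value must be {expected.__name__}")
        return result

    return wrapper


@checked
def repeat(word: str, times: int) -> str:
    """Example of a function opted in to the checks."""
    return word * times
```

These are runtime checks, so they only catch mistakes on code paths that actually run; a static checker run over the same annotations would catch them before execution.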

Have you checked out Cython [1]? It's used quite often with Numpy. Basically, you add some type annotations like int, double, etc. for some speed-up. There is a quick tutorial as well [2].

[1]: http://cython.org/

[2]: http://wiki.cython.org/tutorials/numpy

I know, it's great for the speed-up. But for just making things more manageable it's not worth the added complication of a build/compile step. I know, my problem is actually solved by "properly written tests" :) ...but I still long for some optional drop-in static typing.
> Besides, dynamic typing scares the hell out of me

Funny. I feel exactly the opposite. Static typing is false security.

I program in JS. My coworkers in Java. They have so many bugs that make it to prod because they assume that just because it compiles it must therefore be safe.

Well, it's not like type checking is a binary attribute of a language - for example, Haskell/ML have 'stronger' type systems than Java/C++.

In the former languages, you can encode many more attributes of a program into the type system, and so you can check many more invariants of your program at compile time - e.g. enforcing that a function does no I/O, enforcing that all possibilities are handled when pattern matching, enforcing the equivalent of nullptr checking, etc.

In languages like C++, there are techniques you can apply to allow the type system to help you out a bit (see this talk by Jordan DeLong for some examples [1]), but in general, the guarantees you get from a non-HM type system are weaker than those from an HM one, and so the notion of successfully passing type-checking is correspondingly weaker.

[1]: http://vimeo.com/55674014

Yeah, Java is not a fair example. It has a type system with the worst compromise between being awkward (and infamously verbose) and not very effective: it gets the short end of the stick on both accounts. It's better than C, granted, but that's saying nothing. It does not compare to a good type system like Scala's or especially Haskell's.

Good type systems can catch far more bugs than you imagine. Moreover, they can actually make writing code easier: there are some extremely valuable and expressive features like typeclasses that simply cannot be reproduced in a dynamically typed language.

You are totally missing the point of the parent. Just because types work out doesn't mean the behavior of the program is correct.

The problem is not how to make something compile, regardless of how complex the types are. The problem is how to do the right thing. It is not enough to return a string - you must return the correct string, and that is normally not something a type can solve for you. I suppose you may solve that problem too in Haskell, but then you will get bugs in the type definition instead. TANSTAAFL.

"It is not enough to return a string - you must return the correct string, and that is normally not something a type can solve for you."

It can, however, ensure that you have returned a legally-formatted email address instead of someone's first name, with proper use of the type system. You'd probably be surprised what you can do with a real type system, based on what you're saying here. It is not a proof of correctness, but there's a great deal more possible than in Java.

Of course, it cannot ensure you returned the correct email address, but often that's a much less pressing problem, because the thing returning the address may very well have had only one address in hand to choose from, in which case it can't be wrong. There's a lot of logic like that you can deploy in Haskell, where scopes and inputs to functions are much more carefully controlled than in a traditional language.
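
The email example can be approximated even outside Haskell with a "validate once at construction" wrapper type. A rough Python sketch, with the caveats that the `EmailAddress` class and `send_welcome` function are hypothetical, the regex is deliberately loose, and Python itself does not enforce the annotation on `send_welcome` (an external static checker would have to do that):

```python
import re
from dataclasses import dataclass

# Deliberately loose pattern, for illustration only: something@something.something
_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass(frozen=True)
class EmailAddress:
    """A string that has already passed the format check.

    Construction fails for anything else, so holding an EmailAddress
    means the check has been done.
    """

    value: str

    def __post_init__(self):
        if not _EMAIL_PATTERN.fullmatch(self.value):
            raise ValueError(f"not an email address: {self.value!r}")


def send_welcome(to: EmailAddress) -> str:
    """Accepts only the wrapper type, so a bare first name cannot be passed by accident
    once a static checker is run over the code."""
    return f"To: {to.value}\nWelcome!"
```

As in the comment above, this says nothing about whether it is the right address, only that it cannot be someone's first name.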

Nowhere in his post did tikhonj suggest that compilation == correctness, even in Haskell. Of course it isn't. The only people making that assumption are wyqueshocec's coworkers, and naturally they're getting burned as a result.

The key point is that a sufficiently expressive type system gets you far closer to correctness on the basis of type checking alone than does a weaker static system like Java's or a dynamic one like JS's. You should still write tests, you just won't have to write as many in order to reach the same degree of confidence in your code.

Not all static type systems are created equal. The sort of type checking provided by Java (and similar languages) provides far less safety than that of a full-fledged type-inferring language like Scala or Haskell.

I use Ruby and JavaScript professionally. I do like Ruby for a number of reasons, and JavaScript I can at least live with. But in both languages, I constantly find myself spending tons of time tracking down bugs that would have been discovered right away if I had Haskell's type system at my disposal.

I prefer static typing because those assurances reduce the number of things I have to manage--they mean I must keep less state in my head. I do not assume that it guarantees things that it does not.

Attributing your coworkers' "if it builds, it works" assumptions to static typing strikes me as incorrect. Static typing, at least in Java, ensures only a (relatively) small set of conditions are true; I don't think you can really blame that for increased/additional assumptions on the part of sub-par programmers.

Either way, you end up keeping lots of things in your head. With dynamic langs, you do have to think about types and use them conservatively. With languages like Java -- the same. No toolset is going to work well unless the dev is on top of things and good at using the tool well.

I've found that debugging in a dynamic environment is great for managing types in a well architected system. It falls down in a poorly architected one, in ways that static typing would not have allowed. However, in such systems, "it compiles so it must be correct" programmers just get stuck in deeper snow.

This is true, but I personally find that the tools available in decent statically/strongly-typed languages--C#, Scala--can do a fair amount in encouraging the Right Thing in ways that dynamically typed languages (especially dynamically weakly typed languages) can't.

I'm even becoming more and more fond of C++ because of the constraints that you can work in with proper use of templates; my only beef there is that the error messages when you do the Wrong Thing are often not conducive to understanding what the Right Thing is.

I must admit that JS scares me more than anything else. Would love to hear about defenses you employ in JS.
Are there no tests beyond simple compilation?
Did you guys consider Google Go? I really like the language and its environment after coding in it for a bit.
I think the reason was the math libraries. There aren't mature libraries yet for numerical computing. Python has Numpy/Scipy, but I'm not sure about Scala (I don't use it).
Haskell all the way!
:)