by gregwebs 4672 days ago
This is really interesting. A hypothesis I have on Ruby is that people attribute dynamic typing to it being a productive language, but that Ruby is actually productive for other reasons, in spite of being dynamically typed.

With Crystal, at least when it matures a bit more, this hypothesis could be tested.

There are very logical reasons why dynamic typing at first appears better than static for Rubyists, that I think don't hold up as well after you scratch the surface:

* many Rubyists came from Java, and that kind of typing does slow you down. You need a modern type system with at least local type inference (Crystal seems to have global type inference)

* dynamic typing does actually help develop things more quickly in some cases, definitely in the case of small code bases for newer developers. A developer only has to think about runtime. With static typing a developer also must think about compile-time types, which takes time to master and integrate into their development. The relative payoff of preventing bugs grows exponentially as the complexity of the code base increases and at least linearly with size of the code base.

6 comments

There is a Ruby with static typing---it's called C#. Seriously, even though my background is all in non-MS stacks, I did some C# work a few years ago and found the language surprisingly nice, and with a lot of the features I love from Ruby. It even has type inference like Crystal. (The libraries are not nearly as clean/consistent as Java though.) For folks in the Ruby/Python/Java world, it's something to look at.
As someone whose last 8 years of professional work have been largely split between...C# and Ruby, I have to disagree. I do agree about C# being quite nice; it's all the things Java should be but isn't. And it's evolving quickly in all the right ways.

But it isn't really like Ruby with static typing. The language isn't Rubyish (no mixins or blocks, for example). It makes you put everything in a class, a la Java. It can be verbose and is occasionally downright clunky (though syntactically it's categorically slicker than Java). The .NET ecosystem doesn't have the Ruby characteristic of lots of small, fast-evolving libraries that are easy to use. In fact, the C# open source ecosystem is kinda poor in general and not a huge part of most developers' lives, whereas Ruby's ecosystem is vibrant and an integral part of its coding culture.

Another way to put all that is that if C# were purely dynamically typed, it wouldn't feel anything like Ruby.

I do see what you're saying: LINQ feels like a static (and lazy!) version of Ruby's Enumerable module, the lambdas look similar, C# actually does have optional dynamic typing, and C# is increasingly full of nice developer-friendly features. In general, I'm a fan. But switching between them doesn't feel like just a static/dynamic change.

Interesting - I found the lambda syntax of LINQ to be quite "Ruby like"

  someList.FindAll(i => i < 2)
    .Select(i => i * 2)
    .GroupBy(i => (i % 2) == 0 ? "even" : "odd")
Is about as close to Ruby's Enumerable as I've found in a mainstream/enterprise language (unless you include Scala).
Yeah, I mostly agree with that and sort of said so in my post. Not sure it has much impact on my central point though.

One really key difference with LINQ is that it doesn't produce arrays (or dictionaries, as in your example); it produces Enumerators, which you then have to call ToList() or ToDictionary() on. That laziness is actually an awesome feature and my favorite thing about LINQ, because it can massively improve performance by shortcutting work and not creating intermediate arrays. You can even work on infinite sequences with it. Besides performance, it's just tastier. It's so great I actually wrote a Ruby library to imitate it: https://github.com/icambron/lazer

Is LINQ really fast/performant though? Wouldn't the above expression cause three sequential loops to run?

One of the biggest performance issues I've seen with modern .NET code is people abusing LINQ and lambdas. Chaining functions like this is most decidedly not fast. I once wrote a library that had to do some heavy signal processing on large data sets, and since I wanted to ship the first version as soon as possible, I just used LINQ in a lot of functions to save time. It wasn't very performant, so later I rewrote most of the functions to use standard native code such as loops for iteration, hashmaps for caching and all sorts of improvements like that. I completely got rid of LINQ in that version, and for many functions the runtime went down from something like 500ms-1000ms to the microsecond range.

So sure, LINQ makes development fast and it's very nice to be able to write code such as .Skip(10).Take(50).Where(x => ...). On most web projects, it won't make a huge difference. I've seen Rails "developers" use ActiveRecord in such a way that they would create double and triple nested loops and then hit the database multiple times by using enumerable functions on ActiveRecord objects without realizing how this works, what's going on behind the curtains and so on. I've seen .NET devs do similar things using EntityFramework.

So yeah, it's convenient and all, but it can also be very dangerous when used by someone who doesn't understand the fundamentals behind these principles.

> Wouldn't the above expression cause three sequential loops to run?

No, it wouldn't; that's the really important point about LINQ I was, clumsily, trying to express above [1]. Take this admittedly totally contrived example:

  someList
    .Where(i => i % 2 == 0)
    .Select(i => i + 7)
    .Take(5)
This is not equivalent to a bunch of sequential loops. What it is is a bunch of nested Enumerators. Here's how it works. It gets the list's Enumerator, which is an interface that has a MoveNext() method and a Current property. In this case, MoveNext() just retrieves the next element of the list. Then the Where() call wraps that enumerator with another enumerator [2], but this time its implementation of MoveNext() calls the wrapped MoveNext() until it finds a number divisible by 2, and then sets its Current property to that. That enumerator is wrapped with one whose MoveNext() calls underlying.MoveNext() and sets Current to underlying.Current + 7. Take just stops (its MoveNext() returns false) after 5 successful underlying MoveNext() calls.

So all that returns an enumerable, so as written above, it actually hasn't done any real work yet. It's just wrapped some stuff in some other stuff.

Once we walk the enumerable--either by putting a foreach around it or by calling ToList() on it--we start processing list elements. But they come through one at a time as these MoveNext() calls bring through the list items; think of them as working from the inside out, with each MoveNext() call asking for one item, however that layer of the onion has defined "one item". The item is pulled up through the chain, only "leaving" the original list when it's needed. The entire list is traversed at most once, and in our example, possibly far less: the Take(5) stops calling MoveNext() after it's received 5 values, so we stop processing the list after that happens. If someList were the list of natural numbers, we'd only read the first 10 values from the list.
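
The pull-based chain described above can be sketched with Python generators, which play roughly the role of the nested enumerators here. This only illustrates the idea and is not how LINQ itself is implemented; the helper names `counting`, `where`, `select`, and `take` are made up for this sketch.

```python
from itertools import count


def counting(source, stats):
    """Yield items from source, recording how many were pulled."""
    for item in source:
        stats["reads"] += 1
        yield item


def where(source, predicate):
    """Pull from source until an item passes the predicate, then yield it."""
    for item in source:
        if predicate(item):
            yield item


def select(source, transform):
    """Yield the transformed version of each item pulled from source."""
    for item in source:
        yield transform(item)


def take(source, n):
    """Yield at most n items, then stop pulling from source."""
    if n <= 0:
        return
    for taken, item in enumerate(source, start=1):
        yield item
        if taken >= n:
            return


stats = {"reads": 0}
naturals = counting(count(1), stats)  # infinite source: 1, 2, 3, ...

# Building the chain does no work yet; nothing has been read from the source.
chain = take(select(where(naturals, lambda i: i % 2 == 0), lambda i: i + 7), 5)
reads_before = stats["reads"]

result = list(chain)  # walking the chain is what pulls items through
print(result)                        # [9, 11, 13, 15, 17]
print(reads_before, stats["reads"])  # 0 10
```

Each item travels through every layer before the next one is read, and the infinite source is abandoned after ten reads.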

Now, those nested Enumerator calls aren't completely free, but they're not bad either, and you certainly shouldn't be seeing a one second vs microseconds difference. If you craft the chain correctly, it's functionally equivalent to having all of the right short circuitry in the manual for-loop version, and obviously it's way nicer.

So why are you seeing such poor perf on your LINQ chains? Hard to say without looking at them, but a few pointers are: (1) Never call ToList() or ToDictionary() until the end of your chain. Or anything else that would prematurely "eat" the enumerable. (2) Order the chain so that filters that eliminate the most items go at the start of the chain, similar to how you'd put their equivalent if (...) continue; checks at the beginning of your loop body. (3) Just be cognizant of how LINQ chains actually work.

[1] In the example in the parent, FindAll isn't actually a LINQ method, so there is one extra loop in there. Always use Where() if you're chaining; use FindAll() when you want a simple List -> List transformation.

[2] A detail elided here: each level actually returns an Enumerable and the layer wrapping it does a GetEnumerator() call on that.
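
Pointer (1) in the comment above can be seen in a small Python sketch, with generator expressions standing in for a LINQ chain and `list()` standing in for `ToList()`. The `tracked` helper is hypothetical and exists only to count how many source items get read.

```python
from itertools import islice


def tracked(items, stats):
    """Yield items one by one, counting how many the consumer pulled."""
    for item in items:
        stats["reads"] += 1
        yield item


data = range(1, 100_001)

# Lazy all the way: only as many items are read as the final islice needs.
lazy_stats = {"reads": 0}
evens = (i for i in tracked(data, lazy_stats) if i % 2 == 0)
lazy_result = list(islice(evens, 5))

# Materializing mid-chain (the ToList() analogue) reads the entire source
# and builds a 50,000-element intermediate list before taking five items.
eager_stats = {"reads": 0}
all_evens = list(i for i in tracked(data, eager_stats) if i % 2 == 0)
eager_result = all_evens[:5]

print(lazy_result == eager_result)  # True
print(lazy_stats["reads"])          # 10
print(eager_stats["reads"])         # 100000
```

Both versions return the same five values; the difference is how much of the source had to be consumed to get them.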

I have a very similar background/opinion. Almost all of my professional work has been on the *nix stack and I generally like Ruby. During a brief stint in Microsoft-land I found myself super impressed with C#. First class functions, type inference, anonymous types/functions and LINQ make for a really nice general purpose language. It's really a shame it's mired in the Microsoft toolchain.
Having tried both C# and Go (both having functions as first class citizens), I'll choose Go.

Mono always seems to be behind the M$ release of .NET and I'd rather not bother.

> (The libraries are not nearly as clean/consistent as Java though.)

I've had the complete opposite experience. I've found when it comes to libraries, C# has fewer, but higher quality than Java ones. I've also found C# to have a much more intuitive standard library. In C#, I can often just figure out how a standard library class works purely through the type system and the IDE, while in Java, I'd have to search through documentation more frequently.

Don't forget that once you get into it, VS is an awesome IDE.
Plug - https://github.com/manojlds/cmd

Shows some of the "dynamic" goodness of C#.

C# is a great language, I give you that. But the runtime is a major turn-off.
> A hypothesis I have on Ruby is that people attribute dynamic typing to it being a productive language, but that Ruby is actually productive for other reasons, in spite of being dynamically typed.

That has always been my reaction to most comments about dynamically typed scripting languages, including Python and Ruby. Most of the time, turning compile-time type errors into runtime exceptions is not a feature.

Yes, but the upside is you avoid the complexity headaches of interfaces and covariance/contravariance and generics and all that stuff.

Heck, look at all that FactoryFactoryFactory stuff you have to deal with when you want to swap out core parts of a framework - you end up with config files and XML and you have to make sure the guy who made the original framework designed it to allow you to change the part you want to change with your modular swap... in a dynamic language? Monkey patch. It's ugly, but it works.
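
A minimal Python sketch of the monkey-patching escape hatch mentioned above. The `Framework` class and its `render` method are hypothetical, standing in for a core part of a framework whose author left no extension point.

```python
class Framework:
    """Stand-in for third-party code we cannot edit."""

    def render(self, name):
        return f"<p>Hello, {name}</p>"

    def handle(self, name):
        # Internal call site: no factory, no config hook, no injection point.
        return self.render(name)


def render_plain(self, name):
    """Replacement behaviour, written outside the framework."""
    return f"Hello, {name}"


app = Framework()
before = app.handle("world")

# Monkey patch: rebind the method on the class at runtime.
Framework.render = render_plain
after = app.handle("world")

print(before)  # <p>Hello, world</p>
print(after)   # Hello, world
```

The patch affects every existing and future instance, which is both why it works and why it is ugly.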

Heck, look at serialization. If you want to serialize/deserialize static objects, you need metadata that includes the types of everything - stuff like XSD in XML. Dynamic languages don't need that stuff, which is part of the modern popularity of JSON... JavaScript and its buddies just play nicer with JSON. I actually wish there was a popular simplified analogue to XSD for JSON because I actually miss the ease of serializing into objects that you get using XML/XSD in C# or JSON in JavaScript.
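
The contrast above can be sketched in Python: the dynamic route needs no type metadata, while the typed route needs some schema to check the payload against. Here a dataclass's field annotations act as that schema. The `User` class and the `load_typed` helper are invented for this sketch and only handle flat objects with simple field types.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str
    age: int


def load_typed(cls, text):
    """Deserialize into cls, using its field annotations as the schema."""
    raw = json.loads(text)
    kwargs = {}
    for field in fields(cls):
        value = raw[field.name]
        if not isinstance(value, field.type):
            raise TypeError(f"{field.name}: expected {field.type.__name__}")
        kwargs[field.name] = value
    return cls(**kwargs)


payload = '{"name": "Ada", "age": 36}'

dynamic = json.loads(payload)      # no metadata needed, just a dict
typed = load_typed(User, payload)  # checked against User's annotations

print(dynamic["age"] + 1)  # 37
print(typed)               # User(name='Ada', age=36)

try:
    load_typed(User, '{"name": "Ada", "age": "36"}')
except TypeError as exc:
    print("rejected:", exc)  # rejected: age: expected int
```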

The dynamic-ish nature of exceptions, which seems like an unholy, abominable hole in the type system in static languages (or a source of unending agony in Java's checked exceptions), suddenly fits nicely into a dynamically typed language paradigm. Python embraces an "easier to ask forgiveness than permission" approach, throwing exceptions willy-nilly, and it makes for nice clean code.
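
For anyone who hasn't seen the "easier to ask forgiveness than permission" style, here is a small Python sketch contrasting it with checking up front. The function names and the `port` config key are made up for illustration.

```python
def port_lbyl(config):
    """Look before you leap: check every precondition up front."""
    if (
        "port" in config
        and isinstance(config["port"], str)
        and config["port"].isdigit()
    ):
        return int(config["port"])
    return 8080


def port_eafp(config):
    """Easier to ask forgiveness: just try it and handle the failure."""
    try:
        return int(config["port"])
    except (KeyError, ValueError, TypeError):
        return 8080


for config in ({"port": "9000"}, {}, {"port": "abc"}, {"port": None}):
    print(port_lbyl(config), port_eafp(config))
# 9000 9000
# 8080 8080
# 8080 8080
# 8080 8080
```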

Plus, working in a dynamically typed language heavily discourages premature optimization because you already threw performance out the window.

But yeah, you're basically working without a net, and that kinda sucks.

What you describe is not a problem of statically typed languages in general, but of Java in particular.

Try Haskell for static typing done right.

Or any other language in the ML family.
> Yes, but the upside is you avoid the complexity headaches of interfaces and covariance/contravariance and generics and all that stuff.

Maybe it's a matter of what you are accustomed to but not having explicit declared types gives me headaches. I find code without types horribly unreadable - because I can't see at the first glance what data a function processes, etc.

Also dynamic typing and its runtime type checking gives me that uncanny feeling of "something might be wrong but I won't find out till I hit it".
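
A small Python sketch of that "won't find out till I hit it" feeling. The `describe` function and the order fields are hypothetical; the point is only that the type bug sits on a branch that ordinary inputs never reach.

```python
def describe(order):
    """Return a short summary; the discount branch hides a type bug."""
    if order["discount"] > 0:
        # Bug: concatenating str and float raises TypeError, but only
        # when an order with a discount actually reaches this line.
        return "Total after discount: " + order["total"] * (1 - order["discount"])
    return "Total: " + str(order["total"])


print(describe({"total": 20.0, "discount": 0}))  # Total: 20.0

try:
    describe({"total": 20.0, "discount": 0.25})
except TypeError as exc:
    print("found out at runtime:", exc)
```

A static checker would typically flag the `str + float` expression before the code ever ran; here it surfaces only when a discounted order shows up.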

It's absolutely a feature. It allows testing small parts of a system during a major code change when the rest of the system is broken.
You can easily do that in a statically typed language as well; you just have to do it explicitly. For instance, Haskell has the error function, of type "String -> a"; you can use it absolutely anywhere, no matter the expected type, give it an error message, and if the value it produces is ever actually used, it throws an exception with the specified message.
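
A rough Python analogue of the explicit-placeholder approach, assuming the code is checked with a static checker such as mypy: a function annotated `NoReturn` is accepted wherever a value of any type is expected. Unlike Haskell's lazily evaluated `error`, this raises as soon as the placeholder is called, not when its result is used. The names `todo`, `parse_price`, and `apply_tax` are invented for this sketch.

```python
from typing import NoReturn


def todo(message: str) -> NoReturn:
    """Placeholder usable wherever a value is expected; raises when reached."""
    raise NotImplementedError(message)


def parse_price(text: str) -> float:
    # Finished part of the system.
    return float(text.strip().lstrip("$"))


def apply_tax(price: float, region: str) -> float:
    # Unfinished part: annotated as returning float, body is a placeholder.
    return todo(f"tax rules for {region} not written yet")


# The finished part can be exercised while the rest is still a stub.
print(parse_price(" $19.99 "))  # 19.99

try:
    apply_tax(19.99, "EU")
except NotImplementedError as exc:
    print("stub reached:", exc)  # stub reached: tax rules for EU not written yet
```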
Also the trace module is pretty useful for the kind of dynamic-debugging practice people often do.
Don't discount the value of doing it implicitly. When you're just throwing stuff together (and there is a place for that), it's a distraction to work around parts of the system that won't work yet, both the typing and the mental overhead.
Then provide compile-time warnings---something still not done. Haskell's GHC now does -fdefer-type-errors
As well as applying patches to a running system without restarting it
Having done this both with Clojure and Erlang, I find the unversioned, dynamic mode of achieving this is fraught with trouble. It's really easy to end up in complex atypical states during upgrade which can at best be hard to debug and worst corrupt other, longer term state.
Exactly. I think what a lot of people miss is that Erlang makes live upgrades possible but certainly not easy. Trying to upgrade a complex application one component at a time (the way Erlang/OTP promotes) is still insanely hard and not worth the pain unless you're working in a domain where you have absolutely no choice (which is of course why Ericsson developed Erlang in the first place). For the vast majority of applications the right choice is to restart the server and accept a few seconds of downtime.
I think with better documentation I'd actually call Erlang's model "easy" in that it's really easy to do things correctly with an absolute minimum chance of corruption/failure/non-repeatability. OTP gives you the tooling (releases, versions thereof, and simultaneous multiple module tenancy being key) to actually do a good job. That's a rare thing.
There is essentially no reason why this should ever be needed.
Unless you're an Erlang or maybe Smalltalk programmer
I started out writing disagreements to your points, somehow having misread them as being arguments you support for Rubyists preferring dynamic typing, but then during editing I re-read that you think it's productive in spite of dynamic types. I agree totally.

I don't think Java-style typing is that much of a hindrance. It's irritating boilerplate, but people using those languages can slam it out very quickly.

I don't think reasoning about runtime types is any easier than reasoning about compile-time types. It's in fact a higher cognitive load, because you cannot ignore it and rely on a type-checking phase that covers all your paths without explicit test cases.

I personally found Ruby to be productive[1] due to the expressive metaprogramming, how easy it is to make DSLs, blocks and yield for CPS, generators, and co-routines, and how everything is re-definable. I don't know how much dynamic typing factors into that, but I think if you could get the same things with equally expressive syntax, Rubyists would still like it.

[1] It used to be my favorite. I still like it (and love it for scripting), but prefer GADTs and pattern-matching on type constructors now.

Regarding Java, it is not about the language per se, but over-engineered pattern spaghetti frameworks.
> With Crystal, at least when it matures a bit more, this hypothesis could be tested.

You could test that now with http://rubyluwak.com/

RubyLuwak is statically typed with local type inference.

User-defined static types are a theory of a solution. But mostly we don't know what the hell we are doing, so we don't have a correct theory. And this is right, because instead of perfecting our theory, we should be adding the next new feature or new product.

Exceptions include frequently reused code (libraries, components, frameworks) and well-specified problems (rewriting a known problem, implementing an algorithm, shuttle-like high-risk projects). Here, static types are also useful as documentation.

As Brooks said: plan to throw one away; you will, anyhow. i.e. code to understand, then to solve. You don't understand it well enough to have a correct theory the first time, and it's less feasible to rewrite a project from scratch the larger it is.