Would be nice to redo this experiment with modern languages like Rust and Go, as well as modern "flavors" of Haskell and C++. Maybe throw OCaml in as well.
It would also be interesting (but impossible) to do this with more complex problems. I work on more than 10 million lines of C++ (large, but there are many larger C++ codebases out there), with much of the code going back 15 years (comparatively young) and several hundred developers. Even if Haskell could do this in 1 million lines of code (seems unlikely, but who knows), that is still a lot of code. Does it have the abstractions needed to handle this, or does something fail and Haskell become unmaintainable for some reason?
Which is to say this is interesting, but it is a microbenchmark and so of questionable relevance to the real world.
I work on one of the largest Haskell codebases in the world that I know of (https://mercury.com/). We're in the ballpark of 1.5 million lines of proprietary code built and deployed as effectively a single executable, and of course if you included open source libraries and stuff that we have built or depend on, it would be larger.
I can't really speak to your problem domain, but I feel like we do a lot with what we have. Most of our pain comes from compile times / linking taking longer than we'd prefer, but we invest a lot of energy and money improving that in a way that benefits the whole Haskell ecosystem.
Not sure what abstractions you are wondering about, though.
What I'm wondering about is how maintainable programs of that size are over time. That you got to over a million lines says it is possible. How difficult is it, though? "Abstractions" is just shorthand for whatever is needed to break your problems up between everyone without conflicts. How easy or hard is this?
For example, I like Python for small programs, but I found that at around 10-50k LOC Python is no longer workable: you make a change without realizing the function is used elsewhere, and because that code path isn't covered by tests, you don't find out about the breakage until you ship.
It’s highly scalable. Part of the reason compile times are a bit long is that the compiler is doing whole program analysis.
Most of the control flow in a Haskell program is encoded in the types. A “sum type” is a type that represents choices and they introduce new branches to your logic. The compiler can be configured to squawk at you if you miss any branches in your code (as long as you’re disciplined to be wary about catch-all pattern matches). This means that even at millions of lines you can get away with refactorings that change thousands of lines across many modules and be confident you haven’t missed anything.
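To make the exhaustiveness point concrete, here is a minimal sketch. The `PaymentStatus` type and both functions are invented for illustration and are not taken from any codebase mentioned in this thread. It assumes GHC with `-Wincomplete-patterns` enabled (a team would typically also promote the warning to an error in CI).

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical sum type: each constructor is one possible branch of the logic.
data PaymentStatus
  = Pending
  | Settled
  | Failed String  -- carries a failure reason
  deriving (Show, Eq)

-- Every constructor is matched explicitly, with no catch-all case.
-- If a new constructor (say, Refunded) is later added to PaymentStatus,
-- GHC warns that this function's patterns are no longer exhaustive.
describe :: PaymentStatus -> String
describe Pending         = "awaiting settlement"
describe Settled         = "done"
describe (Failed reason) = "failed: " ++ reason

-- The pattern to be wary of: the wildcard keeps the match exhaustive,
-- so a newly added constructor would silently be treated as non-terminal
-- and the compiler would have nothing to report.
isTerminal :: PaymentStatus -> Bool
isTerminal Settled    = True
isTerminal (Failed _) = True
isTerminal _          = False
```

In GHCi, `describe (Failed "timeout")` evaluates to `"failed: timeout"`. The refactoring benefit comes from `describe`-style functions: adding a constructor turns every explicit match across the codebase into a compiler message, while `isTerminal`-style functions stay quiet and have to be found by hand.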
You can do these things in C++ code bases as well, but I find the analysis tooling there has to build models of the code, whereas in Haskell the types are much more direct. You get feedback faster.
We have a pretty limited set of abstractions that are used throughout. We mostly serve web requests, talk to a PostgreSQL database, communicate with 3rd-party systems with HTTP, and we're starting to use Temporal.io for queued-job type stuff over a homegrown queueing system that we used in the past.
One of the things you'll often hear as a critique levelled against Haskell developers is that we tend to overcomplicate things, but as an organization we skew very heavily towards favoring simple Haskell, at least at the interface level that other developers need to use to interact with a system.
So yeah, basically: Web Request -> Handler -> Do some DB queries -> Fire off some async work.
We also have risk analysis, cron jobs, batch processing systems that use the same DB and so forth.
We're starting to feel a little more pain around maybe not having enough abstraction though. Right now pretty much any developer can write SQL queries against any tables in the system, so it makes it harder for other teams to evolve the schema sometimes.
For SQL, we use a library called esqueleto, which lets us write SQL in a typesafe way, and we can export fragments of SQL for other developers to join across tables in a way that's reusable:
select $
  from $ \(p1 `InnerJoin` f `InnerJoin` p2) -> do
    on (p2 ^. PersonId ==. f ^. FollowFollowed)
    on (p1 ^. PersonId ==. f ^. FollowFollower)
    return (p1, f, p2)
which generates this SQL:
SELECT P1.*, Follow.*, P2.*
FROM Person AS P1
INNER JOIN Follow ON P1.id = Follow.follower
INNER JOIN Person AS P2 ON P2.id = Follow.followed
^ It's totally possible to make subqueries, join predicates, etc. reusable with esqueleto so that other teams get at data in a blessed way, but the struggle is mostly just that the other developers don't always know where to look for the utility so they end up reinventing it.
In the end, I guess I'd assert that discoverability is the trickier component for developers currently.
I worked at SimSpace, where we had a million lines of Haskell written in house. It was wonderful! It was broken up into 150-175 packages with a surprisingly shallow dependency tree, which kept compile times decent.
It helped that our large application was a bunch of smaller pieces that coordinated through PostgreSQL.
We had three architects who spent their time finding near future problems and making sure they didn't happen.
I've had Haskell jobs with smaller and worse codebases. I think bad code can be created in any language.
I agree, but good code bases do need language support. Some languages cannot easily scale to large code sizes. Dynamic types, self-modifying code, and other such things that Haskell doesn't have are what come to mind as reasons I've given up on some languages for large code bases, but there may be other things I don't know of that make languages not work at large sizes.
Also JVM languages, Java, Kotlin, Clojure, now that they have added functional features. Plus functional dynamic languages like Scheme/Racket and Elixir.