Hacker News
by twotwotwo 2076 days ago
So I cloned the mysql-server and postgres repos and ran sloccount. It's not the deepest dive or anything but was interesting.

I saw MySQL had...600k lines of JavaScript?

It turned out that the storage/ndb directory had a Web-based management interface for NDB, which vendors in the Dojo JavaScript framework. It also had ~50k lines of Java for the "ClusterJ" framework, which interfaces with NDB skipping the SQL layer. Overall, sloccount reports about 1.4M lines of code in storage/ndb/.

NDB is a specialized cluster database where all secondary indexes have to fit in the cluster's RAM(!). Work on it started at Ericsson, then it was spun out into a startup which MySQL AB eventually bought. I imagine MySQL management at the time hoped the future of DB clusters might end up looking more like NDB and less like...gestures at today's database landscape.

There are also ~500K lines of Unicode tables in the strings/ directory. I recall Postgres calls out to libc for locale/collation related stuff so probably doesn't need those tables in-tree.

Even accounting for those chunks you still end up with ~1M vs ~2M SLOC as measured by sloccount. (I don't want to pretend the numbers are super precise.) There are probably other differences in what's in scope for the repo or other surprises. [Edit: see johannes1234321's comment which lists some of them.]
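
For anyone who wants to repeat the "subtract the big chunks" step, here is a minimal Python sketch. It is not sloccount and will not reproduce its numbers: it only counts non-blank lines, does not strip comments, and the extension-to-language table is a small illustrative subset I made up. The function name `count_lines` and the clone path in the usage note are hypothetical too.

```python
import os
from collections import Counter

# Illustrative subset of extensions; sloccount's real language detection is richer.
EXT_LANG = {
    ".c": "C", ".h": "C",
    ".cc": "C++", ".cpp": "C++",
    ".js": "JavaScript",
    ".java": "Java",
}

def count_lines(root, exclude=()):
    """Count non-blank lines per language under root, skipping excluded subdirs."""
    excluded = {os.path.normpath(os.path.join(root, e)) for e in exclude}
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in filenames:
            lang = EXT_LANG.get(os.path.splitext(name)[1].lower())
            if lang is None:
                continue  # unknown extension: ignore the file
            with open(os.path.join(dirpath, name), errors="replace") as f:
                totals[lang] += sum(1 for line in f if line.strip())
    return totals
```

Assuming the repo was cloned into `mysql-server`, something like `count_lines("mysql-server", exclude=["storage/ndb", "strings"])` gives a rough per-language tally with those two directories left out, which can be compared against the unfiltered `count_lines("mysql-server")`.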

Besides those, though, there might be truth to others' comments about MySQL spending lots of code supporting drastically different old and new "worlds" in a single binary (non-transactional and transactional storage, originally very-nonstandard vs. currently more-standard SQL, statement-based and row-based replication...). And at a totally non-technical level, as a product MySQL seems to have had more money thrown at it and that tends to mean more code.

This was fun but was an incredibly quick and dirty dive into it, and I'd love to hear more from folks who can look more or just know more.

3 comments

NDB is a specialized cluster database where all secondary indexes have to fit in the cluster's RAM(!).

That’s not that wild of a design principle: It’s been longstanding best practice to scale OLTP databases (and limit indexing) so as to keep secondary indexes buffered in memory.

To explain that statement: Historically NDB worked in a way that all data had to be in memory and it wouldn't touch the disk at all. For a while now data can be on disk, but all indexes afaik still have to be fully loaded into memory at startup.

The use cases are systems where you need "five nines" of uptime and fast responses. Coming out of Ericsson, the classic area where it is used is telco (for instance "home location registers", the databases recording which cell a mobile phone is currently in, often use NDB), but there are other uses on the web (e.g. session stores), "real time" information exchange (betting, gaming, ...) and so on.

It is not as easy to administer as a "normal" MySQL, but when deployed carefully it is powerful, fast and scalable (both locally and at the geo level).

PostgreSQL can use ICU, and on some platforms it's the only way to get UTF-8 collations.

I was watching Sherlock Holmes tonight and for some reason I read that in Benedict Cumberbatch's voice, most entertaining, thank you!