Hacker News
by 8192kjshad09- 1331 days ago
Really interesting story. Approximately how many lines of code were in that codebase?

It's hard for me to imagine someone grokking a 10M+ line codebase without external help, but I've never tried it. I do agree with the assertion that most codebases are not as _special_ as they like to think.


This was just over 600k lines of mostly C++ code. It’s certainly true that it helped I was familiar with the domain and the various technologies they had used, like CORBA and XML; this was the late 90s.

10M is a pretty massive codebase; the entire Linux kernel with all its drivers is somewhere around that size. Most corporate systems aren’t that big, and even for Linux you wouldn’t need to understand all the drivers to understand the core kernel. I suspect the core kernel is maybe 1M at most.

For reference, Facebook's Android Messenger app is about 10M lines of code:

https://engineering.fb.com/2022/10/24/android/android-java-k...

That is interesting. I wonder how the 15 people who created the ~600K-line C++ codebase I’m talking about compare to the FB headcount on Android Messenger. Does anyone know how big that team is? LOC per head is a curious measure.

Regardless, it’s a bit concerning that it takes 10M LOC for a messaging app.

Do lines of code have any meaningful use as a statistic when I can simply include a bunch of libraries and headers to inflate it?
Not forgetting, of course, that Twitter almost certainly uses several languages on the backend, and that its code is entwined with its infrastructure. As TFA says:

"One former Tesla engineer, who spoke on the condition of anonymity to candidly describe the matter but was not involved, said Tesla engineers would have trouble capably assessing Twitter’s code. Distributed systems, the large-scale and spread-out network that Twitter is composed of, are not the automaker’s specialty, the person said."

9M of that is probably localization files, 500k licenses

:P

A lot of the Android code at Facebook is auto-generated boilerplate.

I'm not doubting your story, but this is not the norm in my experience.

    It’s certainly true that it helped I was 
    familiar with the domain
Technical prowess and domain knowledge are excellent assets, obviously, but in my experience they're often not enough.

The big, tangled enterprise codebases I've dealt with (insurance companies, fintech, construction, etc.) involved absolute metric tons of undocumented domain knowledge and lots of company-specific "tribal knowledge." Some tribal knowledge was embedded in the code in undocumented or semi-documented form, and much of it existed outside the codebase entirely... all kinds of custom infrastructure, etc.

I don't care how sharp and domain-familiar a team is. That sort of situation is not easily tameable.

Why would anyone think Twitter has 10 million lines of code? Does it have some kind of hidden features I'm not aware of?

Why would you think that it doesn’t have 10 million lines of code?

Because that's a ludicrous amount of code for almost anything, and Twitter has a relatively limited scope.