by luckydude 1951 days ago
Wow, just noticed this. I'm the guy who paid for Little, a bunch of other people did all the work.

I'm surprised to see it getting some attention but happily so. Little is what I'd like C to evolve towards, there is a lot of useful (to me) stuff in the language.

I'll wander through the comments and reply where I can.


I just realized I didn't give credit to all the people who worked on Little. So here goes. Tim Daly did the first pass. Oscar Bonilla stepped up, I still remember him saying to Tim "we need an AST" and Tim said "I can do this". We needed an AST, Oscar was right about that and a lot of other things. Rob Netzer, my roommate from college and former Brown tenured prof, did most of the heavy lifting in the Little compiler. Damon Courtney was our GUI guy, he had a huge influence on Little.

Jeff Hobbs from the Tcl community helped as well. I have pictures of that group of people. Jeff helped a lot, he wanted this, I could say why but I don't want to speak for him.

Little is what I'd like C to be but those guys made it happen.

I'm amazed that Little got some traction, and happy that it did for a moment. Those guys deserve all the credit, I was a whiny dude who wanted a more C-like thing and they gave it to me.

I'm curious if you've written about the decisions around licensing that essentially killed the BitKeeper business by inspiring Linus to create Git? What are your thoughts around that today?

Hindsight is 20/20. The BitKeeper business had a good run, we were around for 18 years. It made enough that I and my business guy are retired off of what we made.

On the other hand, we didn't make enough for everyone to retire if they wanted to. We had a GitHub-like offering and it's pretty clear that we should have put a bunch of money into that and open sourced BitKeeper.

All I can say is it is incredibly hard to make that choice when you have something that is paying the bills. I tried to get Sun to do it with the BSD based SunOS and they wouldn't. And even though I had that vision for Sun, when it was my livelihood, I couldn't see the path to doing so.

Shoulda, coulda, woulda, my biggest regret is not money, it is that Git is such an awful excuse for an SCM. It drives me nuts that the model is a tarball server. Even Linus has admitted to me that it's a crappy design. It does what he wants, but what he wants is not what the world should want.

It says a lot that we have a bk fast-export and we can incrementally run that and get idempotent results. As in go from BK to Git on an ongoing basis, have two people do it in parallel and they both get bit for bit identical results. If you try and go the other way, Git -> BK, if you do it in parallel you get different results because Git doesn't store enough information, so BK has to make up the missing bits.
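A rough sketch of why one direction can be reproducible and the other cannot. This is not BK's fast-export code. The record layout, the `export_stream` and `import_stream` names, and the idea that the "missing bits" are per-file identifiers invented at import time are all assumptions made for illustration.

```python
import hashlib
import uuid

# Hypothetical history where every record already carries a stable file id.
HISTORY = [
    {"file_id": "f-001", "rev": "1.1", "path": "src/main.c",
     "text": "int main(void) { return 0; }\n"},
    {"file_id": "f-001", "rev": "1.2", "path": "src/app.c",
     "text": "int main(void) { return 1; }\n"},
]

def export_stream(history):
    """Serialize only recorded facts, in a fixed order."""
    parts = []
    for rec in history:
        parts.append(f"{rec['file_id']} {rec['rev']} {rec['path']}\n{rec['text']}")
    return "".join(parts).encode()

def import_stream(snapshots):
    """Import from a source with no file ids: the importer has to invent them."""
    parts = []
    for path, text in snapshots:
        made_up_id = uuid.uuid4().hex  # assumption: invented fresh on every import
        parts.append(f"{made_up_id} {path}\n{text}")
    return "".join(parts).encode()

def digest(data):
    return hashlib.sha256(data).hexdigest()

# Two people run the same conversion independently.
snapshots = [(rec["path"], rec["text"]) for rec in HISTORY]
alice_export, bob_export = export_stream(HISTORY), export_stream(HISTORY)
alice_import, bob_import = import_stream(snapshots), import_stream(snapshots)

print("exports identical:", digest(alice_export) == digest(bob_export))
print("imports identical:", digest(alice_import) == digest(bob_import))
```

The export only writes down what was recorded, so two runs agree. The import has to fill in data the source never stored, so two independent runs have nothing forcing them to agree.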

Git has no file create|delete|rename history, it just guesses. That's my biggest regret, I wish Linus had copied that part.
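To make the guess-versus-record distinction concrete, here is a minimal sketch. It is not Git's rename detection and not BK's file format. `guess_rename`, `RecordedHistory`, and the 0.5 similarity threshold are hypothetical names and values chosen for this example.

```python
from difflib import SequenceMatcher

def guess_rename(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path, if any.

    deleted and added map path -> file text. The threshold is an assumed
    cutoff for this sketch only.
    """
    pairs = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            pairs[old_path] = best_path
    return pairs

class RecordedHistory:
    """Each file gets a stable id; creates and renames are logged, never inferred."""

    def __init__(self):
        self.events = []   # (file_id, event, path)
        self.next_id = 1

    def create(self, path):
        file_id = self.next_id
        self.next_id += 1
        self.events.append((file_id, "create", path))
        return file_id

    def rename(self, file_id, new_path):
        self.events.append((file_id, "rename", new_path))

    def paths(self, file_id):
        return [path for fid, _, path in self.events if fid == file_id]

# A file that is renamed and rewritten in the same change.
before = {"util.c": "aaaa\nbbbb\ncccc\n"}
after = {"math.c": "xxxx\nyyyy\nzzzz\n"}
print(guess_rename(before, after))   # the heuristic finds no pairing here

history = RecordedHistory()
fid = history.create("util.c")
history.rename(fid, "math.c")
print(history.paths(fid))            # the recorded log still knows it is one file
```

The heuristic works when the content barely changes and gives up when it changes a lot. The recorded log does not depend on content at all.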

Now that all that isn't so raw, I'd love to know how you felt about the other contenders that were floating about at the time. Were any of them doing things you wanted to see in BK?

I can't be the only one who'd be interested in your views on the general developments for such important tools of our trade. Have you written about it anywhere? I'll pre-order "Larry walks us from SCCS to the git monoculture".

Actually I was part of an SCM conference put together by Facebook and Google recently. People are starting to think about what happens after Git.

Unfortunately, even now, it seems that there is a lot catching up to BK still to be done. To be fair, we had kernel level programmers working on it, we don't think anyone will pick up our code, you pretty much have to be a top 1-2% programmer to work on it, it's all in very disciplined C, people don't seem to like that any more.

So far as I know, BK is the only system that gets files right, we have a graph per file, everyone else has a single graph per repository. The problem with that is the repository GCA may be miles away from the file GCA; BK gets that 100% right, other systems guess at the GCA. Graph per file means each file has a unique identifier, like an inode in the kernel. So create/delete/renames are actually recorded instead of being guessed at. SCM systems shouldn't guess in my opinion (actually in anyone with a clue's opinion, would you like it if your bank guessed about your balance? Of course not, so why is guessing OK in an SCM? It's not). Graph per file means that bk blame is instant no matter how much history you have.
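A small sketch of the GCA (greatest common ancestor) computation, run once on a repository-wide graph and once on a per-file graph. The graphs, node names, and the `gca` helper are made up for illustration. BK's real data structures are not shown in this thread.

```python
def ancestors(parents, node):
    """Return node plus everything reachable through parent links."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def gca(parents, left, right):
    """Common ancestors of left and right that no other common ancestor descends from."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}

# Hypothetical repository graph: one node per commit, child -> parents.
REPO = {
    "c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"],
    "c5": ["c2"], "c6": ["c5"],
}

# Hypothetical graph for a single file: only the deltas that touched it.
# f1 was made in c1, f2 in c4 (left side), f3 in c6 (right side).
FILE = {"f1": [], "f2": ["f1"], "f3": ["f1"]}

print(gca(REPO, "c4", "c6"))   # {'c2'}
print(gca(FILE, "f2", "f3"))   # {'f1'}
```

The routine is the same in both cases; what changes is which graph it walks. With one graph per file, the merge base is a delta of that file rather than a repository-wide commit that may never have touched it.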

BK is the only system that even attempts to get sub-modules (we call them components) right. Where by "right" I mean you can have a partially populated collection and you get identical semantics from the same commands whether it is a mono-repo or a collection of repos. Nobody else has anything close, Git sub-modules turn Git's workflow into CVS workflow (no sideways pulls).

I tried my best to show what we did in BK at that conference, I have no idea if they will swipe any of it. It's not like BK is perfect, it didn't do everything, no named branches, a clone is a branch, which is a model that absolutely will not scale to what people are doing today (we can argue whether TB repos should exist, but they do).

But for the problems BK did solve, it tended to solve them very well. Hell, just our regression tests are a treasure trove of things that can go wrong in the wild and we open sourced both the tests and the test harness.

Thanks, although I think you're demonstrating here and in the other comments why you should write a real history.

Was the conference recorded? I've tried searching, but I'm not turning anything up.

As an outsider you get my worthless full agreement on strictness of history, and on solving the monorepo or vendoring dilemmas. My employer at the time of the upheaval was a bitmover customer, and as we slowly switched away one repo at a time it definitely felt like a sideways step. I'd hesitate to say backwards because it did come with some big process improvements for us, but definitely not forwards.

I'd surely have been proud of solving problems with the quality that BK did too. I remember playing with a lot of the open source systems of the time¹, and none of them were in the same league. I'll make no apologies for this sounding like truly weird fan mail.

¹ I'm remembering hg, darcs, monotone, $some_implementation_of_arch, prcs, codeville but there were a lot of people in the space to some degree.

I think it was recorded, I'll go look.

Apologize for saying BK is quality? None needed, we prided ourselves on producing a quality product. And great support, our average response time, 24x7, was 24 minutes. It was only that "slow" because we were North America based. If you only considered the US work week, response time was usually under 2 minutes, but that's not reasonable because we had customers all over the world.

I'm gonna start with a write up of the SCCS weave, with a goal that it is enough of a spec that you could go implement it. Maybe add some notes about how I did it because the way I did it was unusual and had the side benefit that you could extract the GCA, left tip, and right tip for a merge in one pass.

What do you think about patch-based systems like Darcs and Pijul instead of snapshot-based systems like Git?

Recent article on Pijul: https://initialcommit.com/blog/pijul-version-control-system/

I think if you are asking this question either I have completely failed to explain why weaves are cool or you haven't read what I said about why weaves are cool.

Patch based systems are idiotic, that's RCS, that is decades old technology that we know sucks (I've had a cocktail, it's 5pm, so salt away).

Do you understand the difference between pass by reference and pass by value? You must, but in case you don't, you can pass by reference in sizeof(void *), 4-8 bytes. Pass by value and you are copying sizeof(whatever it is you are passing) onto the stack. Obviously, pass by reference is immensely faster.

But in SCM, it isn't just about speed (and space), it's about authorship. In a patch based system, imagine that there is user A who is doing all the work on the trunk, there is user B who is doing all the work on a branch, and then there is user U who is merging the branch to the trunk. Let's say B added a bunch of work on the branch, it all automerged. U did the merge. In a patch based system, all of the B work is going to be copied (passed by value) to the trunk and the authorship of that work will change from B to U (since U did the merge).

Flip forward to a month from now, the code has panicked or asserted, whatever, B's code on the trunk took a crap. People run git blame to see who did that, and who did it? U did. But U didn't, B did, but U merged it and it was a copy so it looks like U did it.

That's just the SCM being dishonest because it has no choice, it is pass by value.

Weaves are pass by reference. If you merged in BitKeeper and it automerged, you run blame (we call it annotate but I should make blame be an alias if I haven't already, I'm the guy that came up with blame as that verb), you would only see A and B as the authors.

Weaves mean authorship is correct and that whole repack nonsense that Git does? Yeah, that goes away, you are passing every thing by reference so there is only one copy of the code no matter how many branches it has been merged from/to.
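The A, B, and U scenario above can be sketched with a toy weave. This is a simplified model, not the SCCS or BK file format: the tuple encoding, the `annotate` function, and the serial numbers are assumptions for this example, and delete blocks are left out to keep it short.

```python
# Weave entries: ("I", serial) opens the text inserted by a delta,
# ("E", serial) closes that block, ("T", line) is one line of text.
WEAVE = [
    ("I", 1), ("T", "int main(void)"), ("T", "{"),
    ("I", 2), ("T", "    do_work();"), ("E", 2),
    ("T", "    return 0;"), ("T", "}"), ("E", 1),
]

# Delta 1 is A's trunk work, delta 2 is B's branch work,
# delta 3 is U's automerge.
AUTHORS = {1: "A", 2: "B", 3: "U"}

# The merge adds no text to the weave. It only records that the merged
# version is made of deltas 1 and 2 (plus itself).
MERGE_INCLUDES = {1, 2, 3}

def annotate(weave, included, authors):
    """Return (author, line) pairs for one version, in a single pass."""
    open_inserts = []   # enclosing insert blocks, innermost last
    result = []
    for kind, value in weave:
        if kind == "I":
            open_inserts.append(value)
        elif kind == "E":
            open_inserts.remove(value)
        elif open_inserts and open_inserts[-1] in included:
            # A line belongs to the delta whose block directly encloses it.
            result.append((authors[open_inserts[-1]], value))
    return result

for author, line in annotate(WEAVE, MERGE_INCLUDES, AUTHORS):
    print(author, line)
```

B's line still sits in B's block after the merge, so the annotation names only A and B. U's delta points at existing text instead of carrying a copy of it.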

Anyone who is pushing a patch based system (and Git is one as well) just doesn't have a clue about how to do source management. Maybe something better than a weave will come along (and if it does, rbsmith will do it, that guy bug fixed my crappy weave implementation) but I think it will just be a better weave with new operators like MOVE (current weaves know INSERT and DELETE, that's it).

Sorry if I'm being a dick, not looking for sympathy but I've got health problems, my feet hurt like crazy and I get kind of terse at the end of the day. If you truly want to understand more, and this goes for all of hacker news, I'm happy to get on a zoom call and talk this stuff through. And it is blindingly obvious I need to write up the SCCS weave and I will do so, you guys have inspired this 58 year old, burned out, can't code to save his life, dude to at least try and pass on some knowledge. I would love to be working with some young person who has some juice and pass on what I know. I don't know everything about SCM but I know a lot. I'm done, it's time for someone else to carry things forward, I'll help if you want. The world deserves a better answer than what we have now.

> Unfortunately, even now, it seems that there is a lot catching up to BK still to be done. To be fair, we had kernel level programmers working on it, we don't think anyone will pick up our code, you pretty much have to be a top 1-2% programmer to work on it, it's all in very disciplined C, people don't seem to like that any more.

Oh my oh my.

Believe me, I would do backflips if someone wanted the BK source base, it's got almost 2 decades of my work in it, north of 140 man years. It's a lot to just let fade away.

But I assembled a team of people better and smarter than me, I did my best to keep the code simple but I didn't always succeed.

If you, or anyone, wants to pick it up, I'm happy to answer questions.

I know Git quite well at this point in my career, but it’s still a very baffling and often frustrating SCM tool in my opinion. I find it disconcerting that despite many years with it, I still feel like I don’t totally comprehend what really is happening under the hood. There seems to be a lot of magic. I still have a sense of apprehension every time I do a rebase, and after successful conflict resolution I am surprised that it somehow worked given how messy and convoluted the rebase workflow is.

One of the oddest things I more recently learned about Git is that you can’t create empty folders. The workaround is to create an empty file (e.g., a README)!

That being said, entire careers are built on this powerful yet odd tool that has somehow become the most popular SCM system in the world.

> One of the oddest things I more recently learned about Git is that you can’t create empty folders. The workaround is to create an empty file (e.g., a README)!

The convention I've seen is `.keep` and `.gitkeep`.

I so appreciate you answering questions here!

I've experienced how Git's lack of file rename makes it hard to follow changes through directory and file reorganization. This made me reluctant to reorganize, as I learned the problem space. The general, long-term impact may be increased ossification pressure.

OTOH Linus's content addressable system is absurdly, compellingly simple, and does 90%. It's disproportionately simple. A dilemma!

BTW I think many companies get killed that way. Refusing revenue to profit must be pollyanna - except for github. OTOH keeping customers is how great companies get killed, according to Christensen (the innovator's dilemma).

I hadn't heard of BitKeeper but it sounds interesting. Have you considered open-sourcing it now for people to look at, even though the business has faded?

http://bitkeeper.org

Been open source for years

This makes me feel old. The bitkeeper saga is why GitHub exists. The drama is archived in 2004-5 lkml.

I was twelve :)

BTW, Linus switched to Git in 2005. BitKeeper didn't give up and turn to open source until 2016. So we had a 10 year run after Git showed up where people still paid us.

It was good run, not many software companies get an 18 year run. I'm fine with it, would have liked to do more for my people.

Though, when we were shutting things down and I was bumming that I had not gotten retirement money for all of my people, one of them said something like "Dude, are you kidding me? My best friend barely knows his kids, he is out the door by 7am to fight Houston traffic and not home until close to 7pm. My commute is from my bedroom to my office down the hall. I've got to help my wife with the kids, I see the kids all day every day, my life is infinitely better than my friend's life and you gave that to me. You're fine."

I'm a wimp, I teared up a bit, that was the nicest thing he ever said to me. It's not just about money.

I just wanted to say thank you for believing in the SCCS weave when making BitKeeper. It is an incredibly elegant design (though I've failed to properly implement it myself yet).

The SCCS weave was our secret sauce. Tichy did the world a huge disservice in his PhD about RCS (like that was worth a PhD, come on) where he bad mouthed SCCS's weave (without understanding it, or maybe he was spreading misinformation on purpose, he implied that SCCS was forward deltas where RCS is backwards deltas, see below for what that means).

When we were still in business, each new SCM that came out, we'd hold our breath until we looked at it and said "No weave!"

For those who don't know, the SCCS weave is how your data is stored. Most people are used to something like RCS which is patch based. For the RCS trunk, the head is stored as plain text, the previous delta is a reverse patch against the head, lather, rinse, repeat. Branches are forward deltas, so if you want to get something on a branch, you start with the head, apply reverse deltas until you get to the branch point and then forward deltas until you get to the branch tip. Ask Dave Miller how much he loved working on a branch of gcc, spoiler, he hated it. With good reason.
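A minimal sketch of the trunk half of that scheme: the head is stored whole and each older revision is a reverse patch against the next newer one. This is not the RCS file format. The `(start, end, replacement)` edit tuples and the `checkout` helper are invented for this example.

```python
# Head (call it 1.3) is stored as plain text.
HEAD = ["a", "b2", "c", "d"]

# Each reverse delta turns a revision into the one before it.
# An edit is (start, end, replacement_lines) against the newer text.
REVERSE_DELTAS = [
    [(3, 4, [])],        # 1.3 -> 1.2: remove "d"
    [(1, 2, ["b"])],     # 1.2 -> 1.1: change "b2" back to "b"
]

def apply_delta(lines, delta):
    """Apply edits from the bottom up so earlier indexes stay valid."""
    out = list(lines)
    for start, end, replacement in sorted(delta, key=lambda e: e[0], reverse=True):
        out[start:end] = replacement
    return out

def checkout(head, reverse_deltas, steps_back):
    """Rebuild the revision steps_back behind head; also return patches applied."""
    lines = list(head)
    applied = 0
    for delta in reverse_deltas[:steps_back]:
        lines = apply_delta(lines, delta)
        applied += 1
    return lines, applied

print(checkout(HEAD, REVERSE_DELTAS, 0))   # (['a', 'b2', 'c', 'd'], 0)
print(checkout(HEAD, REVERSE_DELTAS, 2))   # (['a', 'b', 'c'], 2)
```

The cost of a checkout grows with the distance from the head. A branch tip would add forward patches on top of the walk back to the branch point, which is the slow path described above.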

SCCS has a weave that is not aware of branches at all, it only knows about deltas. So you can get 1.1 in exactly the same amount of time as it takes to get head, it is one read through all the data. bk annotate (git blame) is astonishingly fast.
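Here is a toy version of that one-pass extraction. It is a simplified model and not the real SCCS file format or its exact visibility rules: the tuple encoding and the rule used here (a line is kept when the delta whose insert block directly encloses it is in the version and no enclosing delete block belongs to the version) are assumptions for this sketch.

```python
# ("I", n) / ("D", n) open an insert or delete block made by delta n,
# ("E", n) closes it, ("T", line) is a line of text.
WEAVE = [
    ("I", 1), ("T", "a"),
    ("D", 2), ("T", "b"), ("E", 2),     # delta 2 deleted "b"
    ("I", 2), ("T", "B"), ("E", 2),     # delta 2 inserted "B"
    ("T", "c"), ("E", 1),
    ("I", 3), ("T", "d"), ("E", 3),     # delta 3 appended "d"
]

def extract(weave, included):
    """Return the lines of one version in a single pass over the weave."""
    open_inserts = []      # enclosing insert blocks, innermost last
    open_deletes = set()   # enclosing delete blocks
    lines = []
    for kind, value in weave:
        if kind == "I":
            open_inserts.append(value)
        elif kind == "D":
            open_deletes.add(value)
        elif kind == "E":
            if value in open_deletes:
                open_deletes.discard(value)
            else:
                open_inserts.remove(value)
        else:
            inserted = bool(open_inserts) and open_inserts[-1] in included
            deleted = any(d in included for d in open_deletes)
            if inserted and not deleted:
                lines.append(value)
    return lines

print(extract(WEAVE, {1}))         # ['a', 'b', 'c']
print(extract(WEAVE, {1, 2}))      # ['a', 'B', 'c']
print(extract(WEAVE, {1, 2, 3}))   # ['a', 'B', 'c', 'd']
```

Every version is the same loop over the same data with a different set of deltas switched on, so the oldest version costs the same as the newest.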

And merges are by reference, no data is copied across a merge. Try that in a patch based system. How many of you have been burned because you merged in a branch full of code that someone else wrote, and on the trunk all the new data from the branch looks like you wrote it, so you get blamed when there is a bug in that code? That's because your brain dead system copied code from the branch to the trunk (Git does this as well, that's what the repack code is trying to "fix", it is deduping the copies).

Weaves are the schnizzle, any SCM system that doesn't use them today is so 1980.

How does dcfs weave compare or relate to what pijul.org is doing?

Some recent discussion of pijul on HN:

https://news.ycombinator.com/item?id=24592568

It's based on patches. I'm perhaps not going to win any friends, but I think that is ill-advised. Patches are great for emailing around, they are a horrible idea for an SCM.

Perhaps I need to write up how weaves work in more detail. Once you get that, you won't want an SCM based on anything else.

By all means, please do. There's a severe lack of material.

Yes, please do! The SCM space is horrible right now. I would love to see prior art.

Avocet (CodeManager) ruined me for other source control systems for years and years — I still look back on it fondly.

That was me as well, though I wrote it in perl4 and C. My version was called NSElite. The Solaris kernel was developed under NSElite, some stuff is here:

http://mcvoy.com/lm/nselite

2000.txt documents the first 2000 resyncs (think bk pull) of the kernel.

Avocet was what you got when you took all my perl code and handed it to the tools group and they rewrote it in C++ (which they later admitted was a horrible idea). The only thing of mine that they kept was smoosh.c and that was because not a single one of them had the chops to write that code (yeah, there was no love lost between me and the tools group).

BitKeeper is what Avocet could have been if Sun had not stopped me from doing any more work on NSElite (I was 1 guy who was coding circles around 8 tools people and they didn't like it). Shrug. C++ was just wrong, perl4 was just way faster to code in, and when I needed performance I coded in C. It's not my fault they picked the wrong way to go about things. (That, BTW, was the first time I ever personally saw that you really can have one guy who can do the work of 8, almost, but not quite a 10x programmer :-)