|
I feel like the rope was too quickly discarded here. The author stopped exploring ropes because they don't fit criteria 2--efficient undo/redo. There are a few problems with claiming that ropes aren't efficient for this usage: 1. The claim that the rope is inefficient for undo/redo is based on a singular example of a small edit on short strings. This isn't where ropes shine, admittedly, but they don't need to shine there, because when dealing with such small pieces of data, pretty much anything you do will be faster than the user can see. If using larger strings, the space allocated for nodes becomes background noise as the strings themselves dominate the size. 2. The chosen solution, the piece table, is more memory-efficient than the rope at first glance, but that's a surface-level efficiency. The eventually-chosen solution, a piece tree, is far less memory efficient. Sure, at first glance it is more memory-efficient, but this is at the expense of tree traversals, which in the VSCode article, are addressed with cache, which... uses more memory. In the author's implementation there's even more memory used because there's a requirement he didn't include in his list: he wants it all to be immutable. Nevermind that ropes were immutable from the start... 3. If you have a document which uses a lot (read, thousands) of very small edits, then the size of small strings might start to matter. So if you're going to optimize for this, optimize it. There are some fairly small optimizations that make the inefficiency concerns completely irrelevant. One is pointer packing: in a 64-bit system, pointers are 64 bits, but in practice, the vast majority of systems use 48 or fewer of those bits: as it turns out, there aren't many systems with more than 2^48 bytes = 256 terabytes of RAM. This means the leading 16 bits are 0s. Trivially, this means you can store strings of 7 8-bit characters in the pointer itself, using the first 8 bits to signal if it's a string or pointer (if they're all 0s, it's a pointer) and the length of the string. All the strings in the inefficiency example can fit in a 64 bit integer: "Hello", " ", and "world" are all fewer than 7 bytes, which means you're passing around 64-bit integers, by value, with no allocations necessary. In fact, this means you can either append the " " to "Hello" like "Hello ", or prepend it to "world" like " world", and still stay under 7 bytes in either case. Remember, this is now 64 bit integers being passed around by value: this is far faster than piece trees. 4. The author treats undo/redo as a stack, but all the cool editors treat it as a tree. If you make a change, then undo it, then make another change, then undo the second change, is your first change lost? In vim/emacs, the answer is no: you can go back into a tree and find it to reapply. This means that all text is not only immutable but immortal: it has to be kept for the duration of the editing session. This enables a few more optimizations: we no longer need reference counts or garbage collection since we aren't reclaiming the memory, and now we can point into existing strings since they'll never change. Consider the following string: "The quick brown box jumps over the lazy dog." You may have noticed a typo: "box" should be "fox". This change requires 0 buffer allocations: we have a pointer to "The quick brown box jumps over the lazy dog." for the original string, a pointer to the same spot for "The quick brown " (with a length), a second pointer to "ox jumps over the lazy dog.", and a packed pointer (integer) for the string "f". This is pretty key because if you're not freeing any of this memory, you need to make sure you don't allocate more than necessary! NOTE: I'm not saying that the rope is the better structure here. There may be more requirements which weren't captured in the article which mean that piece buffers really are the right answer. All I'm saying is that the article doesn't really explore ropes deeply enough to write them off so quickly. |