I was reading up on ropes a couple of months ago when researching Emacs internals. Apparently Emacs and VS Code are using gap buffers. It would be interesting to know the reasons for those decisions.
EMACS was just a package on top of TECO (the way LaTeX sits on top of TeX). Later reimplementations (e.g. GNU Emacs) just continued to use the same design.
So why did TECO use a gap buffer?
The gap buffer was an easy way to manage an edit buffer back when machine clock speeds were measured in kilohertz and RAM in kilobytes. There were no fonts (no rendering at all), six or seven bit characters, oh, and the machines were often timeshared.
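For readers who haven't met one, here is a minimal sketch of the gap buffer idea: one flat array with a movable hole at the cursor. It is written in Python purely for illustration; the class and method names are hypothetical and it is not a reconstruction of TECO's or Emacs's actual code.

```python
class GapBuffer:
    """Text kept in one flat array with a movable gap at the cursor."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # cursor position (first gap slot)
        self.gap_end = len(self.buf)    # one past the last gap slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        # Typing at the cursor just fills one gap slot: no shifting needed.
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete_before_cursor(self):
        # Backspace simply widens the gap by one slot.
        if self.gap_start > 0:
            self.gap_start -= 1

    def _grow(self):
        # Gap is full: open a new gap at the cursor, roughly doubling the array.
        extra = max(16, len(self.buf))
        self.buf[self.gap_start:self.gap_start] = [None] * extra
        self.gap_end += extra

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Inserting and deleting at the cursor touch a single slot; the cost shows up only when the cursor jumps a long way, because the characters in between have to be copied across the gap.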
Likewise, vi is just a visual addition to ed, itself a clone of the Multics qed, itself a clone of qed on earlier machines. Those machines were, by modern standards, equally resource starved.
> It's really kind of fascinating to me, all the different ways we've come up with over the years just to manipulate text.
One of the things that strikes me is how much effort goes into making these editors work well with absurdly large files, rather than more editors punting on that and having people fall back on specific tools for huge files.
For my own editor I basically decided to ignore large files entirely and fall back on using emacs for the very rare case where I need to open an absurdly large file.
I know that's a luxury JetBrains doesn't have, because we've come to expect all editors to handle ridiculously large files well.
But the point being that for reasonably sized files - up to tens of thousands of lines - just an array of strings is more than fast enough.
That holds even for an editor like my personal one (not really usable for anyone else, though I've started packaging up parts of the code), which is written in Ruby (which introduces a substantial overhead per string).
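To make the "just an array of strings" idea concrete, here is a rough sketch of such a buffer. It is in Python rather than Ruby, the names are hypothetical, and it is not the code of the editor described above; it only shows that the basic operations are a few lines each.

```python
class LineBuffer:
    """A text buffer stored as a plain list of lines (no newline characters)."""

    def __init__(self, text=""):
        self.lines = text.split("\n")

    def insert(self, row, col, s):
        # Splice s into the line; if s contains newlines, the line is split.
        line = self.lines[row]
        pieces = (line[:col] + s + line[col:]).split("\n")
        self.lines[row:row + 1] = pieces

    def delete_char(self, row, col):
        # Delete the character at (row, col); at end of line, join with the next line.
        line = self.lines[row]
        if col < len(line):
            self.lines[row] = line[:col] + line[col + 1:]
        elif row + 1 < len(self.lines):
            self.lines[row:row + 2] = [line + self.lines[row + 1]]

    def text(self):
        return "\n".join(self.lines)
```

An edit within a line only rebuilds that one string. Inserting or removing a line shifts the list of references, which stays cheap at tens of thousands of lines.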
I think if I ever decide to make my editor handle really huge files, I'll "just" split them in suitably large chunks and lazily do the necessary processing as needed
> I think if I ever decide to make my editor handle really huge files, I'll "just" split them in suitably large chunks and lazily do the necessary processing as needed
That falls apart the moment you need some non-basic feature like matching open/closed brackets/elements, etc.
I think that missed my point. If a file is large enough that this becomes an issue, it is tens of thousands of lines or more, which means it's rarely human written code. I'm perfectly happy to turn off anything fancy in that scenario, as on the rare (every few months at most) occasions where I open such monstrously large files it's usually a log file or similar, not code. Your mileage may well vary, but I'm not interested in writing a general purpose editor (some components of my editor are general purpose, and I'm packaging up some of them, but the editor itself is written entirely with my own usage patterns in mind - my editor is smaller than my .emacs file used to be). I think more people ought to focus on writing more opinionated editors rather than try to make everyone happy.
That said, it's not true that it needs to affect features like the ones you mentioned - all you need is to add a facade that gives your tools whatever interface to the buffer they need. As it is, my editor stores its buffer in a separate server process, because it was trivial to do so and gave me a bunch of benefits like multiple clients connecting to the same buffer, which also means I can trivially have out-of-process services augmenting the buffers with additional state lazily without needing to know anything about how the buffers are represented. The server process + a facade for the current buffer implementation + most of the basic editing operations the rest is built on is ~500 lines of code.
The data structures used for text editing are important but only a very small part of what makes ST fast. It's the native, gc-less code, the custom UI toolkit and constant attention to performance that pull that weight.
I see no reason why a rope would be slower than a gap buffer for 'just typing'. And the gap buffer will choke when you fill it up and want to continue typing.
A rope could frequently call malloc or cause more time to be spent garbage collecting, it takes up more memory, and the code is slower, especially in the '80s.
Before modern JavaScript VMs, I made an in-browser text editor, and I tried using a finger-tree data structure with string segments. It was extremely slow. I replaced it with two strings. Then it ran at human speeds. Memcpying the whole string upon a keypress was faster than some fancy data structure.
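The comment above doesn't say how the two strings were split, so the following is only a guess at the shape of that approach: one string for the text before the cursor and one for the text after it. It is sketched in Python rather than JavaScript, and all names are hypothetical.

```python
class TwoStringBuffer:
    """Buffer kept as two strings: text before the cursor and text after it."""

    def __init__(self, text="", cursor=0):
        self.before = text[:cursor]
        self.after = text[cursor:]

    def insert(self, s):
        # Typing appends to the left string (a fresh copy each time).
        self.before += s

    def backspace(self):
        self.before = self.before[:-1]

    def move(self, delta):
        # Moving the cursor re-slices the whole text: a full copy, but a simple one.
        whole = self.before + self.after
        pos = max(0, min(len(self.before) + delta, len(whole)))
        self.before, self.after = whole[:pos], whole[pos:]

    def text(self):
        return self.before + self.after
```

Every operation copies a string, which is the "memcpy on each keypress" cost mentioned above, and it is still a single tight copy rather than a walk through a tree of small nodes.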
As I noted in another comment, I think a lot of complexity in many editors is because they worry about handling absurdly large files without thinking about whether they need to.
The moment you're willing to set an arbitrary limit above which you're ok with dropping performance, you can go very simple the way you did.
And that limit where things start to slow down can be far over the size of files most people need most of the time.
For my own part I'm fine with falling back to e.g. emacs the one time every few months I need to do something with an unusually large file.
I'm just saying that "gap buffer" is not a reason any editor is perceived to be slow unless it's slow when moving the cursor in a large buffer (or as you point out, the amortized growth of a large buffer, which will happen max log n times in a buffer of size n).
> amortized growth of a large buffer, which will happen max log n times in a buffer of size n
You can do better than log if you care to; just grow the buffer more quickly. It's just that exponential growth tends to work pretty well (and, notably, amortizes the overhead of copying such that it is O(1)). Also: people tend to take breaks when typing, so you can asynchronously try to grow the buffer when it is close to full.
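A small simulation makes the amortization argument concrete. It is a sketch under stated assumptions: the function name, the starting capacity of 16, and the growth factors are all arbitrary choices for illustration, not taken from any particular editor.

```python
def simulate_growth(n, factor=2, initial=16):
    """Append n items to a buffer that multiplies its capacity by `factor` when full.

    Returns (number of reallocations, total items copied during reallocations).
    """
    capacity, size = initial, 0
    reallocations = copied = 0
    for _ in range(n):
        if size == capacity:
            # Buffer is full: allocate a bigger one and copy everything over.
            capacity = max(capacity + 1, int(capacity * factor))
            reallocations += 1
            copied += size
        size += 1
    return reallocations, copied


if __name__ == "__main__":
    n = 1_000_000
    for factor in (2, 4):
        reallocs, copied = simulate_growth(n, factor)
        print(f"factor {factor}: {reallocs} reallocations, "
              f"{copied / n:.2f} items copied per insertion")
```

With these assumptions, a million insertions trigger 16 reallocations when doubling and 8 when quadrupling, and in both cases the total copying stays within a small constant multiple of n, which is the O(1) amortized cost per insertion.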