by SamReidHughes 1581 days ago
A rope may call malloc frequently or cause more time to be spent garbage collecting; it also takes up more memory, and the code is slower, especially in the '80s.

Before modern JavaScript VMs, I made an in-browser text editor, and I tried using a finger-tree data structure with string segments. It was extremely slow. I replaced it with two strings. Then it ran at human speeds. Memcpying the whole string upon a keypress was faster than some fancy data structure.
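For illustration only, here is a minimal Python sketch of what a "two strings" buffer might look like. The comment does not say how the two strings were split; the sketch assumes one string holds the text before the cursor and the other the text after it. The class and method names are hypothetical.

```python
class TwoStringBuffer:
    """Text buffer held as two strings split at the cursor (assumed layout)."""

    def __init__(self, text=""):
        self.before = text  # everything left of the cursor
        self.after = ""     # everything right of the cursor

    def insert(self, s):
        # Builds a new left-hand string on every keypress: a plain copy,
        # with no tree or rope bookkeeping.
        self.before = self.before + s

    def backspace(self):
        self.before = self.before[:-1]

    def move_to(self, pos):
        # Re-split the full text at the new cursor position (clamped).
        text = self.before + self.after
        pos = max(0, min(pos, len(text)))
        self.before, self.after = text[:pos], text[pos:]

    def text(self):
        return self.before + self.after


buf = TwoStringBuffer("hello world")
buf.move_to(5)
buf.insert(",")
print(buf.text())
```

Every edit copies a string, but for files of ordinary size that copy is small compared with the time between keypresses.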


As I noted in another comment, I think a lot of complexity in many editors is because they worry about handling absurdly large files without thinking about whether they need to.

The moment you're willing to set an arbitrary limit above which you're ok with dropping performance, you can go very simple the way you did.

And that limit where things start to slow down can be far over the size of files most people need most of the time.

For my own part I'm fine with falling back to e.g. emacs the one time every few months I need to do something with an unusually large file.

Realistically, unless you're browsing log files or some globbed-together generated code, I would lay money that 95% of the usage of programmers' editors is on files under 2000 lines.

If you're optimizing for a 2 GB apache log, you're probably focusing on the wrong thing.

Exactly. I'd rather keep my editor simple than worry about use cases like that. Of course I'm happy some editors do the work to handle large files too, in part because that makes it ok for me to ignore it since I have fallbacks.
That, or, if you're looking at a single json response body...
You still can't avoid the performance penalty of a tree allocation for something like bracket pairing. Or AST analysis, or any one of a billion things people want from a code editor.
Depends on the analysis. For bracket pairing and quite a bit of analysis you can avoid it quite easily by storing the state of a parser at intervals. E.g. for syntax highlighting I use Rouge augmented with serialization of the lexer state, which also provides enough state for bracket matching, and the internal state I need to store is typically one symbol every few lines.
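A rough Python sketch of the "store the state at intervals" idea, not Rouge's actual API: the lexer here is a toy whose only state is whether we are inside a `/* ... */` comment, and `INTERVAL`, `scan_line`, `build_checkpoints` and `state_at` are hypothetical names.

```python
INTERVAL = 4  # hypothetical checkpoint spacing, in lines


def scan_line(in_comment, line):
    """Return whether we are inside a /* ... */ comment after `line`."""
    i = 0
    while i < len(line):
        if in_comment and line.startswith("*/", i):
            in_comment, i = False, i + 2
        elif not in_comment and line.startswith("/*", i):
            in_comment, i = True, i + 2
        else:
            i += 1
    return in_comment


def build_checkpoints(lines):
    """Record the lexer state at the start of every INTERVAL-th line."""
    checkpoints, state = [], False
    for n, line in enumerate(lines):
        if n % INTERVAL == 0:
            checkpoints.append(state)
        state = scan_line(state, line)
    return checkpoints


def state_at(lines, checkpoints, n):
    """Lexer state at the start of line n, rescanning fewer than INTERVAL lines."""
    start = (n // INTERVAL) * INTERVAL
    state = checkpoints[start // INTERVAL]
    for line in lines[start:n]:
        state = scan_line(state, line)
    return state
```

The stored data is one small value per checkpoint, and answering "what state is the lexer in at line n?" never rescans more than a few lines, so no tree has to be kept in sync with the text.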

For complex analysis of the code-base, sure, you may want to build an AST. For my part I have no interest in having that functionality in-process in the editor - I'd rather have that provided by an external service.

> For bracket pairing and quite a bit of analysis you can avoid it quite easily

How? You have a bracket `[` at the start of a 2GB file and the closing bracket `]` somewhere at the end of the file.

Without analysis you can't tell where it is.

I think you're missing my point entirely. A 2GB file is exactly the kind of absurdly large file I wrote this about:

> The moment you're willing to set an arbitrary limit above which you're ok with dropping performance, you can go very simple the way you did.

and this:

> For my own part I'm fine with falling back to e.g. emacs the one time every few months I need to do something with an unusually large file.

Point being I have no interest in making my editor handle 2GB files, because I don't edit 2GB files. I may occasionally happen to want to look at a large log file or something, in which case I'm happy to use Emacs. Or grep.

To this you answered:

> You still can't avoid the performance penalty of a tree allocation for something like bracket pairing. Or AST analysis, or any one of a billion things people want from a code editor.

And at least for bracket pairing this is true if you want to apply it to 2GB files and want to be able to quickly find the closing token, but that was explicitly out of scope of what I'm talking about; I'd explicitly excluded large files from consideration in the comment you replied to.

For the types of files I'm interested in handling - up to tens of thousands of lines but beyond that I really don't care - just storing the parser state per line coupled with linearly scanning as needed is more than fast enough. Even in Ruby, which is what I'm using.
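As a hedged illustration in Python (not the Ruby code of the editor described here), per-line state plus a linear scan could look like the sketch below. It assumes the stored state is just a pair of bracket depths per line and ignores brackets inside strings and comments; all names are hypothetical.

```python
OPEN, CLOSE = "([{", ")]}"


def line_states(lines):
    """Per-line state: (depth at line start, lowest depth reached in the line)."""
    states, depth = [], 0
    for line in lines:
        start = low = depth
        for ch in line:
            if ch in OPEN:
                depth += 1
            elif ch in CLOSE:
                depth -= 1
                low = min(low, depth)
        states.append((start, low))
    return states


def find_closing(lines, states, row, col):
    """Return (row, col) of the bracket closing the one at (row, col), or None."""
    if lines[row][col] not in OPEN:
        return None
    # Depth just before the opening bracket; the match brings us back to it.
    target = states[row][0]
    for ch in lines[row][:col]:
        target += (ch in OPEN) - (ch in CLOSE)
    # Scan the rest of the starting line character by character.
    depth = target
    for c in range(col, len(lines[row])):
        ch = lines[row][c]
        depth += (ch in OPEN) - (ch in CLOSE)
        if depth == target:
            return (row, c)
    # Later lines: skip any line whose stored state never dips to the target.
    for r in range(row + 1, len(lines)):
        start, low = states[r]
        if low > target:
            continue
        depth = start
        for c, ch in enumerate(lines[r]):
            depth += (ch in OPEN) - (ch in CLOSE)
            if depth == target:
                return (r, c)
    return None
```

The scan is still linear in the number of lines, but each skipped line costs one tuple lookup, which is cheap at tens of thousands of lines.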

You'll note my objection was not to doing analysis, but to the need for keeping any more complex structure in sync over the data. My editor uses just a plain array of lines represented as strings.

It will break horribly on huge files.

That's perfectly fine.

The point of the comment you replied to was explicitly to argue that for some editors it'd be just fine to make the explicit design choice not to cater for extreme cases like that.

Of course I'm not arguing for all editors to do this. Then I wouldn't have anything to use on the rare occasions I need it. I'm arguing for people to consider whether their editor needs to be able to handle all kinds of outliers; maybe it needs to (JetBrains probably do need their editor to, for commercial reasons), maybe it doesn't; mine does not need to.

Instead I optimise for what makes it pleasant for me in my common case, and fall back to emacs the one time every few months when I for some reason need to do something with an absurdly large file.

It's confusing that you're replying to two different threads in this.

> For my own part I'm fine with falling back

> That's perfectly fine.

That's for you. I want to open a huge XML file and edit it with autocomplete.

I mean, you could just say "I'm OK with Notepad." OK, sure. I'm not.

> My editor uses just a plain array of lines represented as strings.

Any naive approach is going to run into edge cases quickly[1]. Any more sophisticated approach is going to run into performance issues.

> It will break horribly on huge files.

I suspect it will break waaaay before that. Even using some language with greater complexity would cause issues.

[1] For example using regex to color syntax. Reasonably fast yes, but fails in any moderately complex scenario.
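A small Python illustration of the kind of failure meant in [1], using a deliberately naive, made-up rule that treats everything after `#` as a comment:

```python
import re

COMMENT = re.compile(r"#.*")  # naive rule: the first '#' starts a comment


def comment_span(line):
    """Return the (start, end) span the naive rule would color as a comment."""
    m = COMMENT.search(line)
    return m.span() if m else None


line = 'color = "#ff0000"  # red'
print(comment_span(line))     # span the regex colors as a comment
print(line.index("# red"))    # where the real comment begins
```

The regex starts the comment at the `#` inside the string literal, because a single pattern has no notion of "currently inside a string"; getting that right requires carrying lexer state.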