by vidarh 1581 days ago
As I noted in another comment, I think a lot of complexity in many editors is because they worry about handling absurdly large files without thinking about whether they need to.

The moment you're willing to set an arbitrary limit above which you're ok with dropping performance, you can go very simple the way you did.

And that limit where things start to slow down can be far over the size of files most people need most of the time.

For my own part I'm fine with falling back to e.g. emacs the one time every few months I need to do something with an unusually large file.


Realistically, unless you're browsing log files or some globbed-together generated code, I would lay money that 95% of the files opened in programmers' editors are under 2000 lines.

If you're optimizing for a 2 GB apache log, you're probably focusing on the wrong thing.

Exactly. I'd rather keep my editor simple than worry about use cases like that. Of course I'm happy some editors do the work to handle large files too, in part because that makes it ok for me to ignore it since I have fallbacks.

That, or, if you're looking at a single JSON response body...

You still can't avoid the performance penalty of a tree allocation for something like bracket pairing. Or AST analysis or any one of a billion things people want from a code editor.

Depends on the analysis. For bracket pairing and quite a bit of analysis you can avoid it quite easily by storing the state of a parser at intervals. E.g. for syntax highlighting I use Rouge augmented with serialization of the lexer state, which also provides enough state for bracket matching; the internal state I need to store is typically one symbol every few lines.
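
A minimal sketch of the "store parser state at intervals" idea, in Python for brevity rather than the Ruby the editor is written in. It is not Rouge and not the editor's actual code: `INTERVAL`, `scan_line`, `build_checkpoints` and `find_opener` are hypothetical names, and the toy scanner ignores strings and comments, which a real lexer state would account for.

```python
OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = {v: k for k, v in OPEN.items()}
INTERVAL = 4  # hypothetical: take a snapshot every 4 lines


def scan_line(text, line_no, stack):
    """Update the stack of open brackets (char, line, col) for one line."""
    for col, ch in enumerate(text):
        if ch in OPEN:
            stack.append((ch, line_no, col))
        elif ch in CLOSE and stack and stack[-1][0] == CLOSE[ch]:
            stack.pop()
    return stack


def build_checkpoints(lines):
    """Snapshot the open-bracket stack at the start of every INTERVAL-th line."""
    checkpoints = {}
    stack = []
    for i, text in enumerate(lines):
        if i % INTERVAL == 0:
            checkpoints[i] = tuple(stack)
        scan_line(text, i, stack)
    return checkpoints


def find_opener(lines, checkpoints, line_no, col):
    """Return (line, col) of the opener matching the closer at (line_no, col)."""
    start = line_no - line_no % INTERVAL      # nearest snapshot at or before
    stack = list(checkpoints[start])
    for i in range(start, line_no):           # replay only a few lines
        scan_line(lines[i], i, stack)
    scan_line(lines[line_no][:col], line_no, stack)
    ch = lines[line_no][col]
    if ch in CLOSE and stack and stack[-1][0] == CLOSE[ch]:
        return stack[-1][1:]
    return None
```

Looking up the opener for a closing bracket replays at most `INTERVAL` lines from the nearest snapshot instead of rescanning from the top of the file, and no tree is kept in sync with the text.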

For complex analysis of the code-base, sure, you may want to build an AST. For my part I have no interest in having that functionality in-process in the editor - I'd rather have that provided by an external service.

> For bracket pairing and quite a bit of analysis you can avoid it quite easily

How? You have a bracket `[` at the start of a 2GB file and the closing bracket `]` somewhere at the end of the file.

Without analysis you can't tell where it is.

I think you're missing my point entirely. A 2GB file is exactly the kind of absurdly large file I wrote this about:

> The moment you're willing to set an arbitrary limit above which you're ok with dropping performance, you can go very simple the way you did.

and this:

> For my own part I'm fine with falling back to e.g. emacs the one time every few months I need to do something with an unusually large file.

Point being I have no interest in making my editor handle 2GB files, because I don't edit 2GB files. I may occasionally happen to want to look at a large log file or something, in which case I'm happy to use Emacs. Or grep.

To this you answered:

> You still can't avoid the performance penalty of a tree allocation for something like bracket pairing. Or AST analysis or any one of a billion things people want from a code editor.

And at least for bracket pairing this is true if you want to apply it to 2GB files and want to be able to quickly find the closing token, but that was explicitly out of scope of what I'm talking about; I'd explicitly excluded large files from consideration in the comment you replied to.

For the types of files I'm interested in handling - up to tens of thousands of lines but beyond that I really don't care - just storing the parser state per line coupled with linearly scanning as needed is more than fast enough. Even in Ruby, which is what I'm using.

You'll note my objection was not to doing analysis, but to the need for keeping any more complex structure in sync over the data. My editor uses just a plain array of lines represented as strings.

It will break horribly on huge files.

That's perfectly fine.
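
For illustration, here is a rough sketch of the "plain array of lines plus cached per-line parser state" design described above. It is in Python rather than Ruby, and everything in it is an assumption made for the example: the `Buffer` class, the `end_state` toy lexer (which only tracks whether a `/* ... */` comment is open) and the `replace_line` method are hypothetical and not taken from the editor.

```python
def end_state(text, in_comment):
    """Toy lexer: report whether a /* ... */ comment is still open after text."""
    i = 0
    while i < len(text):
        if in_comment and text.startswith("*/", i):
            in_comment = False
            i += 2
        elif not in_comment and text.startswith("/*", i):
            in_comment = True
            i += 2
        else:
            i += 1
    return in_comment


class Buffer:
    """Lines as plain strings, plus the lexer state at the END of each line."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.states = [None] * len(self.lines)  # None = not lexed yet
        self.relex(0)

    def relex(self, start):
        """Re-lex linearly from start; stop once a line's end state is unchanged.

        Returns how many lines were re-lexed.
        """
        state = self.states[start - 1] if start > 0 else False
        relexed = 0
        for i in range(start, len(self.lines)):
            state = end_state(self.lines[i], state)
            relexed += 1
            if self.states[i] == state:
                break  # later lines would lex exactly as before
            self.states[i] = state
        return relexed

    def replace_line(self, index, text):
        self.lines[index] = text
        return self.relex(index)
```

An edit that does not change the state at the end of its line costs one line of lexing. An edit that does change it, such as deleting a comment opener, scans forward linearly only until the cached state matches again, which is cheap for files of tens of thousands of lines and deliberately unbounded for huge ones.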

The point of the comment you replied to was explicitly to argue that for some editors it'd be just fine to make the explicit design choice not to cater for extreme cases like that.

Of course I'm not arguing for all editors to do this. Then I wouldn't have anything to use on the rare cases I need it. I'm arguing for people to consider whether their editor needs to be able to handle all kinds of outliers; maybe it needs to - JetBrains probably do need their editor to, for commercial reasons - maybe it doesn't; mine does not need to.

Instead I optimise for what makes it pleasant for me for my common case, and fall back to emacs the once every few months where I for some reason need to do something with an absurdly large file.

It's confusing you're replying to two different threads in this

> For my own part I'm fine with falling back

> That's perfectly fine.

That's for you. I want to open a huge XML file and edit it with autocomplete.

I mean you could just say. I'm ok with Notepad. Ok. Sure. I'm not.

> My editor uses just a plain array of lines represented as strings.

Any naive approach is going to run into edge cases quickly[1]. Any more sophisticated approach is going to run into performance issues.

> It will break horribly on huge files.

I suspect it will break waaaay before that. Even using some language with greater complexity would cause issues.

[1] For example using regex to color syntax. Reasonably fast yes, but fails in any moderately complex scenario.

> It's confusing you're replying to two different threads in this

I get that, which is why I specified what it was I had replied to in order to clear it up.

> That's for you. I want to open a huge XML file and edit it with autocomplete.

That's fine. My editor is for me, not for you. To me, if you have to do bracket matching on a 2GB XML file an editor is the wrong tool. I'd work with that in a REPL. But that's me. You're totally free to pick your editor based on the need to handle 2GB files, just as I'm free not to care one iota about opening 2GB files.

Hence my point that a lot of people don't seem to think about whether or not they actually need this. I decided I don't need it, and it let me simplify things a lot. It won't work for everyone, and that's fine.

> I mean you could just say. I'm ok with Notepad. Ok. Sure. I'm not.

Well, if I'd been ok with notepad, I'd have used notepad. I wrote my own editor because I'm not ok with Notepad. Or Emacs. The tradeoffs I care about are very different.

> Any naive approach is going to run into edge cases quickly[1]. Any more sophisticated approach is going to run into performance issues.

For large files, yes. It's more than fast enough to run just fine - even in pure Ruby - on files of the sizes I've said I care about.

> [1] For example using regex to color syntax. Reasonably fast yes, but fails in any moderately complex scenario.

Which is why I'm using Rouge for the syntax highlighting in my editor. It uses regexes for low-level lexing where people choose to, but it supports state machines and multi-layered lexing or your own custom code. It supports all the languages I care about and dozens more just fine. My approach of caching the lexer state works just fine with Rouge and with the layered approach I use: e.g. parse my Ruby code with a precise Ruby mode, switch to a Markdown mode inside comment blocks, and feed the output of that into a layer that applies further custom rules based on my preferences. All without needing to build an AST for any of those.
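
To make the layering concrete, here is a toy sketch of three token-stream layers chained together, with no AST anywhere. It is plain Python and does not use or imitate Rouge's API. The function names, the token kinds and the rules themselves (split at the first `#`, backticked spans as Markdown, a `TODO` flag) are all hypothetical simplifications; a real Ruby lexer would not treat a `#` inside a string as a comment.

```python
import re


def outer_lexer(line):
    """Layer 1: split a line into a code token and a comment token at '#'."""
    code, sep, comment = line.partition("#")
    if code:
        yield ("code", code)
    if sep:
        yield ("comment", sep + comment)


def markdown_layer(tokens):
    """Layer 2: re-lex comment tokens with one toy Markdown rule (`code` spans)."""
    for kind, text in tokens:
        if kind != "comment":
            yield (kind, text)
            continue
        for part in re.split(r"(`[^`]*`)", text):
            if not part:
                continue
            is_span = len(part) >= 2 and part[0] == "`" and part[-1] == "`"
            yield ("comment.code" if is_span else "comment", part)


def custom_layer(tokens):
    """Layer 3: a personal-preference rule that flags TODO inside comments."""
    for kind, text in tokens:
        if kind == "comment" and "TODO" in text:
            yield ("comment.todo", text)
        else:
            yield (kind, text)


def highlight(line):
    return list(custom_layer(markdown_layer(outer_lexer(line))))
```

Each layer only consumes and emits a flat stream of `(kind, text)` tokens, so adding another rule means adding another generator rather than restructuring a tree.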

> I suspect it will break waaaay before that. Even using some language with greater complexity would cause issues.

It handles every language supported by Rouge [1] just fine, and that's far more than I have need for.

[1] https://github.com/rouge-ruby/rouge/blob/master/docs/Languag...