by lelanthran 392 days ago
> The core data structure (array of lines) just isn't that well suited to more complex operations.

Just how big (and how many lines) does your file have to be before it is a problem? And what are the complex operations that make it a problem?

(Not being argumentative - I'd really like to know!)

In my own text editor (whose sources I lost way back in 2004) I used an array of bytes, did syntax highlighting with single-byte start/stop codes, and rendered through a moving "window" into the array. I never saw a latency problem back then on a Pentium Pro, even with files as large as 20MB.
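A minimal sketch of the moving-window idea, for readers who have not seen it. The original sources are lost, so this is not that editor's code: the function names and the choice to track the window as a byte offset to the start of the top visible line are assumptions made for illustration.

```python
def visible_lines(buf: bytes, top: int, rows: int) -> list[bytes]:
    """Return up to `rows` lines, starting at byte offset `top`.

    `top` is assumed to point at the first byte of a line.
    """
    out = []
    pos = top
    while len(out) < rows and pos < len(buf):
        end = buf.find(b"\n", pos)
        if end == -1:          # last line has no trailing newline
            end = len(buf)
        out.append(buf[pos:end])
        pos = end + 1
    return out


def scroll_down(buf: bytes, top: int) -> int:
    """Move the window start to the next line, or stay put on the last line."""
    nxt = buf.find(b"\n", top)
    if nxt == -1 or nxt + 1 >= len(buf):
        return top
    return nxt + 1


def scroll_up(buf: bytes, top: int) -> int:
    """Move the window start to the previous line, or stay at offset 0."""
    if top == 0:
        return 0
    # buf[top - 1] is the newline that ends the previous line, so search
    # for the newline before that one; rfind returns -1 if there is none.
    return buf.rfind(b"\n", 0, top - 1) + 1
```

Scrolling only scans one line's worth of bytes, and rendering only touches the rows on screen, so the cost per keypress does not depend on the file size.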

I am skeptical of the piece table as used in VS Code being that much faster; right now on my 2011 desktop, a VS Code with no extra plugins has visible latency when scrolling by holding down the up/down arrow keys with a really high keyboard repeat setting. On the same computer, with the same keyboard repeat and the same file (about 10k lines), Vim in a standard xterm/uxterm scrolls visibly better and takes half as much time to get to the end of the file.

2 comments

From what I have experienced, the complex data structures used here are more about maintaining responsiveness when overall system load is high, and that may result in slightly slower performance overall. Say you used the variable "x" a thousand times in your 10k lines of code and you want to do a find and replace on it to give it a more descriptive name like "my_overused_variable". Think about all of the memory copying that happens if all 10k lines are in a single array. If those 10k lines are in 10k arrays, each twice the size of its line, you reduce that a fair amount. It might be slower than simpler methods when the system load is low, but it will stay responsive longer.
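To put rough numbers on the copying argument, here is a small sketch that counts how many characters get shifted. It assumes a naive in-place replace that moves the whole tail after every match, and it assumes each per-line array has enough slack that it never needs reallocating. A smarter single-pass replace into a fresh buffer would copy the flat array only once, so treat this as the worst case for the single array. The function names are made up for this example.

```python
def chars_shifted_flat(text: str, old: str, new: str) -> int:
    """Characters moved if every replacement shifts the tail of one flat array."""
    moved = 0
    pos = text.find(old)
    while pos != -1:
        tail_start = pos + len(old)
        # Everything after the match has to slide to make room for `new`.
        moved += len(text) - tail_start
        text = text[:pos] + new + text[tail_start:]
        # Resume after the inserted text so `new` is never re-matched.
        pos = text.find(old, pos + len(new))
    return moved


def chars_shifted_per_line(text: str, old: str, new: str) -> int:
    """Same count when each line lives in its own array with spare room."""
    return sum(chars_shifted_flat(line, old, new) for line in text.split("\n"))


if __name__ == "__main__":
    source = "\n".join(["total = total + 1"] * 10_000)
    print(chars_shifted_flat(source, "total", "my_overused_variable"))
    print(chars_shifted_per_line(source, "total", "my_overused_variable"))
```

With one array the work per match grows with the distance to the end of the file; with per-line arrays it is bounded by the length of the line.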

I think Vim uses a gap structure, not a single array, but I don't remember.
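For anyone unfamiliar with the term, a gap structure (gap buffer) keeps one array with a block of unused slots at the cursor, so typing at the cursor fills the gap instead of shifting the rest of the text. The sketch below shows the general technique only and says nothing about Vim's actual internals; the class name, the default gap size, and the growth policy are assumptions for illustration.

```python
class GapBuffer:
    """Text stored in one list with a movable run of unused slots (the gap)."""

    def __init__(self, text: str = "", gap: int = 16):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)      # first unused slot
        self.gap_end = len(self.buf)    # one past the last unused slot

    def _move_gap(self, pos: int) -> None:
        # Slide characters across the gap until it starts at `pos`.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra: int) -> None:
        # Widen the gap by splicing in more unused slots at its end.
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos: int, s: str) -> None:
        """Insert `s` before text position `pos` (0 <= pos <= len(text))."""
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(s):
            self._grow(max(len(s), len(self.buf)))
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos: int, n: int) -> None:
        """Delete up to `n` characters starting at text position `pos`."""
        self._move_gap(pos)
        self.gap_end = min(self.gap_end + n, len(self.buf))

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Consecutive edits at the same spot cost almost nothing; the copying happens only when the cursor jumps, and then it is proportional to the distance jumped.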

I am not a programmer; my experience could very well be due to failings elsewhere in my code, and my reasoning could be hopelessly flawed. Hopefully someone will correct me if I am wrong. It has also been a while since I dug into this. The project that got me to dig into it is one of the things that got me to finally make an account on HN, and one of my first submissions was Data Structures for Text Sequences.

https://www.cs.unm.edu/~crowley/papers/sds.pdf

VS Code used 40-60 bytes per line, so a file with 15 million single-character lines balloons from 30 MB to 600+ MB. kilo uses 48 bytes per line on my 64-bit machine (though you can make it 40 if you move the last int next to the other 3 ints instead of wasting space on alignment padding), so it would have the same issue.

https://github.com/antirez/kilo/blob/323d93b29bd89a2cb446de9...
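The padding arithmetic can be checked without a C compiler, since Python's ctypes lays out a Structure with the platform's native C alignment. The sketch below is a stand-in, not a copy of kilo's struct: the field names are hypothetical, and the count of three pointers is an assumption chosen so the totals match the 48 and 40 bytes quoted above. The sizes in the comments hold only for a typical 64-bit ABI with 4-byte int and 8-byte pointers.

```python
import ctypes


class RowAsDescribed(ctypes.Structure):
    # Three ints, then pointers, then a trailing int (hypothetical names).
    _fields_ = [
        ("idx", ctypes.c_int),
        ("size", ctypes.c_int),
        ("rsize", ctypes.c_int),       # 4 bytes of padding follow, to align the pointer
        ("chars", ctypes.c_void_p),
        ("render", ctypes.c_void_p),
        ("hl", ctypes.c_void_p),
        ("open_comment", ctypes.c_int),  # 4 bytes of tail padding follow
    ]


class RowReordered(ctypes.Structure):
    # Same fields with all four ints grouped, so no padding is needed.
    _fields_ = [
        ("idx", ctypes.c_int),
        ("size", ctypes.c_int),
        ("rsize", ctypes.c_int),
        ("open_comment", ctypes.c_int),
        ("chars", ctypes.c_void_p),
        ("render", ctypes.c_void_p),
        ("hl", ctypes.c_void_p),
    ]


def row_overhead_mb(lines: int, bytes_per_row: int) -> float:
    """Per-line bookkeeping in megabytes (10**6 bytes), excluding the text itself."""
    return lines * bytes_per_row / 1e6


if __name__ == "__main__":
    print(ctypes.sizeof(RowAsDescribed), ctypes.sizeof(RowReordered))
    print(row_overhead_mb(15_000_000, ctypes.sizeof(RowReordered)))
```

At 40 bytes per row, 15 million rows cost 600 MB of bookkeeping, against 30 MB of actual text if each line is one character plus a newline.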

> a file with 15 million single character lines

I have never seen a file like this in my life, let alone opened one. I'm sure they exist and people will want to open them in text editors instead of processing with sed/awk/Python, but now we're well into the 5-sigma of edge cases.