by martanne 3564 days ago
A number of people have expressed the need to edit large files. For the development of my own editor[0] I would be interested to know what kind of usage patterns occur most often. What are the most important operations? Do you search for some (regex) pattern? Do you go to some specific line n? Do you copy/paste large portions of the file around? Do you often edit binary files? If so, what types, and what kind of changes do you perform?

[0] https://github.com/martanne/vis

3 comments

1) Going to a specific location (e.g. a location that shows up in some error log about processing that data file) and eyeballing it. Being able to jump to an exact position is sometimes important (e.g. row 12873, character 233). Syntax highlighting is important: it sometimes makes obvious something that's subtly malformed. Syntax highlighting that doesn't take an eternity on large files is a hard problem.

2) regex search/replace - interactive grep/sed.

3) Very large edits - e.g., find a specific location and remove all data entries before that so that the problematic entry now would be the first one; essentially cutting away half of a very large file.

4) Do note that you might have very, very large lines - it's not that uncommon to have the whole file in a single line, e.g. non-pretty-printed json data. Some editors work well with large files but simply die if there's a line with a million characters.
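To make points 1 and 4 concrete, here is a rough Python sketch of resolving "row N, character M" to a byte offset by scanning fixed-size chunks, so memory use stays bounded even when the whole file is a single line. The function name `offset_of` and the chunk size are just illustrative choices. It also assumes columns are counted in bytes and does not check that the column actually lies within the line.

```python
import io

def offset_of(stream, row, col, chunk_size=1 << 16):
    """Return the byte offset of 1-based (row, col), or None if the row is missing.

    The stream is read in fixed-size chunks, so memory use does not depend
    on line length. Assumption: col counts bytes, not decoded characters.
    """
    newlines_needed = row - 1
    pos = 0  # byte offset of the start of the current chunk
    while newlines_needed > 0:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # file has fewer than `row` lines
        start = 0
        while newlines_needed > 0:
            nl = chunk.find(b"\n", start)
            if nl == -1:
                break  # no more newlines in this chunk
            newlines_needed -= 1
            start = nl + 1
        if newlines_needed == 0:
            return pos + start + (col - 1)
        pos += len(chunk)
    return col - 1  # row 1 starts at offset 0

# Usage: a tiny in-memory "file" standing in for a huge one.
data = io.BytesIO(b"abc\ndef\nghi")
target = offset_of(data, 2, 2)
```

An editor could then seek to that offset and render only a window around it instead of the whole line.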

Thanks for the feedback!

Yes, syntax highlighting for large files is a hard issue. I'm not really aware of an accurate and high-speed solution that supports editing operations in huge files.

In principle, the underlying data structure used by vis supports all modifications with complexity linear in the number of editing operations since file load. This is independent of the file structure (i.e. single-line files should be well supported). However, the frontend code hasn't been optimized yet, so in practice there might be some problems.
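For readers who haven't seen a structure with this property, below is a minimal piece-table sketch in Python. It is an illustration only, not vis's code (vis is written in C), and the class and method names are made up for this example. The loaded text is never modified. Edits only add or split entries in a list of pieces, so the cost of an edit depends on how many pieces exist, not on file size or line length.

```python
class PieceTable:
    """Minimal piece table: the original buffer is never modified."""

    def __init__(self, original: str):
        self.original = original  # read-only contents loaded from the file
        self.added = ""           # append-only buffer for inserted text
        # Each piece is (buffer_name, start, length).
        self.pieces = [("orig", 0, len(original))] if original else []

    def _split(self, pos: int) -> int:
        """Ensure a piece boundary at document offset pos; return its piece index."""
        offset = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if pos == offset:
                return i
            if pos < offset + length:
                cut = pos - offset
                self.pieces[i:i + 1] = [(buf, start, cut),
                                        (buf, start + cut, length - cut)]
                return i + 1
            offset += length
        if pos == offset:
            return len(self.pieces)
        raise IndexError("position past end of text")

    def insert(self, pos: int, text: str) -> None:
        if not text:
            return
        i = self._split(pos)
        self.pieces.insert(i, ("add", len(self.added), len(text)))
        self.added += text

    def delete(self, pos: int, length: int) -> None:
        first = self._split(pos)
        last = self._split(pos + length)
        del self.pieces[first:last]  # the underlying buffers are untouched

    def text(self) -> str:
        bufs = {"orig": self.original, "add": self.added}
        return "".join(bufs[b][s:s + n] for b, s, n in self.pieces)


pt = PieceTable("hello world")
pt.insert(5, ",")   # pieces now reference both buffers
pt.delete(0, 7)     # drops pieces; nothing is copied or overwritten
```

In a real editor the original buffer would typically be a memory-mapped file rather than a Python string, which is what keeps loading cheap.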

Unless one specifies the blackhole register, deleting large parts of a file creates an in-memory copy (to enable later pasting at a different location). It would be better to keep a reference to the existing immutable text region.

Another thing I sometimes do:

Open a medium sized file in some format (CSV for example)

Select all

Change the selection to individual selections, one per line.

Edit in parallel, doing the same edit to all lines.

When the file is not that big, this sequence of actions is amazingly fast in ST (Sublime Text).

Just for backup on the large lines thing:

Many years ago, one compelling feature about Lugaru's emacs-family Epsilon (recently discussed here on HN) was the fact that you could quickly load _any_ file, with maximum sizes several times the available system memory, and completely regardless of text structure. You could load a fat binary, e.g. WORD.EXE, edit character strings in it, save it, and if you carefully avoided changing sizes and offsets, have a still-working .EXE program.

This is of course a slightly off-the-wall use case for a text editor, but if you happen to need that kind of thing you'll be really grateful if you have an editor that can do it.
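To illustrate what "avoiding changing sizes and offsets" means, here is a small Python sketch of a same-length string patch. The helper name `patch_string` is invented for this example, and padding with NUL bytes assumes the strings are C-style NUL-terminated, which won't hold for every binary format.

```python
def patch_string(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old with new, keeping every offset unchanged."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original string")
    at = data.find(old)
    if at == -1:
        raise ValueError("string not found")
    # Assumption: NUL padding is acceptable for the format being patched.
    padded = new.ljust(len(old), b"\x00")
    return data[:at] + padded + data[at + len(old):]


# Usage: a made-up blob standing in for an executable.
blob = b"\x7fHDR" + b"File not found\x00" + b"\x01\x02"
patched = patch_string(blob, b"File not found", b"No such file")
```

The result has exactly the same length as the input, so anything that refers to later data by offset still points at the right place.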

I'm a bit hesitant to burden you with feature suggestions - sidetracking the developer(s) can easily kill many small projects. But I'd like to mention a feature I would find compellingly useful that I have yet to see in "modern" editors. Possibly this could be a distinguishing feature that gives you a niche!

IBM's mainframe editor ISPF allowed you to select a set of lines, usually based on a search (including negative search, i.e. lines _not_ containing the search data), then to manipulate the set of lines thus selected (manually removing lines, adding lines or reversing the selection), and then to perform other operations, such as global search and replace, sorting, indenting or whatever, on that set of lines while ignoring all other text in the file.

I occasionally run into tasks where I would love to have this functionality available.

These operations are already supported by using structural regular expressions. As an example

    x g/foo
will select all lines containing foo. Similarly

    x v/foo
will select all lines not containing foo. Sorting etc. is taken care of by piping text through external tools.
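For anyone not used to sam-style commands, roughly the same select-then-operate idea can be sketched in Python. This is only an illustration of the concept, not how vis implements it, and the function names are hypothetical.

```python
import re

def select_lines(lines, pattern, invert=False):
    """Return indices of lines that match pattern (or don't, if invert is set)."""
    rx = re.compile(pattern)
    return [i for i, line in enumerate(lines) if bool(rx.search(line)) != invert]

def apply_to_selection(lines, selected, edit):
    """Return a copy of lines with edit() applied only to the selected indices."""
    chosen = set(selected)
    return [edit(line) if i in chosen else line for i, line in enumerate(lines)]


lines = ["foo = 1", "bar = 2", "foo = 3"]
with_foo = select_lines(lines, r"foo")                  # like g/foo
without_foo = select_lines(lines, r"foo", invert=True)  # like v/foo
edited = apply_to_selection(lines, with_foo, str.upper)
```

Because the selection is just a list of indices, it can be trimmed, extended or inverted by hand before the edit is applied, which is the ISPF-style workflow described above.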

I was more interested in common editing tasks for huge files, which, according to this thread, a lot of people perform using Sublime Text.

ST can do all these things. Search by regex: the "Find All" command inserts a cursor selecting each result, then you can do all the usual text editing operations on each selection in parallel. It has sorting and the like built in, and you can do regex find and replace inside the selections, etc.

For me, regex find and find-in-files are very important for analysing large codebases. Sublime is close to ideal for this (for me). Multi-line regexes are also very useful.

I don't tend to copy large portions of files around, but goto-line is used fairly often, and some kind of programmatic interface (REPL-style) would also be good, similar to the Sublime console.

Plugins are also important for the long-tail features that are very niche.

...but... speed and lightness is also very important.... so don't use JS/CSS/HTML... please.