by Koshkin 3562 days ago
Just curious, why would one want to open a 10MB (let alone a 1GB) file in a text editor? Isn't that something that could be better (that is, more efficiently) handled by the likes of grep, head, tail, sed or awk? It's like issuing a database query that returns a million rows when a more specific query would have been a better choice because it would result in substantially less data to look at.
7 comments

The answer is going to sound terribly boring, but as someone who frequently deals with very large files of structured text: sometimes you just want to look at your data and quickly jump around. I want direct manipulation, not a data querying/manipulation DSL (whether that's grep, sed, SQL, etc.).
I wholeheartedly agree that it doesn't sound like much, but sometimes all I'm doing is applying general, human intelligence to the very vague question of "Is there anything in here that I'm not aware of but should be?" That's not an easy query to write in grep or SQL, but it's a common query that often turns out to be important, so I handle it the old-fashioned way: I look it over with human eyes and see what I find.
Here's a "me too!"

I spend a lot of time less-ing around in log files, but then every once in a while I need or at least want the full power of an editor to act on those files. Be it converting one kind of line delimiter to another so I can visually understand the structure of strange messages, or excerpting blocks of the log to files: there are lots of things I can do easily in vim (and probably could do in ST if I had ST in my environment) that are not nearly so convenient otherwise.

I don't know why you are being downvoted, it seems like a legitimate question to me.

I routinely edit text files that are hundreds of megabytes, and sometimes over a gig. These are usually transcripts from chip simulations, where I'm tracking down where something went wrong.

Most often I do this in emacs, and sometimes vim. They both do fine with it, although emacs used to warn about files over a certain limit. It has always done fine with them in practice.

Lately, HN down-votes seem to me to have a very "boo boy" thing about them. There's nothing in the original comment that would warrant a down-vote.
Upvoting you for that. I've noticed a change too; it's reduced the amount I comment.
I used the SQLite source file when working on the C syntax rewrite a few months ago. It is almost 7MB.

Being able to jump to the definition of a function, even when it is contained in a source file that is 200k lines long, can be handy.

Oh man, you should take a look at NIH gene data someday if you really want to blow a gasket. Pretty much everything is over 1GB, and there are many, many files to work with when you're really only looking for a very specific subset of the data. I remember helping a friend of mine who's a grad student with them for his research.
For starters, if you want to eyeball a particular location in, e.g., a 100 MB data file that's not plain text but a common text-ish format such as XML or JSON, then you'd want pretty-printing/reindenting and syntax highlighting; and using regex search/replace interactively gives much better immediate feedback about the results (i.e. whether this is what you wanted) than doing the same with grep/sed.
In my case: logs, when I'm troubleshooting something, but not sure what I'm looking for :)
I found myself doing this more and more often as a data provider that most of our clients used started to drop the ball with the data they were distributing, including things like extra commas in descriptions, letters in monetary figures, etc. I actually found Notepad++ the fastest for opening CSV, XML and similar files. I nipped this in the bud quickly and made a small Windows GUI that scans source files for common issues.
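
For anyone curious what such a scan might look like, here is a rough Python sketch. It is not the actual tool: the column name `amount`, the function name `scan_csv` and the two checks (wrong field count, which is what a stray comma in a description produces, and non-numeric monetary values) are assumptions chosen to match the issues described above.

```python
import csv
import io
import re

# Assumption for illustration: a monetary value is an optional minus sign,
# digits, and an optional decimal part. Real feeds may need a looser rule.
MONEY_RE = re.compile(r"^-?\d+(\.\d+)?$")


def scan_csv(lines, money_column="amount"):
    """Return (line_number, problem) tuples for an iterable of CSV lines.

    `lines` can be an open file, so large files are read row by row
    rather than loaded into memory at once.
    """
    problems = []
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return problems  # empty input, nothing to check
    money_idx = header.index(money_column) if money_column in header else None
    for row in reader:
        line_no = reader.line_num
        # An unquoted extra comma shows up as an unexpected field count.
        if len(row) != len(header):
            problems.append(
                (line_no, f"expected {len(header)} fields, found {len(row)}")
            )
            continue
        # Letters in a monetary figure fail the numeric pattern.
        if money_idx is not None and not MONEY_RE.match(row[money_idx].strip()):
            problems.append((line_no, f"non-numeric amount: {row[money_idx]!r}"))
    return problems


# Made-up sample data: line 3 has a stray comma, line 4 has a letter O in the amount.
sample = "id,description,amount\n1,Widget,9.99\n2,Gadget, large,4.50\n3,Gizmo,12O.00\n"
for line_no, problem in scan_csv(io.StringIO(sample)):
    print(line_no, problem)
```

To run it against a real file, something like `with open(path, newline="") as f: scan_csv(f)` should work, where `path` is whatever file you want to check.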