by VieElm 4222 days ago
Why do you have text files that are 200MB? Why don't you separate that into multiple files? 200MB of text? Moby Dick is 1MB in plain text. The Holy Bible is like 4MB in plain text. What is in your files? I think you may be doing something wrong.

We deal with data sets that are several GB in size, and so I'm already using just a subset of the data. These files are loaded by programs into memory and serve customers via a REST API, and it's not uncommon that we have to manually inspect or change the files from time to time.

Our use of large text files isn't only internal. Our software also processes delimited text files for customers, and these can run into the same size ranges, which is out of our control. When the service bombs, we need to be able to prove to them that their file is malformed, and if it isn't, we have to fix our data or our code. That usually requires inspection of text files.

But that's all beside the point. This looks like a fine source editor for basic needs, but I wouldn't consider it a tool for general "plain-text" editing like it says on the site.

Any particular reason you're not using a database?
There are several. And even if we did, you still need a well-formed text file to parse before you can insert it into the database.
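
For what it's worth, a first pass at the "is this file well-formed" question can be scripted before anyone opens an editor. This is only a minimal sketch, assuming a comma-delimited file whose first line is a header and treating "malformed" as "wrong number of fields"; the function name and the `max_report` parameter are made up for illustration, and real customer files will have their own rules.

```python
import csv


def find_malformed_rows(lines, delimiter=",", max_report=10):
    """Return (line_number, found_fields, expected_fields) for bad rows.

    `lines` is any iterable of text lines, e.g. an open file object, so the
    input is streamed rather than loaded into memory all at once.
    Assumption: the first row is a header that defines the field count.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    try:
        header = next(reader)
    except StopIteration:
        return []  # empty input: nothing to check

    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num counts physical lines read from the source so far
            problems.append((reader.line_num, len(row), expected))
            if len(problems) >= max_report:
                break  # stop early; a handful of examples is usually enough
    return problems
```

With a real file you would pass something like `open(path, newline="")`, where `path` is whatever the customer sent. This narrows down where to look, but you still end up inspecting the offending lines by hand.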
I'd put money on that being a data file.
Whenever possible, the machine should accommodate the human rather than vice versa. Viewing a 200MB file in a text editor is not an outrageous request on a modern computer. Maybe he's not working optimally, but why should he change if his approach should work, and does work fine with other tools?

    s/I think you may be/You are/
No single file of source code should ever be even close to 200MB, and no data file (CSV, XML, etc) of that size should be manually edited in a text editor.

Wrong tool for the wrong job.

"no data file (CSV, XML, etc) of that size should be manually edited in a text editor"

A new law of the land!

Seriously, tools are meant to solve problems, not to dictate what problems people should solve.

Vim and Emacs are quite well-suited for textual manipulation of very large data files. Use of a macro is a very pleasant alternative to piping the file through sed/awk/etc. for a once-off transform.
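
For comparison, the scripted route for a once-off transform looks roughly like this. It is a sketch only, assuming a plain per-line regex substitution is all that's needed; `transform_lines` is a hypothetical helper, and Python's regex dialect differs from sed's in places.

```python
import re


def transform_lines(lines, pattern, replacement):
    """Lazily apply a regex substitution to every line.

    Roughly comparable to `sed 's/pattern/replacement/g'`: all matches on a
    line are replaced. Because this is a generator, a large file is processed
    one line at a time instead of being read into memory.
    """
    compiled = re.compile(pattern)
    for line in lines:
        yield compiled.sub(replacement, line)
```

Feeding it an open file and writing the results to a second file handles large inputs without much memory. The editor macro still wins when you want to see each change as it happens.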
I never said it was a source code file.
Indeed, 640 KB ought to be enough for anybody. :/
This sed substitution thing seems to be a cute trend to express "fixed that for you". It goes nicely with programmers' propensity to express themselves in needlessly arcane ways.