Hacker News
by grumpy-cowboy 2347 days ago
Text files are king! I store every single byte I can in text files. Examples:

  - Tabular data     : TSV   (almost all Un*x/GNU tools handle this out of the box)
  - Simple "records" : GNU Recutils format (https://www.gnu.org/software/recutils/)
  - Formatted texts  : Markdown, LaTeX, ...
  - ...
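TSV has no single agreed-upon escaping rule, and that is the main trap once a field contains a tab or a newline. Here is a minimal Python sketch that assumes a backslash-escape convention (\t, \n, \\), so every record stays on one line and tools like `cut -f` or `awk -F'\t'` still see the right columns. The helper names are made up for illustration:

```python
# Hypothetical helpers: escape tabs/newlines/backslashes inside fields so
# each record is exactly one line of tab-separated columns.

def escape_field(value: str) -> str:
    # Escape the backslash first so the other escapes stay unambiguous.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def unescape_field(value: str) -> str:
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept verbatim.
            out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def write_tsv(rows) -> str:
    return "".join("\t".join(escape_field(f) for f in row) + "\n" for row in rows)


def read_tsv(text: str):
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final newline
    return [[unescape_field(f) for f in line.split("\t")] for line in lines]


rows = [["name", "note"], ["a\tb", "line1\nline2"], ["c:\\temp", "ok"]]
text = write_tsv(rows)
assert read_tsv(text) == rows
print(text.count("\n"))  # one newline per record: 3
```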
If I need some hierarchical kind of information, I use a folder structure to handle this.
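A folder hierarchy maps onto nested data quite directly. Here is a rough Python sketch that assumes directories become nested dicts and files become their UTF-8 text content (`load_tree` is a made-up name):

```python
import tempfile
from pathlib import Path


def load_tree(root: Path) -> dict:
    """Directories become nested dicts; files become their text contents."""
    tree = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            tree[entry.name] = load_tree(entry)
        elif entry.is_file():
            tree[entry.name] = entry.read_text(encoding="utf-8")
    return tree


# Usage: build a small hierarchy on disk, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "projects" / "garden").mkdir(parents=True)
    (base / "projects" / "garden" / "plants.tsv").write_text("tomato\t2020\n", encoding="utf-8")
    (base / "todo.md").write_text("- water plants\n", encoding="utf-8")
    tree = load_tree(base)

print(tree)
```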

I know that not everything can be stored as text. But I try to use open, well documented and future proof formats. Examples:

  - Images : PNG
  - Music  : FLAC, Ogg, ...
  - If I really need to preserve the original format/design of a web page: PDF
Nothing's perfect, but stay away from any closed/obscure/proprietary formats.

The PNG spec is an interesting read. Some things in there are overly fancy and some are dead practical: the magic number at the front of the file contains an 8-bit character ("8-bit clean" was still a phrase you could utter back then; today it's assumed), the section headers are plain text (you can run 'strings' on it), and there's even a bit that determines whether a section is mandatory for rendering the file properly. So you can add arbitrary metadata and any other reader can still display the image.
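To make that concrete, here's a minimal Python sketch of walking PNG chunks. Per the spec, a file starts with the 8-byte signature `\x89PNG\r\n\x1a\n` (the 0x89 byte is the high-bit character meant to catch transfers that aren't 8-bit clean). Each chunk after that is a 4-byte big-endian length, a 4-byte ASCII type, the data, and a CRC-32 over the type and data. A lowercase first letter in the type (bit 5 set) marks an ancillary chunk that a decoder may skip, while uppercase means critical. The demo bytes are hand-built and do not form a displayable image (there is no pixel data). They only exercise the chunk structure, and the helper names are illustrative:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes):
    """Yield (type, is_critical, payload) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + payload) != crc:
            raise ValueError(f"CRC mismatch in chunk {ctype!r}")
        # Bit 5 of the first type byte is the ASCII lowercase bit:
        # clear (uppercase) = critical, set (lowercase) = ancillary.
        is_critical = not (ctype[0] & 0x20)
        yield ctype.decode("ascii"), is_critical, payload
        pos += 12 + length


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Hand-built demo: a 1x1 IHDR, a text metadata chunk, and IEND (no pixel data).
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
demo = (PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", b"Comment\x00added by hand")
        + make_chunk(b"IEND", b""))

summary = [(name, critical) for name, critical, _ in iter_chunks(demo)]
print(summary)  # [('IHDR', True), ('tEXt', False), ('IEND', True)]
```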
Agreed. I don't know for sure what operating system or device I'll be using in 20 years, but I know it'll be able to read and edit text files. I once used MS OneNote, and it's great, but once you leave Windows you basically have to throw it all away, so in the long run I just wasn't comfortable with every note I created raising my cost of switching.

And of course, interacting with those files through the vast ecosystem of simple command-line tools, and using the same efficient text editor for almost all of my documents, makes the whole thing a much better experience than any of the alternatives, at least when text is viable; IMHO diagrams etc. are still too cumbersome compared to a quick freehand sketch.

Hmm now that you mention it, does anyone know of a utility that is able to take an image of a hand drawn diagram and convert it to something like dot or C4?
>but I know it'll be able to read and edit text files.

Unless Apple and Microsoft decide that the ability to view and manage files directly is too confusing and dangerous for users, and take away your ability to keep portable raw text files.

And then all the HN users rejoice about how much simpler life has become when they let tech companies make choices for them and how files were never that good anyway.

Many people need to interact with files directly for their jobs (whether that's development, system administration, video editing, ...), and neither MS nor Apple will ignore that market.
The irony here is that you both talk about "Microsoft and Apple" as if the famous difference in their text file conventions (newlines) didn't exist, and as if we lived in an alternative reality where Microsoft and Apple gave us "portable raw text files".
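For the record, the conventions are CRLF on Windows, LF on Unix-likes (including macOS), and a bare CR on classic Mac OS. Here's a small Python sketch that counts which ones a file uses and normalizes them. The function names are illustrative, and Python's own text mode with the default `newline=None` already translates all three to `\n` when reading:

```python
def newline_counts(data: bytes) -> dict:
    """Count CRLF, lone LF and lone CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


def normalize_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Convert every CRLF / CR / LF ending to `newline`."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


mixed = b"unix\nwindows\r\nclassic mac\r"
print(newline_counts(mixed))       # {'CRLF': 1, 'LF': 1, 'CR': 1}
print(normalize_newlines(mixed))   # b'unix\nwindows\nclassic mac\n'
```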
> - Simple "records" : GNU Recutils format (https://www.gnu.org/software/recutils/)

Wow, thanks for this recommendation. I've got a few things lying around that I've been handling with awk/bash, where even SQLite is overkill, and it looks like this solves the same problems in a much better and more concise way. I might try converting them this afternoon. Can't wait to give the CSV conversion a try too.
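If you're converting ad hoc files, it helps to know how little the basic rec syntax needs: `Name: value` lines, blank lines between records, `#` comments, and `+` lines that continue a multi-line value. Below is a rough Python reader for just that subset. It skips record descriptors like `%rec:` and ignores backslash line continuation, types and everything else recutils offers. `parse_rec` is a made-up name, the sample data is invented, and for real work `recsel` and friends are the tools to use:

```python
def parse_rec(text: str) -> list:
    """Parse a simplified recfile into a list of records.

    Each record is a list of (field, value) pairs, because rec fields may repeat.
    """
    records, current = [], []

    def flush():
        # Keep ordinary records; drop descriptor records such as "%rec: Contact".
        if current and not current[0][0].startswith("%"):
            records.append(list(current))
        current.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            continue                          # comment line
        if line.strip() == "":
            flush()                           # blank line ends a record
        elif line.startswith("+") and current:
            # Continuation: '+' plus one optional space, joined with a newline.
            rest = line[1:]
            if rest.startswith(" "):
                rest = rest[1:]
            name, value = current[-1]
            current[-1] = (name, value + "\n" + rest)
        else:
            name, _, value = line.partition(":")
            current.append((name, value.lstrip()))
    flush()
    return records


sample = """%rec: Contact

# invented example data
Name: Alice
Email: alice@example.org
Email: alice@example.net
Notes: first line
+ second line

Name: Bob
"""

contacts = parse_rec(sample)
print(len(contacts))    # 2
print(contacts[0][3])   # ('Notes', 'first line\nsecond line')
```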

One format that doesn't seem to get much attention nowadays is RTF (https://en.wikipedia.org/wiki/Rich_Text_Format), but there's no reason it shouldn't be readable for years to come.
Yes, these things can work. I think Plain TeX would be more likely to work better in fifty years than LaTeX; I use Plain TeX myself. I think TSV is also good.

I think PDF is complicated, though. (However, there are simpler subsets defined which omit some complicated stuff.) (If you really need to store the contents of a page, PNG might do.)

The SQLite version 3 database format is also unlikely to change, I think, and it is documented. (SQLite is also in the public domain, which helps. If you want to be sure it keeps working in the future, I suppose you can avoid WAL and similar optional features.) (If it does change a lot, it probably won't be called version 3 any more.)
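The "version 3" part is literally in the file. Per the documented SQLite file format, every database begins with the 16-byte string "SQLite format 3" plus a NUL, and header bytes 18 and 19 hold 1 for the legacy rollback-journal mode or 2 for WAL. Here is a small Python sketch that creates a throwaway database with the standard `sqlite3` module, explicitly staying in rollback-journal mode, and reads those fields back. `read_header` is an illustrative name:

```python
import os
import sqlite3
import struct
import tempfile


def read_header(path: str) -> dict:
    """Read a few documented fields from the 100-byte SQLite database header."""
    with open(path, "rb") as f:
        header = f.read(100)
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:          # the value 1 encodes a 65536-byte page
        page_size = 65536
    return {
        "magic": header[:16],
        "page_size": page_size,
        # Bytes 18/19 (write/read format version): 1 = legacy journal, 2 = WAL.
        "write_version": header[18],
        "read_version": header[19],
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.db")
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode=DELETE")   # stay out of WAL mode
    con.execute("CREATE TABLE notes (body TEXT)")
    con.execute("INSERT INTO notes VALUES ('hello')")
    con.commit()
    con.close()
    info = read_header(path)

print(info["magic"])  # b'SQLite format 3\x00'
```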

I was using LaTeX in college back in the 90s; it was already ten years old by then. And since LaTeX is a macro package for TeX, it will run anywhere TeX can.

I agree, SQLite will live forever, thanks to its public domain status.

I do not think PDF will prove to be future stable.
Yes, it is probably true. There are less complicated subsets, but saving a picture of the page as PNG might work better (although PNG doesn't support CMYK or extra separations). There is also DVI, which TeX uses as output; it is simple, so a program can easily be written to rasterize it or to convert it to whatever other format the printer uses.
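To give a sense of how simple DVI is, a file opens with a preamble made of opcode 247 (`pre`), an id byte (2), three big-endian 32-bit numbers num, den and mag, and a length-prefixed comment. Below is a minimal Python sketch that reads just that preamble. The bytes are hand-built, the num/den values are the ones TeX conventionally writes, and the comment text is made up. A real rasterizer would go on to interpret the page opcodes and font definitions:

```python
import struct

DVI_PRE = 247  # opcode that starts every DVI file


def read_dvi_preamble(data: bytes) -> dict:
    """Decode the DVI preamble: pre, i[1], num[4], den[4], mag[4], k[1], x[k]."""
    if not data or data[0] != DVI_PRE:
        raise ValueError("not a DVI file: missing preamble")
    ident = data[1]
    num, den, mag = struct.unpack(">III", data[2:14])
    k = data[14]
    comment = data[15:15 + k].decode("ascii", errors="replace")
    return {"id": ident, "num": num, "den": den, "mag": mag, "comment": comment}


# Hand-built preamble; the comment string is invented for this example.
comment = b" hypothetical comment"
demo = (bytes([DVI_PRE, 2])
        + struct.pack(">III", 25400000, 473628672, 1000)
        + bytes([len(comment)]) + comment)

pre = read_dvi_preamble(demo)
print(pre["num"], pre["den"], pre["mag"])  # 25400000 473628672 1000
```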
The thing that people forget about "almost all Unix/GNU tools" is that there is not just one character-separated variable-length-record flat file text table format. There are at least three. And that's just on the Unices and Linux, and not counting ASCII.

* http://jdebp.uk./Softwares/nosh/guide/commands/console-flat-...

Another great console tool to view/manipulate/process/... tabular data is Visidata (http://visidata.org/).
Do you know of any good plaintext formats for calendars? Also, I'm pretty sure DjVu is a more open and well-documented format than PDF.
Agreed on text files. I would throw YAML into the mix; by now I prefer it for all the data files one would otherwise use JSON for. It's slower to parse than JSON, but it's incredibly simple and human-readable (and writable), and it will stand the test of time.
Nah. Yaml is a total mess with far too many special cases (time handling, many different ways to write booleans). I think it doesn't have any kind of formal spec? It makes sense where you want to optimize for human editing in the short term, but it is by no means a format for the long term.
The spec is here, https://yaml.org/spec/1.2/spec.html

> by no means a format for the long term.

I would take that bet :)

How do you feel about things like systemd’s journal binary logging system?

Is this an acceptable case of non-text format? And if it is, what makes it different?

Does anyone need to keep system logs for fifty years?

Practically speaking, I'm fine with a constraint where the systemd database format + tools need to be roughly kept in sync. I can't think of a realistic example where this wouldn't be the case. Most of the logs in this database are supposed to be ephemeral.

If you're in banking or medicine or something and are required to keep certain logs for a decade+, you should figure out what you actually want to keep and put it in a format that would be reasonable to access on that kind of timeline.

> Practically speaking, I'm fine with a constraint where the systemd database format + tools need to be roughly kept in sync. I can't think of a realistic example where this wouldn't be the case.

That can only happen in practice if you take the logger and tools from the same group of developers. Yet another case of forced lock-in from Systemd.

We keep logs for either 3 or 7 years, depending. journald is a waste of electrons for us, and that's one (of several) reasons why.
Btw there is Journal Export Format[1], which one can use to archive or process selected journal data in a simple plain text form.

[1] https://www.freedesktop.org/wiki/Software/systemd/export
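That format, which is what `journalctl -o export` emits, is simple enough to read without systemd. As the linked page describes it, entries are separated by a blank line and text fields are `NAME=value` lines. Fields with binary or multi-line data are written as the name, a newline, a 64-bit little-endian length, the raw bytes, and a newline. Below is a minimal Python sketch of a reader for that layout. `parse_export` is a made-up name and the sample entries are invented:

```python
import struct


def parse_export(data: bytes) -> list:
    """Parse Journal Export Format bytes into a list of {field: [values]} dicts."""
    entries, fields, pos = [], {}, 0
    while pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            nl = len(data)
        line = data[pos:nl]
        if not line:
            # An empty line terminates the current entry.
            if fields:
                entries.append(fields)
                fields = {}
            pos = nl + 1
            continue
        if b"=" in line:
            # Text-safe field: NAME=value
            name, _, value = line.partition(b"=")
            pos = nl + 1
        else:
            # Binary-safe field: NAME\n, 64-bit little-endian size, data, \n
            name = line
            (size,) = struct.unpack("<Q", data[nl + 1:nl + 9])
            start = nl + 9
            value = data[start:start + size]
            pos = start + size + 1  # skip the newline after the data
        # Fields can repeat within one entry, so keep every value.
        fields.setdefault(name.decode("ascii"), []).append(value)
    if fields:
        entries.append(fields)
    return entries


sample = b"".join([
    b"__REALTIME_TIMESTAMP=1577836800000000\n",
    b"MESSAGE=hello\n",
    b"PRIORITY=6\n",
    b"\n",
    b"MESSAGE\n", struct.pack("<Q", 11), b"line1\nline2", b"\n",
    b"PRIORITY=3\n",
    b"\n",
])

entries = parse_export(sample)
print(len(entries))  # 2
```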

IMHO binary logging is a case of premature optimization.

The premise is that binary logs somehow buy you something: more precise data, faster access, or something else.

Truth is, it could all have been done in ASCII, and it would have been more portable, accessible and resilient to failure.

No idea what systemd's implementation is meant to accomplish, but, in general, on a memory-constrained system, a binary log can theoretically speed up system performance by taking up less memory, which leads to more available pages for other purposes.
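Here is a toy Python comparison of the same log record as a text line versus a packed binary struct, to put a number on that. The record layout (`<QBIH`: timestamp, priority, pid, message length) is an arbitrary assumption and not journald's actual format:

```python
import struct

# Hypothetical record: Unix timestamp, syslog priority, pid, message.
record = (1577836800, 6, 1234, "sshd started")


def as_text(ts: int, prio: int, pid: int, msg: str) -> bytes:
    return f"{ts} prio={prio} pid={pid} {msg}\n".encode("utf-8")


def as_binary(ts: int, prio: int, pid: int, msg: str) -> bytes:
    body = msg.encode("utf-8")
    # '<' = little-endian, no padding: 8 + 1 + 4 + 2 = 15 header bytes.
    return struct.pack("<QBIH", ts, prio, pid, len(body)) + body


def from_binary(blob: bytes):
    ts, prio, pid, n = struct.unpack_from("<QBIH", blob)
    header = struct.calcsize("<QBIH")
    return ts, prio, pid, blob[header:header + n].decode("utf-8")


text, binary = as_text(*record), as_binary(*record)
print(len(text), len(binary))   # 40 27
assert from_binary(binary) == record
```

The fixed-width fields shrink, but the message text is the same size either way. So whenever the message dominates the record, the saving is modest.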
But would you even run a systemd-based Linux distribution on a memory-constrained system?
Does anyone actually look at the systemd log? I ignore it. Rsyslogd handles all of our logging needs in a sensible textual format that does not require special commands to view or useless make-work to pipe into a database.

To answer your question, I thought and still think it was a bad choice. The only time I would ever be interested in the contents are during early-boot failures, exactly the time when the toolset is limited and most folks aren't familiar with what's available - exactly when simple text is easiest to work with without finding another machine to stare at the `journalctl` man page.

The rationalization about detecting record corruption makes very little sense to me. (Now, there is a valid concern about potential log forgery, enabled by poorly written apps that directly log user input without sanitation. But that's better mitigated in the buggy app, which almost certainly is doing other unsafe things with user input. And if that were actually the concern, they had other choices that would have been far less annoying.)
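On the forgery point, the usual mitigation on the app side is to neutralize control characters in user input before it reaches a line-oriented log, so that a single request can't inject a fake extra line. Here is a minimal Python sketch. `sanitize_for_log` is an illustrative name, and the escaping style is just one reasonable choice:

```python
def sanitize_for_log(value: str) -> str:
    """Escape backslashes and non-printable characters so a value stays on one log line."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")          # escape the escape character itself
        elif ch.isprintable():
            out.append(ch)
        else:
            # e.g. newline -> \n, ESC -> \x1b
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)


# Attacker-controlled username trying to forge a second log entry.
username = "alice\nJan  1 00:00:00 host sshd[1]: Accepted password for root"
line = f"login failed for user {sanitize_for_log(username)}"
print(line)
```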

One could better ask how people feel about all of the binary databases that exist in Unix, from the Berkeley DB databases used for the likes of termcap and the system account database on the BSDs, to the binary login database that has been around since the 1970s.

* http://jdebp.uk./FGA/unix-login-database.html

How do you work with text files on a mobile device? What do you use to synchronize the files?
Not parent commenter, just sharing my solution. I use Zim[0] because it already saves a folder hierarchy of markdown TXT files. The base folder is synced to Dropbox, and the markdown is readable/editable enough if you open it on a mobile device.

I'd love to use CherryTree[1] because it supports encryption and is more functional, but it stores everything in a single XML/SQLite file. Neither Zim nor CherryTree has a mobile app.

I really tried to use Joplin[2], which saves markdown too and has mobile apps, but the desktop app is huge. I prefer to use those resources for Keybase.

0. https://zim-wiki.org

1. https://www.giuspen.com/cherrytree

2. https://joplinapp.org

The mobile apps are where I'm lacking at the moment. I've reluctantly started using Dropbox as the syncing solution, but there aren't many great mobile apps that handle text well.