Hacker News
by hamsterbase 481 days ago
When it comes to web archiving, I've found that Markdown has some real limitations. Sure, it's great for basic text, but it struggles with things like embedded content and non-standard layouts. Try archiving a Twitter thread or an app-style webpage in Markdown, and you'll see what I mean. It just doesn't capture the full picture.

That's why I've come to prefer formats like webarchive, mhtml, or single HTML files for archiving. They're incredibly faithful to the original content - you get almost perfect rendering of the original page, complete with styling and layout. Plus, they can capture stuff behind paywalls or on logged-in pages, which is a huge plus.

The real challenge, though, isn't just about saving the content. It's about making that saved content useful. These archive formats are great for preservation, but they can quickly become a mess of unorganized files that are hard to search through or make sense of.

I think the key is finding ways to organize and interact with these archives more effectively. Things like full-text search across all your saved pages, the ability to add notes or highlights directly on the archived content, and smart tagging systems could go a long way. And it'd be really powerful if we could integrate these archives with other knowledge management tools we use.
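As a rough illustration of the full-text search idea, here is a minimal Python sketch of an inverted index over saved pages. It assumes the archives are single HTML files already loaded as strings (webarchive or mhtml would need unpacking first), and every name in it (`TextExtractor`, `build_index`, `search`) is hypothetical. It is not how any particular tool implements search.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip markup from one saved page and return its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """pages maps a page name to its HTML; returns word -> set of page names."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in tokenize(html_to_text(html)):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Calling `search(build_index(pages), "twitter thread")` returns the set of page names whose text contains both words. A real archive would want ranking, phrase matching, and an index stored on disk, none of which this sketch attempts.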

I develop a tool called HamsterBase that addresses a lot of these issues. It's a local-first app. That means all your data stays on your own device - no need to worry about your personal archives being stored on someone else's servers. There's no sign-up or registration required, which is refreshing in today's cloud-centric world.

> [Markdown] struggles with things like embedded content and non-standard layouts.

I don't share that experience. I typeset all these documents using Markdown with pandoc's div extension, transformed into XHTML, and then passed to ConTeXt:

* https://impacts.to/downloads/lowres/impacts.pdf

* https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

* https://pdfhost.io/v/4FeAGGasj_SepiSolar_Highlevel_Software_...

From XHTML, the document is transformed into TeX statements, which opens a world of possibilities. In the following video, custom styling is applied to nested contents:

https://youtu.be/3QpX70O5S30?t=35

Those are all PDFs. Why, if Markdown is so great?

Not OP, but I've done similar work myself.

Alternatives for authoring PDFs include LaTeX or similar markup languages, or GUI-based tools.

For many works, Markdown is more than sufficient for producing book-length texts (I've done this numerous times myself, either authoring my own works or transcribing/modifying books for improved access/readability). Markdown's benefit is that it is extraordinarily lightweight, and removes overhead from the authoring process.

Even where one ultimately chooses to migrate from Markdown to some more capable authoring format, Markdown remains useful for creating the original rough form of the work. Complex elements (figures, formulae, tables, etc.) can be indicated and, after document conversion from, say, Markdown to LaTeX, fleshed out in full.

With tools such as Pandoc (see my earlier comments on it), it's trivially possible to create multiple outputs (I usually refer to these as "endpoints") of a document. I've used Makefiles to drive this process, such that I write source in Markdown and generate partial or full HTML documents,[1] other LWMLs,[2] PDF, ePub, straight ASCII/UTF-8/Unicode text, word-processing formats, etc., as I want. The combination of Markdown and Pandoc makes this trivial in ways that LaTeX alone, say, isn't entirely suited to.[3]
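A minimal sketch of that kind of multi-endpoint build, written in Python rather than as a Makefile. The endpoint table, file names, and function names are hypothetical; only the pandoc options (`--from`, `--to`, `--standalone`, `--output`) are real. Leaving out `--standalone` yields an HTML fragment, and including it yields a full document.

```python
from pathlib import Path
import shutil
import subprocess

# Hypothetical endpoint table: name -> (output suffix, extra pandoc arguments).
ENDPOINTS = {
    "html-fragment": (".fragment.html", ["--to", "html"]),
    "html-full": (".html", ["--to", "html", "--standalone"]),
    "plain": (".txt", ["--to", "plain"]),
    "epub": (".epub", []),  # pandoc infers the format from the extension
    "docx": (".docx", []),
    "pdf": (".pdf", []),    # needs a PDF engine (e.g. a LaTeX install)
}


def pandoc_command(source: Path, endpoint: str, out_dir: Path) -> list[str]:
    """Build the pandoc argument list for one endpoint without running it."""
    suffix, extra = ENDPOINTS[endpoint]
    target = out_dir / (source.stem + suffix)
    return ["pandoc", str(source), "--from", "markdown", *extra,
            "--output", str(target)]


def build_all(source: Path, out_dir: Path) -> None:
    """Run pandoc once per endpoint; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for endpoint in ENDPOINTS:
        subprocess.run(pandoc_command(source, endpoint, out_dir), check=True)
```

`build_all(Path("book.md"), Path("out"))` would then write every endpoint into `out/`. Unlike a Makefile, this sketch rebuilds everything on each run rather than only the targets whose source changed.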

It's of course possible to use another LWML as the source format. Markdown has its limitations, but it is the most widely known and implemented, and workarounds for its limitations are typically reasonable.

________________________________

Notes:

1. A partial HTML doc may be useful for dropping into a larger document, and doesn't require global HTML elements such as the <html>, <head>, <body> tags, or others such as <nav> or <aside> in most cases.

2. Lightweight markup languages such as BBCode, AsciiDoc, RST, MediaWiki, OrgMode, etc.; see <https://en.wikipedia.org/wiki/Lightweight_markup_language>. Useful when inserting the document into systems based on these formats.

I'm not arguing that we need more than Markdown; just the opposite.

The publication of all those as PDFs INSTEAD OF Markdown testifies to Markdown's big problem: the lack of readers (viewers) for it.

https://daringfireball.net/projects/markdown/syntax#philosop...

"Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions."

Any text editor (Notepad, TextPad, (neo)vi(m), Emacs, TextMate, Apostrophe, GhostWriter, Typora, etc.) will do. Markdown-specific editors have either a real-time preview or the ability to edit as WYSIWYM:

* https://keenwrite.com/ (mine, FOSS, cross-platform)

* https://pandao.github.io/editor.md/en.html

* https://markdownlivepreview.com/

* https://stackedit.io/

What do you mean by lack of readers?