by apenguin 4260 days ago
I love the idea of Literate Programming, and moreover pandoc is one of my absolute favorite tools. As such, I find this very interesting.

However, I take issue with your complaint about Emacs being so huge -- pandoc is right up there, too (134 vs 89MiB on my system). Not to mention its seemingly endless stream of dependencies (50 packages according to my package manager), as well as GHC, which is over 700MB on its own. If you work with Haskell, this might not be too big of a deal, but otherwise you might need all this for pandoc alone. This is actually an issue for me with my tiny laptop SSD (this ends up consuming more than 5% of my root partition) -- I'm always debating removing pandoc, but never do because it's just such a great tool.

2 comments

Pandoc doesn't seem as bad as you suggest. GHC, dependencies, etc. are only needed for compiling; according to http://johnmacfarlane.net/pandoc/installing.html Pandoc can be compiled into a standalone binary. The Windows build on https://github.com/jgm/pandoc/releases is 17.1MB and the Debian package in Wheezy is 18.9MB with reasonable-looking dependencies.

The scripts I've written (PanPipe and PanHandle) require a Haskell implementation and the Pandoc library in order to be compiled or interpreted. Once they're compiled with GHC, they're completely standalone.

My plan is to have my server recompile my site when changes are pushed to Git. I like having Emacs, GHC, etc. on my laptop, but not on my server.

I actually tried to integrate Babel with Hakyll originally, but hit a bunch of problems. I didn't include that in the page since I thought it would be distracting. Most of it boils down to Org-mode's HTML exporter being awkward to invoke as part of a UNIX pipeline:

- Emacs can't handle stdio

- Org-mode has breaking changes between the version bundled in the latest stable Emacs (24.3) and that in ELPA (which I use)

- Syntax highlighting depends on the current Emacs theme

- Whole HTML pages are generated, which makes templating harder

- Anything which uses the filesystem leaves artifacts around

I did manage to hack together a shell script which created and switched to a temporary directory, saved /dev/stdin to a file, opened Emacs in batch mode, loaded Org-mode, opened the temp file, tangled the file, exported the file, exited Emacs, ran the result through some XSLT transformations and Python scripts to extract the data needed for templating, spliced the results through the templates, spat the result to /dev/stdout, switched out of the temp directory then deleted it. Needless to say, it was very fragile, and much more complex than writing these Pandoc filters!
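For illustration only, here is a minimal Python sketch of the general pattern that script followed: wrapping a tool that insists on files so that it behaves like a stdin/stdout filter and leaves nothing behind. This is not the original shell script. The function name `run_as_filter`, the default file name `input.org`, and the convention of appending the file path as the last argument are all assumptions made for the example, and the actual Emacs invocation is deliberately left out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_filter(command, data, filename="input.org"):
    """Run a file-based tool as if it were a stdin/stdout filter.

    `command` is an argument list (hypothetical convention: the path of the
    temporary input file is appended as its last argument). The tool runs
    with the temporary directory as its working directory, so any files it
    writes there are deleted together with the directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Save the incoming bytes to a file, since the tool cannot read stdin.
        source = Path(tmp) / filename
        source.write_bytes(data)
        # Run the tool inside the temp directory and capture what it prints.
        result = subprocess.run(
            [*command, str(source)],
            cwd=tmp,
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError if the tool fails
        )
        return result.stdout
    # Leaving the `with` block removes the directory and everything in it.
```

A caller would pass `sys.stdin.buffer.read()` as `data` and write the returned bytes to `sys.stdout.buffer`. Even this tidy version only covers the temp-directory and cleanup steps; the tangling, exporting, XSLT and templating stages would each add another subprocess call, which is where the fragility comes from.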

Regarding the Windows build: I just installed it, and the post-install size is 69.1MB. I feel like a factor of two means my point about Emacs not being that big still stands :)

Static builds also pose somewhat of a problem when it means I have to rebuild all the dependencies for every update... I've run into this on Arch when clearing out makedeps for hard drive space (those 50 packages probably aren't all hard deps but I don't want to go against the will of my package manager). I know this is a solvable issue, I just wish it was easier.

Also, I recognize the issues it poses here, but syntax highlighting inheriting from font-lock is one of my favorite things about the HTML exporter.

EDIT: Accidentally duplicated a predicate by duplicating a predicate.

EDIT2: I'll just expand my response

The whole-page templating thing is a problem I've been trying to work around myself, but I've had too much fun thinking about it to actually get started on anything. At some point writing another HTML exporter feels kinda mundane and I get the idea that I need to work on a ConTeXt exporter since it hasn't been done before.

I like Markdown's syntax more than that of org-mode, but I don't like the lack of standardization. I kinda wish all the (popular) flavors were a subset of Pandoc Markdown so as to keep compatibility... But that's never going to happen.

Is there a way to create a self-contained Pandoc? Surely it doesn't use the entirety of GHC. I've seen this done with some Python programs which are distributed with a stripped down Python interpreter and stdlib.