Converting Markdown to ePub or Mobi Using Pandoc | HN Mirror


	Converting Markdown to ePub or Mobi Using Pandoc (themythicalengineer.com)
	128 points by sks147 1927 days ago

15 comments

d_rc 1926 days ago

Thanks for the nice tutorial OP.

Here's a public notebook in Deepnote if anyone wants to play around with the code or duplicate it: https://deepnote.com/project/Converting-Markdown-to-Epub-or-...

2 fun facts about Deepnote:

1. You can create a custom environment by writing a Dockerfile with all the libraries you need to install, and every time you need to reuse similar functionality (e.g. convert yet another book to mobi), you can just fire it up and everything will be preinstalled. https://docs.deepnote.com/environment/custom-environments

2. You can turn any notebook to a blogpost right away and publish within Deepnote directly.

Disclaimer: I'm a software engineer at Deepnote.

OhHiMarkos 1926 days ago

Wait, so this is a notebook similar to Python notebooks, but it's a Docker environment where I can install a lot of stuff I want and then do even more stuff? Am I getting this right?

It's like a shell to a vm but in a notebook format that you can then use to blog?

d_rc 1926 days ago

Yeah, you got it right. You can even access the actual shell in the vm, not just the notebook environment.

OhHiMarkos 1926 days ago

Awesome! I will definitely give it a try

sireat 1926 days ago

What would be the advantages of moving to Deepnote from a regular Jupyter-notebook-based workflow?

Let's assume someone who has been working with Jupyter notebooks (mostly Python-based) for a long time.

Are Deepnote notebooks exportable?

The big worry is that you guys decide to pivot or radically change your pricing model and there is no offramp.

By comparison I don't mind using Google Colab. If Google Colab decides to shutdown or 100x their price I can take my .ipynb files and use them on my local littlest JupyterHub instance.

d_rc 1926 days ago

Deepnote internally supports the .ipynb format, and you can always export a Deepnote notebook to .ipynb, just as you would in Colab.

In general, the main selling points are live collaboration (you can work on a notebook with your team as you would on a Google Doc) and integrations (you can plug in your Snowflake DB, S3 bucket, or whatever, and have it connected for any further analysis, long-running training, etc.).

For many non-software-developer data scientists, it's also easier to work in a cloud environment compared to installing stuff locally, and to version their notebooks in Deepnote instead of git. But this really depends on the particular workflow that one has.

sireat 1926 days ago

Thank you for the answers!

I can absolutely see a need for a collaboration tool. Collaboration on regular Jupyter is a pain. I create a shared folder for coworkers, and, well, read/write permissions* are not fun.

* knows chmod - https://www.reddit.com/r/linux/comments/dily0/i_know_how_to_...

sks147 1926 days ago

Thanks for publishing this as a notebook. I really love the platform.

ggambetta 1926 days ago

I followed a similar approach for my novel; started with Markdown, used pandoc to convert it to epub/mobi, but also to LibreOffice .odt to generate the PDF for the paperback. Wrote some details about the process here: https://gabrielgambetta.com/tgs-open-source.html
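A minimal sketch of the Markdown to .odt to PDF route described above, assuming pandoc and LibreOffice (`soffice`) are on the PATH. The function name `odt_to_print_pdf` and the file names are hypothetical, not taken from the linked write-up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Markdown -> ODT (pandoc) -> PDF (headless LibreOffice).

odt_to_print_pdf() {
  local src=$1                 # e.g. novel.md
  local base=${src%.md}        # novel.md -> novel
  # pandoc infers the ODT writer from the .odt output extension
  pandoc "$src" -o "$base.odt"
  # LibreOffice converts without opening a window and writes <name>.pdf
  # into --outdir (here: the directory the source lives in)
  soffice --headless --convert-to pdf --outdir "$(dirname "$src")" "$base.odt"
}
```

Usage: `odt_to_print_pdf novel.md`. The intermediate .odt can be opened in LibreOffice to adjust page size, margins and fonts before exporting the PDF.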

PascalPrecht 1926 days ago

That's how I wrote and self-published my book as well! Although I created a script that turns md into epub/mobi/pdf using pandoc.

Here's how I did it in case anyone is interested: https://pascalprecht.github.io/posts/writing-an-ebook
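A one-shot build function in the same spirit might look like the following. It is a sketch, not the linked author's script: `build_book`, `metadata.yaml` and the chapter file names are placeholders, and it assumes pandoc, Calibre's `ebook-convert`, and a LaTeX engine (which pandoc's default PDF output needs) are installed.

```shell
#!/usr/bin/env bash
# Hypothetical build helper: Markdown chapters -> EPUB, MOBI and PDF.

build_book() {
  local name=$1; shift           # output base name, e.g. "mybook"
  local chapters=("$@")          # Markdown files in reading order
  # pandoc concatenates multiple inputs; --toc adds a table of contents
  pandoc --metadata-file=metadata.yaml --toc -o "$name.epub" "${chapters[@]}"
  # Kindles can't open EPUB directly, so derive a MOBI with Calibre
  ebook-convert "$name.epub" "$name.mobi"
  # PDF output goes through LaTeX (pdflatex unless --pdf-engine is given)
  pandoc --metadata-file=metadata.yaml --toc -o "$name.pdf" "${chapters[@]}"
}
```

Usage: `build_book mybook chapters/*.md`. The glob expands in sorted order, so numbered file names such as `01-intro.md` keep the chapters in sequence.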

sks147 1926 days ago

This is great. Thanks for sharing

kimi 1926 days ago

Don't want to be a troll, but if you are writing anything that is not a README and/or is a book or booklet or bookish, do yourself a favor and use Asciidoc instead.

debiandev 1926 days ago

This! Asciidoc is the grown-up brother of Markdown. It's designed to scale up to entire books, handling images, tables, references, citations, book indexes and maths.

And the syntax is very friendly and intuitive.

kimi 1926 days ago

And it's quite easy to insert compiled images, e.g. Graphviz, UML diagrams, Ditaa, just by having the SOURCE in your document.

davegauer 1926 days ago

Absolutely. AsciiDoc (2002) is actually 2 years older than Markdown (2004), but is surprisingly similar to write.

It was created as an equivalent to DocBook XML for the creation of book-length technical documents. It has a rich history and is well supported in many places (try writing a README.adoc for your next GitHub-hosted repo).

It is also currently undergoing a standardization process:

https://projects.eclipse.org/proposals/asciidoc-language

dredmorbius 1926 days ago

What's your suggested Asciidoc toolchain?

(Pandoc will ingest Asciidoc as well as Markdown, FWIW.)

kimi 1926 days ago

At work, we have been using Asciidoc for 10 years or so, so we use Asciidoc -> XSLT -> PDF / chunked HTML. It works great, but it's not nice to use.

For new projects, we use Asciidoctor - it does everything but chunked HTML, and it's a pleasure to use.
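A minimal sketch of that Asciidoctor route, assuming the `asciidoctor`, `asciidoctor-pdf`, `asciidoctor-epub3` and `asciidoctor-diagram` gems are installed (plus Graphviz or whichever tools the diagram types you use require), and assuming each converter accepts `-r` the way `asciidoctor` does. `build_adoc` and `book.adoc` are hypothetical names. Loading `asciidoctor-diagram` is what renders diagram source blocks, like the ones mentioned upthread, into images.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one AsciiDoc source -> HTML, PDF and EPUB3.

build_adoc() {
  local src=$1   # e.g. book.adoc
  # -r loads the asciidoctor-diagram extension so Graphviz/PlantUML/Ditaa
  # blocks are rendered from their source text
  asciidoctor -r asciidoctor-diagram "$src"         # -> book.html
  asciidoctor-pdf -r asciidoctor-diagram "$src"     # -> book.pdf
  asciidoctor-epub3 -r asciidoctor-diagram "$src"   # -> book.epub
}
```

Usage: `build_adoc book.adoc`. Chunked (multi-page) HTML is the one output this sketch does not cover, matching the limitation noted above.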

dredmorbius 1925 days ago

Thanks.

jtbayly 1926 days ago

Give me a reason to use asciidoc rather than bookdown.

gexla 1926 days ago

I assume the person you're replying to was comparing Asciidoc with Markdown rather than with Bookdown. Asciidoc seems to have been designed for the direction Markdown has been heading in with all these variants. If you find yourself chasing Markdown variants to get more flexibility, then Asciidoc might be what you're looking for.

I don't know anything about Bookdown and it may be similar to Asciidoc. I would be willing to bet that Asciidoc would be around longer than most of the MD variants though.

Why might you want to continue using MD flavors? You already have loads of MD docs and you have no control over the processes which create them (shared environment, higher-ups force you to use MD, etc.)

NOTE: If you're interested in Asciidoc, also take a look at Asciidoctor.

jtbayly 1926 days ago

Thanks. That's basically what I figured.

One feature I wanted, which made me go with Bookdown, is that the static website it creates has search built in.

mraza007 1926 days ago

Pandoc has been one of the best tools I have used and this blogpost is well written

avinassh 1926 days ago

This is a nice tutorial, thanks for submitting it! However, for me, the biggest discovery was epub.press [0]. I just tried it on a couple of open pages, and it works quite well!

[0] - https://epub.press/#about

progval 1926 days ago

Why wget|dpkg and wget|sh instead of apt to download Pandoc and Calibre?

You should be able to replace all this:

    !wget https://github.com/jgm/pandoc/releases/download/2.11.3.2/pandoc-2.11.3.2-1-amd64.deb
    !sudo dpkg -i pandoc-2.11.3.2-1-amd64.deb
    !apt install libgl1-mesa-glx -y
    !wget -q -O- https://download.calibre-ebook.com/linux-installer.sh | sudo sh /dev/stdin

With simply this:

    !apt install -y pandoc calibre

sundarurfriend 1926 days ago

The Calibre website strongly recommends downloading from their site instead of using OS packages, mentioning that the packages are often out of date. And I've generally found this to be true: Calibre versions in package repos are often several versions behind, more than the usual "package maintainer trying to play catch-up" differences.

I'm usually averse to the wget|sh installs, but in this case it seems worth it. You can inspect the .sh file (which is really mostly Python code) before running it, just to not get into the bad habit of directly piping in code from the internet.

progval 1926 days ago

> You can inspect the .sh file

That's not the issue. Installing software this way means you get no automatic updates in the future. It's fine if you re-run the install script on a regular basis (e.g. by recreating containers) AND you don't pin versions.

But OP's instructions fail both: there is no mention of updates, and they pin the pandoc version.
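One way to satisfy both conditions in a script-based install is to compare the installed version with the newest release before deciding whether to reinstall. A minimal sketch, assuming GNU `sort -V` for version ordering; `version_lt` and `installed_pandoc_version` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "reinstall only when upstream is newer".

version_lt() {
  # Exit status 0 if version $1 is strictly older than version $2.
  # sort -V orders 2.9 before 2.10, unlike a plain lexical sort.
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

installed_pandoc_version() {
  # The first line of `pandoc --version` reads e.g. "pandoc 2.11.3.2"
  pandoc --version | head -n1 | awk '{print $2}'
}
```

Usage: `version_lt "$(installed_pandoc_version)" "$LATEST" && reinstall_pandoc`, where `$LATEST` comes from the GitHub release API (as in the CI script further down the thread) and `reinstall_pandoc` stands for whatever install step you use.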

stonesweep 1926 days ago

I use pandoc in a CD pipeline, the version in the repos is stale compared to upstream (normal, that's how it is) unless you're on a rolling distro like Arch.

I have reported pandoc bugs and had them fixed (great dev team). Pulling the latest single-DEB install (no deps, unlike the one in the Debian repo) and using it gets all the latest updates, which matters for a process like this.

In this particular case, the need to use the latest pandoc leads to the wget pull and install, which, thanks to their DEB design, is easy and clean to do in an ephemeral CI container.

coreypreston 1926 days ago

Do you have any more details about how you integrated Pandoc into your pipeline? A post or something?

stonesweep 1926 days ago

Sure thing, it's simple enough that I can post it right here. In your CI/CD runner, you add a "before" script like so (GitLab YAML example):

    image: debian:latest

    before_script:
        - bash myscript.sh

Your myscript.sh can be as short as four lines (the first refreshes the package lists, which the Debian image ships without, and installs curl, which isn't there by default), for example:

    apt-get update && apt-get -y install curl
    VERSION=$(curl -s "https://api.github.com/repos/jgm/pandoc/releases/latest" | grep -Po '"tag_name": "\K.*?(?=")')
    curl -sLo "pandoc-${VERSION}-1-amd64.deb" "https://github.com/jgm/pandoc/releases/download/${VERSION}/pandoc-${VERSION}-1-amd64.deb"
    apt-get -y install "./pandoc-${VERSION}-1-amd64.deb"

The GitHub API endpoint used above conveniently returns the latest release, and the grep on the right extracts its tag name. One could make this more robust with `jq`, but this very simple setup works as a starting point for developing your own style.
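`grep -P` only works when grep was built with PCRE support (BusyBox grep, for example, lacks it), so a sed-based extraction is a slightly more portable sketch of the same lookup. `latest_tag` and `pandoc_deb_url` are hypothetical helper names, and the URL pattern is simply the one used in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: parse the release tag without grep -P.

latest_tag() {
  # Reads GitHub's releases/latest JSON on stdin, prints the tag_name value
  sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p' | head -n1
}

pandoc_deb_url() {
  # Same URL pattern as the script above (amd64 .deb builds only)
  local v=$1
  printf 'https://github.com/jgm/pandoc/releases/download/%s/pandoc-%s-1-amd64.deb\n' "$v" "$v"
}
```

Usage: `VERSION=$(curl -s "https://api.github.com/repos/jgm/pandoc/releases/latest" | latest_tag)`, then `curl -sLo pandoc.deb "$(pandoc_deb_url "$VERSION")"`.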

asicsp 1926 days ago

The tutorial is presented well. My biggest takeaway was that one can use 'Deepnote' to run Linux commands.

If you are interested in knowing how to customize `pandoc` for generating PDF/EPUB, I have a tutorial [0] based on books I've written. I also have links at the end to related resources, including tools other than `pandoc`.

[0] https://learnbyexample.github.io/customizing-pandoc/

throw14082020 1926 days ago

Jupyter notebooks also have the same feature: `!shell command`, or start a cell with `%%bash` and everything in it will run through the shell, not the notebook's kernel.

asicsp 1926 days ago

That'd require you to have access to a *nix terminal on your system. Deepnote is allowing access through their servers, so for example, you can try this tutorial on Windows.

See this comment from Deepnote engineer for more details: https://news.ycombinator.com/item?id=26899905

Finnucane 1926 days ago

Note that mobi format is being deprecated by Amazon. If you're producing a file for distribution by Amazon, you only need the epub file.

Turing_Machine 1926 days ago

However, Kindles can't read epub directly. If you're trying to produce a file you can sideload on your (or your customer's) Kindle without going through Amazon, you're still gonna need mobi, or one of the later proprietary (and undocumented) Amazon mobi variants.

Finnucane 1926 days ago

True, but I wouldn't count on that being supported forever.

Turing_Machine 1926 days ago

The day Kindles stop supporting side-loaded third-party content is the day I stop buying Kindles.

alias_neo 1926 days ago

Is Pandoc being used mostly to join the files?

I recently started converting Markdown files to epub (and kepub) for my new Kobo. I load the Markdown straight into Calibre though.

On a side note, is there some benefit to mobi over epub? Kepub seems to be the preferred format on Kobo, because for some reason it turns pages _much_ faster than epub and gives access to reader statistics (if one cares about that).

pronoiac 1926 days ago

Hey, I'm curious about your Calibre usage! I'm working on turning a written book to Markdown, and pandoc has a real pain point on links between chapters. Please tell me more!

alias_neo 1925 days ago

What would you like to know? In Calibre you can set regex to indicate what should be used for chapter, sub chapter etc, and it can be used to generate the TOC. So I use Markdown headings, #, ##, ### etc for chapters and subsections.
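In Calibre, that structure detection is configured with XPath expressions (which can embed regular expressions through `re:test()`), and the same settings are available as `ebook-convert` command-line options. A sketch that maps Markdown heading levels onto chapters and a three-level TOC, assuming Calibre's `ebook-convert` is installed and reads the `.md` file through its TXT input with Markdown formatting; `md_to_epub_calibre` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: single Markdown file -> EPUB via Calibre, with
# '#', '##', '###' headings (rendered as h1/h2/h3) driving the structure.

md_to_epub_calibre() {
  local src=$1
  # --formatting-type markdown: treat the text input as Markdown
  # --chapter: XPath for chapter breaks (h: is the XHTML namespace prefix)
  # --levelN-toc: which headings populate each level of the TOC
  ebook-convert "$src" "${src%.md}.epub" \
    --formatting-type markdown \
    --chapter "//h:h1" \
    --level1-toc "//h:h1" \
    --level2-toc "//h:h2" \
    --level3-toc "//h:h3"
}
```

Usage: `md_to_epub_calibre notes.md` writes `notes.epub`; swapping the output extension to `.kepub` or `.azw3` is the usual way to target other readers, if your Calibre install has the corresponding output plugin.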

pronoiac 1924 days ago

Checking, it looks like Calibre expects one Markdown file as input, where I have a few Markdown files, linking to each other in a way that works on GitHub. It looks like the sort of thing that either works as-is with luck, or is a pain in the neck and needs massaging.

alias_neo 1922 days ago

Ah, yes, I've only been doing this with single markdown pages. I believe people use pandoc for multi-page.

GitHub style linking doesn't really lend itself to "book" format so I suppose there's no auto way to do that.
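For the multi-file case, pandoc joins several inputs into one document, but GitHub-style links such as `[setup](02-setup.md#install)` still point at other files. A minimal sketch of one workaround: rewrite those links to in-document anchors before handing the text to pandoc. It assumes every cross-file link carries an explicit `#anchor` and that the anchors match the identifiers pandoc generates from headings (they usually do for simple headings, but not always). `rewrite_chapter_links` and `build_from_chapters` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn GitHub-style cross-file links into in-document
# anchors, then feed all chapters to pandoc as one Markdown stream.

rewrite_chapter_links() {
  # ](02-setup.md#install) -> ](#install)
  # [^):#]* keeps URLs with a scheme (https://...) from being rewritten
  sed -E 's/]\([^):#]*\.md#([^)]*)\)/](#\1)/g' "$@"
}

build_from_chapters() {
  local out=$1; shift
  local f
  for f in "$@"; do
    rewrite_chapter_links "$f"
    # blank lines keep a heading at the top of the next file from being
    # glued onto the last paragraph of this one
    printf '\n\n'
  done | pandoc -f markdown -o "$out"   # stdin has no extension, so say markdown
}
```

Usage: `build_from_chapters book.epub 01-intro.md 02-setup.md`. Links without an anchor (`[next](02-setup.md)`) are left untouched and would still need manual fixing.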

hoppyhoppy2 1926 days ago

>On a side note, is there some benefit to mobi over epub?

AFAIK Amazon Kindle devices can read mobi files but not epub files (unless you convert them to something else first).

(The mobi format is older, so if you want to read an ebook on your old Palm Pilot PDA then you'll probably want mobi.)

alias_neo 1925 days ago

Thanks for the heads up! As for my palm pilot, well, I'm not sure which drawer that's been in for the last decade or two.

acidburnNSA 1927 days ago

Good stuff. When I wrote an eBook I found the extra features of reStructuredText to be useful (index, glossary, graphviz & Tikz environments, etc.) and wrote a sort of similar post.

https://digitalsuperpowers.com/blog/2019-02-16-publishing-eb...

masklinn 1926 days ago

Yeah, the rSt markup is pretty idiosyncratic, both anal-retentive and a bit inconsistent (e.g. it can be hard to internalise where it does or does not want blank lines), but Sphinx is an actual document system; having to work with Markdown for anything beyond a single file quickly gets painful.

throwanem 1926 days ago

You know, I bet it wouldn't take that much to go from epub to PDF, suitable for printing and binding. The structural information is pretty much all there, I think - it'd just need pagination and formatting for print, really.

I'd definitely want to use such a thing, as a way to feed my bookbinding hobby. I wonder if anyone else would?
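One possible starting point, since pandoc can read EPUB as an input format: re-typeset the book through LaTeX at a small page size. This is a sketch under assumptions: xelatex is installed, A5 with 15 mm margins is an arbitrary example, and `epub_to_print_pdf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: existing EPUB -> print-oriented PDF via pandoc + LaTeX.

epub_to_print_pdf() {
  local epub=$1
  # papersize=a5 becomes the LaTeX class option "a5paper";
  # geometry:margin=15mm is passed to the LaTeX geometry package
  pandoc "$epub" -o "${epub%.epub}.pdf" \
    --pdf-engine=xelatex \
    -V papersize=a5 \
    -V geometry:margin=15mm
}
```

Usage: `epub_to_print_pdf book.epub`. This only paginates and formats; imposing pages into signatures for binding would still be a separate step.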

timClicks 1927 days ago

Is the resulting typesetting any good?

maxnoe 1927 days ago

Epub is just html with css and images in a zip file. So the quality of the typography will largely depend on the renderer, not the file itself.
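The zip container has one quirk worth knowing when building or debugging EPUBs by hand: the EPUB Open Container Format requires the first entry to be an uncompressed file named `mimetype` containing exactly `application/epub+zip`, with no extra field in its header. Since a ZIP local file header is 30 bytes long, that puts the name at byte offset 30 and the contents right after it. A minimal sketch of a check based on that layout; `is_epub_container` is a hypothetical name, and this is only a sanity check, not a substitute for a full validator such as EPUBCheck.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the EPUB (OCF) zip layout.

is_epub_container() {
  local file=$1
  # A ZIP local file header starts with the signature "PK\3\4";
  # only the "PK" part is checked here
  [ "$(head -c 2 "$file")" = "PK" ] || return 1
  # After the fixed 30-byte header come the name "mimetype" (8 bytes) and,
  # since extra fields are forbidden and the entry is stored uncompressed,
  # the 20 bytes "application/epub+zip"
  [ "$(dd if="$file" bs=1 skip=30 count=28 2>/dev/null)" = "mimetypeapplication/epub+zip" ]
}
```

Usage: `is_epub_container book.epub && echo "container looks OK"`. An EPUB zipped with a generic archiver often fails this because `mimetype` ends up compressed or not first; `zip -X0 book.epub mimetype` before adding the remaining files is the usual fix.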

fau 1926 days ago

Unfortunately, the renderer that gets used will sometimes itself depend on the file. For example, if you transfer the generated Mobi file to a Kindle as the article says, it will get rendered using an inferior renderer (in terms of kerning, for example) compared to the renderer that would've been used had the file been a KFX ("Kindle Format 10").

dhruvdh 1927 days ago

The end result would largely depend on the quality of generated HTML and CSS, no? That's what the OP is most likely asking about.

masklinn 1926 days ago

The result is always an epub/mobi so the source doesn’t really matter.

5tefan 1926 days ago

I would like to have better typesetting. It is a pleasure to have a nicely laid-out and typeset page. My layman's take is that the rendering engines have headroom to improve, but most readers don't really care.

nvr219 1926 days ago

Pandoc is a GOAT tool! It's so good.

TTT1 1926 days ago

very interesting