| I read a lot of research papers for work. My workflow evolved around an ever-growing inbox of bookmarked papers from arXiv et al. Great for exploration, but hard to keep track of what I read. Distillate bridges the tools I already use: Zotero (literature management), reMarkable (reader + highlighter), and Obsidian (notes). It automates the whole pipeline: $ distillate save to Zotero ──> auto-syncs to reMarkable │
read & highlight on tablet
just move to Read/ when done
│
V
auto-saves notes + highlights
It polls Zotero for new papers, uploads PDFs to the reMarkable via rmapi, then watches for papers you've finished reading in your Read folder. When it finds one, it:- Parses .rm files using rmscene to extract highlighted text (GlyphRange items) - Searches for that text in the original PDF using PyMuPDF and adds highlight annotations - Enriches metadata from Semantic Scholar (publication date, venue, citations) - Creates a structured markdown note with metadata, highlights grouped by page, and the annotated PDF (I keep mine in an Obsidian vault) The core workflow just needs Zotero and a reMarkable — no paid APIs, no cloud backend, your notes stay on your machine. Optional extras if you plug them in: - AI summaries via Claude (one-liner + key learnings from your highlights) - Daily reading suggestions from your queue - Weekly email digest via Resend - Obsidian Bases database for tracking your reading Stack: rmapi for reMarkable Cloud, rmscene for .rm parsing, PyMuPDF for PDF annotation. Python 3.10+, pip installable. The trickiest part was highlight extraction: reMarkable stores highlighted text as GlyphRange items in a scene tree, and matching that text back to positions in the original PDF required fuzzy search with OCR cleanup, plus special merging logic for e.g. cross-page highlights. Happy to say it works well ~99% of the time now. Install: pip install distillate && distillate --init Code: https://github.com/rlacombe/distillate Site: https://distillate.dev I built this for myself but would love feedback, especially from other reMarkable + Zotero users. What's missing from your workflow? What else should I add? |
As you mentioned Remarkable file format is a PITA to extract highlights, one thing that helped a lot was to add an OCR fix phase that uses Gemini flash model to fix common OCR errors and to merge single highlights that are across pages.