Show HN: Obsidiantools – Python package for analysing Obsidian.md vaults (github.com)
7 points by mfarragher 1736 days ago
3 comments

Hi all,

I've made a new Python package - obsidiantools - for getting structured metadata about your Obsidian.md notes and analysing your vault. Complement your Obsidian workflows by getting metrics and detail about all your notes in one place through the widely-used Python data stack.

Check out the GitHub page for more detail about the API and its usage. There's a link to the 'obsidiantools in 10 minutes' demo, which is set up in a virtual machine through Binder, so you can interact with the demo code there!

Very little code is needed to analyse a vault. What I find really great is the ability to do sophisticated analytics through the integration with NetworkX graphs. In my demo I've applied the PageRank algorithm to the vault - analysis of backlink quality is just one example of how graph analytics can complement your knowledge management workflows, especially on larger vaults.
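
To make the backlink-quality idea concrete, here is a minimal sketch of PageRank by power iteration over a link mapping. It is a plain-Python stand-in, not the NetworkX-based code from the demo, and the `vault_links` dict and note names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over {note: [notes it links to]}."""
    notes = set(links)
    for targets in links.values():
        notes.update(targets)
    if not notes:
        return {}
    notes = sorted(notes)
    n = len(notes)
    rank = {note: 1.0 / n for note in notes}

    for _ in range(iterations):
        # Rank held by notes with no outgoing links is spread over all notes.
        dangling = sum(rank[note] for note in notes if not links.get(note))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {note: base for note in notes}

        # Each note passes an equal share of its rank along every outgoing link.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share

        change = sum(abs(new_rank[note] - rank[note]) for note in notes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical vault: keys are notes, values are the notes they link to.
vault_links = {
    "Index": ["Projects", "Reading list"],
    "Projects": ["Index"],
    "Reading list": ["Index", "Projects"],
    "Orphan": [],
}

ranks = pagerank(vault_links)
for note, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{note}: {score:.3f}")
```

Notes that are linked from other well-linked notes float to the top, while a note with no backlinks only keeps the baseline share.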

I go into more detail on future development, screenshots and more in this Obsidian community topic: https://forum.obsidian.md/t/obsidiantools-python-package-for...

In practice this should work for Zettelkasten vault formats - the API has worked OK for one of my vaults where I had dates/years with hyphens and I think it would handle the default 12-digit note names.

Wow, I hope this grows. A robust Obsidian database model for Python is what I need. I was even going to develop it myself as time permitted.

Please also encapsulate R/W access to files' FrontMatter and file system properties (like files' creation and modification time).

I was surprised while I was developing this package how little code was needed in the end to write the functions, once the markdown data is extracted from files. md -> HTML -> ASCII was the hairy part but the results from html2text look great. The html2text config options aren't well-documented - there was a lot of trial-and-error.

File system properties do seem like an intuitive next step to extend the metadata 'getter', as the pathlib.Path objects are easily accessible. I was aiming to bring timestamps in but realised that the timestamps aren't available consistently in Linux, so I'll need to think about a cross-platform design.
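
One possible cross-platform shape, as a rough sketch only: always return the modification time, and return a creation time only where the platform reports a real one. The function name `file_times` is hypothetical and is not part of obsidiantools.

```python
import sys
from datetime import datetime, timezone
from pathlib import Path


def file_times(path):
    """Return (created, modified) as UTC datetimes; created may be None."""
    st = Path(path).stat()
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)

    if hasattr(st, "st_birthtime"):
        # macOS and the BSDs report a true creation ("birth") time here.
        created_ts = st.st_birthtime
    elif sys.platform == "win32":
        # On Windows, st_ctime has traditionally held the creation time.
        created_ts = st.st_ctime
    else:
        # On Linux, st_ctime is the last metadata change, not creation,
        # so report "unknown" rather than a misleading value.
        created_ts = None

    created = (
        None
        if created_ts is None
        else datetime.fromtimestamp(created_ts, tz=timezone.utc)
    )
    return created, modified
```

Returning `None` instead of guessing keeps the metadata honest on Linux, and callers can decide whether to fall back to the modification time.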

FrontMatter is a good shout, I haven't used any in my notes, hopefully that content can be parsed neatly.

> markdown data is extracted from files. md -> HTML -> ASCII was the hairy

Out of pure perfectionism (in avoiding unnecessary complexity and dependencies) I hope we can omit the HTML part one day. It probably won't be hard to just parse Markdown directly. Sure, Markdown is supposed to allow inclusion of HTML tags, but I don't use any except in rare cases, like when I need a page break for a pretty export.
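
As a rough illustration of reading the Markdown text directly, here is a sketch that pulls Obsidian-style `[[wikilinks]]` out of a note with a regular expression. This is not how obsidiantools works, the function name is made up, and it ignores edge cases such as links inside code blocks.

```python
import re

# Matches [[Note]], [[Note|alias]] and [[Note#Heading]]; captures the note name.
WIKILINK = re.compile(r"\[\[([^\]\|#]+)(?:#[^\]\|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown_text):
    """Return the target note names of all wikilinks, in order of appearance."""
    return [name.strip() for name in WIKILINK.findall(markdown_text)]


sample = "See [[Note A]], [[Note B|an alias]] and [[Note C#Heading]]."
print(extract_wikilinks(sample))
```

Embeds written as `![[file]]` match too, so a real parser would need to decide whether those count as links.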

> I was aiming to bring timestamps in but realised that the timestamps aren't available consistently in Linux

This can be one motivation for putting them in the front matter. Another is that file system timestamps can change when you perform operations that don't actually change the content. At the moment I'm looking forward to a script (I was going to write it myself as time permits) that would scan all the files and add or update front matter to include, if it isn't already there, a creation time (in ISO UTC format like YYYY-MM-DDTHH:mm:ss.ffffffZ) initialized to the file system's creation or modification time, whichever is earlier.
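
A minimal sketch of such a script, under some assumptions: the front matter key is called `created` (a made-up name), front matter is delimited by `---` lines at the top of the file, and a creation time is only used where the platform exposes `st_birthtime`. It is worth trying on a copy of a vault first, since it rewrites files in place.

```python
from datetime import datetime, timezone
from pathlib import Path


def earliest_timestamp(path):
    """Earlier of creation (where known) and modification time, as ISO UTC."""
    st = path.stat()
    candidates = [st.st_mtime]
    birth = getattr(st, "st_birthtime", None)  # not available on most Linux setups
    if birth is not None:
        candidates.append(birth)
    moment = datetime.fromtimestamp(min(candidates), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def ensure_created(path, key="created"):
    """Add `key` to the note's front matter if missing; return True if changed."""
    text = path.read_text(encoding="utf-8")
    stamp = earliest_timestamp(path)  # read before writing, which bumps mtime
    lines = text.split("\n")

    if lines and lines[0].strip() == "---":
        # Find the line that closes the existing front matter block.
        end = next(
            (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
        )
        if end is None:
            return False  # unterminated front matter: leave the file alone
        if any(line.startswith(f"{key}:") for line in lines[1:end]):
            return False  # already stamped
        lines.insert(end, f"{key}: {stamp}")
        path.write_text("\n".join(lines), encoding="utf-8")
    else:
        # No front matter yet: prepend a new block.
        path.write_text(f"---\n{key}: {stamp}\n---\n{text}", encoding="utf-8")
    return True


def stamp_vault(vault_dir):
    """Stamp every Markdown note under vault_dir; return the number changed."""
    return sum(ensure_created(note) for note in Path(vault_dir).rglob("*.md"))
```

Once the value is in the front matter it travels with the note, so later copies or syncs that reset file system timestamps no longer matter.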

> FrontMatter is a good shout, I haven't used any in my notes, hopefully that content can be parsed neatly.

I even hope the Obsidian authors will one day add support for tags listed in the front matter as `Tags: tag_a, tag_b, Link C` (just the names of tags and links; I don't think tags and links sharing the same name is a useful case to support). I am going to use this even if they don't, although it's sad to see that Obsidian doesn't recognize them there.
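
Reading that single-line form is simple enough to sketch without a YAML library. The function name below is hypothetical, and it only handles the comma-separated `Tags:` line described above, not general YAML lists.

```python
def front_matter_tags(text, key="Tags"):
    """Return the comma-separated values of `key` from a note's front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # the note has no front matter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter reached without finding the key
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


note = "---\nTags: tag_a, tag_b, Link C\n---\nBody text\n"
print(front_matter_tags(note))
```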

Looks very good, excited to dig in