Hacker News
by daveydave 1615 days ago
I have an application that converts Word documents to RDF conforming to the SPAR ontologies (mainly DoCO, http://www.sparontologies.net/ontologies/doco), so structural features such as headers, numbering and contains/within relationships are made explicit in the RDF. I've also used it successfully with PDFs by converting them to DOCX first. Is this the sort of thing you had in mind? I'm not here to sell it! I think this is a genuinely interesting, unexplored area.
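As a rough illustration of what explicit contains/within relationships can look like, here is a minimal Python sketch. It is not the application described above. It assumes the Word parsing has already been done elsewhere and only serialises a heading outline as Turtle. The `Section` class, the `ex:` namespace and the node naming scheme are hypothetical; `po:contains` comes from the Pattern Ontology, which DoCO uses for containment; using `c4o:hasContent` for the literal text is an assumption.

```python
from dataclasses import dataclass, field

PREFIXES = (
    "@prefix doco: <http://purl.org/spar/doco/> .\n"
    "@prefix po: <http://www.essepuntato.it/2008/12/pattern#> .\n"
    "@prefix c4o: <http://purl.org/spar/c4o/> .\n"
    "@prefix ex: <http://example.org/doc/> .\n"  # hypothetical base namespace
)


@dataclass
class Section:
    """One heading from the source document plus what it contains (assumed input)."""
    title: str
    paragraphs: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def literal(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def emit_section(section: Section, node_id: str, lines: list[str]) -> None:
    """Append triples for one section, then recurse into its subsections."""
    node = f"ex:{node_id}"
    title = f"{node}-title"
    paras = [f"{node}-p{i}" for i in range(1, len(section.paragraphs) + 1)]
    child_ids = [f"{node_id}-{i}" for i in range(1, len(section.children) + 1)]
    parts = [title, *paras, *(f"ex:{c}" for c in child_ids)]
    # Containment is explicit: the section po:contains its title, paragraphs and subsections.
    lines.append(f"{node} a doco:Section ;\n    po:contains {', '.join(parts)} .")
    lines.append(f"{title} a doco:SectionTitle ;\n    c4o:hasContent {literal(section.title)} .")
    for p, text in zip(paras, section.paragraphs):
        lines.append(f"{p} a doco:Paragraph ;\n    c4o:hasContent {literal(text)} .")
    for child, child_id in zip(section.children, child_ids):
        emit_section(child, child_id, lines)


def to_turtle(sections: list[Section]) -> str:
    """Serialise top-level sections (and everything nested in them) as Turtle."""
    lines = [PREFIXES]
    for i, section in enumerate(sections, start=1):
        emit_section(section, f"sec{i}", lines)
    return "\n".join(lines) + "\n"
```

Only the `po:contains` direction is emitted, and numbering is left out to keep the sketch short; a fuller converter would also need to decide how to represent list and heading numbers.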

The PDF format supports attachments (embedded files). I'm thinking about a set of libraries and/or a command-line utility that would make it trivially easy to attach a SQLite or JSON file to a PDF, or to extract one from a PDF. This won't fix existing files, of course, but it would at least make it easier for apps that generate PDFs to embed a SQLite or JSON file in their output.
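To make the attachment idea concrete, here is a minimal stdlib-only Python sketch of the mechanism: a one-page PDF whose catalog lists a JSON file in its `/EmbeddedFiles` name tree, backed by an uncompressed `/EmbeddedFile` stream. The function names are hypothetical, and the extractor is deliberately naive: it only understands the simple, uncompressed layout this writer produces. A real tool would more likely build on an existing PDF library that handles compressed objects and cross-reference streams.

```python
import re


def _pdf_string(text: str) -> str:
    """Escape text as a PDF literal string (ASCII filenames assumed)."""
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def build_pdf_with_attachment(name: str, data: bytes, mime: str = "application/json") -> bytes:
    """Build a blank one-page PDF that carries `data` as an embedded file called `name`."""
    subtype = mime.replace("/", "#2F")  # '/' must be escaped as #2F inside a PDF name
    fname = _pdf_string(name)
    objects = [
        # 1: catalog, with the embedded-files name tree mapping the filename to object 4
        f"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles << /Names [{fname} 4 0 R] >> >> >>".encode(),
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        # 4: file specification pointing at the stream that holds the bytes
        f"<< /Type /Filespec /F {fname} /UF {fname} /EF << /F 5 0 R >> >>".encode(),
        # 5: the embedded file itself, stored uncompressed
        f"<< /Type /EmbeddedFile /Subtype /{subtype} /Length {len(data)} >>\nstream\n".encode()
        + data + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\nstartxref\n{xref_pos}\n%%EOF\n".encode()
    return bytes(out)


_EMBEDDED = re.compile(rb"/Type /EmbeddedFile\b[^>]*?/Length (\d+) >>\s*stream\r?\n")


def extract_attachments(pdf: bytes) -> list[bytes]:
    """Naively pull uncompressed /EmbeddedFile streams out of a simple PDF."""
    found = []
    for m in _EMBEDDED.finditer(pdf):
        start = m.end()
        found.append(pdf[start:start + int(m.group(1))])
    return found
```

Writing the result of `build_pdf_with_attachment("data.json", b'{"rows": [1, 2, 3]}')` to a file should give a blank page whose attachment list, in viewers that expose one, contains `data.json`. The same pattern would work for a SQLite file with a different MIME type.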
XMP is meant for adding semantic information to (parts of) PDF files.

https://en.wikipedia.org/wiki/Extensible_Metadata_Platform
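Since XMP is itself RDF serialised as XML, a small Python sketch shows what such a packet looks like. It builds a minimal packet carrying a Dublin Core title and creator list using only `xml.etree.ElementTree`. The function name and example values are hypothetical. In a PDF, a packet like this would typically be stored in a stream with `/Type /Metadata /Subtype /XML` and referenced from the `/Metadata` entry of the document catalog, or of an individual object such as a page or image, which is how metadata can be attached to parts of a file.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # so output uses x:, rdf:, dc: prefixes

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix: str, local: str) -> str:
    """Clark-notation tag name, e.g. '{http://purl.org/dc/elements/1.1/}title'."""
    return f"{{{NS[prefix]}}}{local}"


def xmp_packet(title: str, creators: list[str]) -> bytes:
    """Return a UTF-8 XMP packet describing a document's title and creators."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    # rdf:about="" means "the resource this metadata is attached to".
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    # dc:title is a language alternative (rdf:Alt); x-default is the fallback language.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title
    # dc:creator is an ordered array (rdf:Seq).
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name
    body = ET.tostring(meta, encoding="unicode")
    # The xpacket wrapper lets tools find and update the metadata in place.
    packet = ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
              + body + '\n<?xpacket end="w"?>')
    return packet.encode("utf-8")
```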

This looks awesome! The decision to combine structural and rhetorical ontologies seems to strike the best balance between cost and availability, hitting the sweet spot of users' actual requirements when working with research and academic documents.

Is this compatible with User Defined Language?

https://ivan-radic.github.io/udl-documentation/

I would typically serialise the RDF output as Turtle, for which I believe a UDL already exists in Notepad++, though I don't use it myself.