Hacker News
by n0rlant1s 1250 days ago
Thanks for responding. I'm curious why PDFs don't carry metadata that machines can easily parse out. Sigh.

Thanks for sharing! Why do you think XMP isn't widely adopted yet?
There has been a lot of politics. It's yet another case study for "why we can't have nice things."

When XMP first came out, Adobe tools would look at all the metadata in, say, an image file (such as EXIF) and re-express it in XMP format. I liked that a lot because I could read that XMP packet with my RDF tools and have complete access to all the metadata with very simple software.
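Here is a minimal Python sketch, using only the standard library, that shows how simple reading an XMP packet can be. It assumes the packet is stored uncompressed in the file's bytes. That is common for JPEG, but XMP in a PDF may sit inside a compressed stream that a plain byte scan will miss. It also flattens only simple properties instead of doing full RDF/XML parsing. The function names and the sample packet are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
START = b"<x:xmpmeta"
END = b"</x:xmpmeta>"

# Hypothetical sample: an XMP packet embedded between arbitrary file bytes.
SAMPLE = b"""junk bytes <x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"
   xmp:CreatorTool="ExampleTool">
  <xmp:CreateDate>2022-11-19T10:00:00Z</xmp:CreateDate>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta> more junk"""


def find_xmp_packet(data: bytes) -> bytes | None:
    """Return the raw <x:xmpmeta>...</x:xmpmeta> bytes, or None if absent.

    Assumes the packet is stored uncompressed in `data`.
    """
    start = data.find(START)
    if start == -1:
        return None
    end = data.find(END, start)
    if end == -1:
        return None
    return data[start:end + len(END)]


def simple_properties(packet: bytes) -> dict[str, str]:
    """Collect simple properties of every rdf:Description, keyed by {namespace}name."""
    root = ET.fromstring(packet)
    props: dict[str, str] = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Shorthand form: properties written as attributes (skip rdf:about etc.).
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                props[name] = value
        # Element form: child elements with plain text and no nested structure.
        for child in desc:
            if len(child) == 0 and child.text and child.text.strip():
                props[child.tag] = child.text.strip()
    return props


if __name__ == "__main__":
    packet = find_xmp_packet(SAMPLE)
    if packet is not None:
        for name, value in simple_properties(packet).items():
            print(name, "=", value)
```

Handing the `rdf:RDF` element to a real RDF/XML parser would give the complete graph, including arrays and nested structures that this sketch skips.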

At some point, others in the industry accused Adobe of undermining other metadata standards, and Adobe was pressured to use XMP only for data that could not be expressed in EXIF and other formats. That took away complete, easy-to-work-with metadata: now I have to write my own tools that convert the EXIF metadata to XMP and merge it with whatever XMP might already be in the document.
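The convert-and-merge step might look roughly like this minimal Python sketch. The three-entry EXIF-to-XMP mapping is a hypothetical illustration, so check real correspondence tables before relying on it. Values are flattened to plain strings. The rule that an existing XMP value wins over EXIF is a simplifying assumption, not any standard's reconciliation policy.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny EXIF tag -> XMP property mapping (illustration only).
EXIF_TO_XMP = {
    "Artist": "dc:creator",
    "Copyright": "dc:rights",
    "ImageDescription": "dc:description",
}


def merge_exif_into_xmp(exif: dict[str, str], xmp: dict[str, str]) -> dict[str, str]:
    """Return one combined property dict keyed by XMP property names.

    Assumed rule: a value already present in XMP wins, EXIF only fills gaps,
    and EXIF tags without a mapping are dropped.
    """
    merged = dict(xmp)
    for tag, value in exif.items():
        prop = EXIF_TO_XMP.get(tag)
        if prop is not None and prop not in merged:
            merged[prop] = value
    return merged


if __name__ == "__main__":
    exif = {"Artist": "A. Photographer", "Copyright": "CC BY 4.0", "Make": "ExampleCam"}
    xmp = {"dc:rights": "All rights reserved", "xmp:CreatorTool": "ExampleTool"}
    print(merge_exif_into_xmp(exif, xmp))
```

In a real tool, array-valued properties such as `dc:creator` would need to be kept as lists rather than collapsed into single strings.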

The semantic web community also deserves some blame here, since it never embraced XMP; if Adobe had had more industry support, it might not have nerfed XMP. I very much like how XMP solved problems, such as keeping track of the order of authors, that communities like the one behind Dublin Core haven't had the moral fortitude to address. That keeps Dublin Core in the category of "metadata for an elementary school library" instead of the world-beating solution that XMP and DC could have been.
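Here is one way to see the author-order point. In plain RDF, repeating `dc:creator` yields an unordered set of statements. XMP instead models `dc:creator` as an ordered array (`rdf:Seq`), so the order of the `rdf:li` items carries the author order. This is a minimal Python sketch with a hypothetical sample packet and function name:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample packet with three authors in a fixed order.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:creator>
    <rdf:Seq>
     <rdf:li>First Author</rdf:li>
     <rdf:li>Second Author</rdf:li>
     <rdf:li>Third Author</rdf:li>
    </rdf:Seq>
   </dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def ordered_creators(xmp: str) -> list[str]:
    """Return dc:creator entries in rdf:Seq order, or [] if there is no Seq."""
    root = ET.fromstring(xmp)
    for creator in root.iter(f"{{{DC}}}creator"):
        seq = creator.find(f"{{{RDF}}}Seq")
        if seq is not None:
            # Document order of rdf:li inside rdf:Seq is the meaningful author order.
            return [(li.text or "").strip() for li in seq.findall(f"{{{RDF}}}li")]
    return []


if __name__ == "__main__":
    print(ordered_creators(SAMPLE_XMP))
```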

You might like this thesis:

http://www.bloechle.ch/jean-luc/pub/Bloechle_Thesis.pdf

I made a HN post on this here: https://news.ycombinator.com/item?id=33674525

I contacted the author via YouTube; unfortunately, the work is proprietary, owned by the business he either created or sold it to.

Thanks for sharing -- will dive deeper. This has been keeping me up at night recently...