OOXML is no easier to decode than the old memory-dump and COM-based DOC formats, which were comprehensively documented first by third parties and then by Microsoft. It's just like how SWF was trivial to decode long before it was 'opened'. Container formats are fucking easy.
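To show how little the container itself asks of you, here's a rough sketch: an OOXML package is a ZIP archive, so listing its parts takes a few lines of standard-library Python. The `make_sample_package` helper and its two-part archive are made up for illustration; it is not a valid Word document, just enough of a ZIP to run the example without a real file.

```python
import io
import zipfile


def list_parts(package_bytes: bytes) -> list[str]:
    """Return the names of the parts stored in an OOXML (ZIP) package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def make_sample_package() -> bytes:
    """Build a tiny stand-in package in memory (hypothetical, not a valid .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


parts = list_parts(make_sample_package())
print(parts)
```

With a real file you'd pass `open("some.docx", "rb").read()` instead. None of this touches the hard part, which is what the content inside those parts is supposed to do.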
What's always been insanely difficult is duplicating the API that the content interfaces with, bug for bug.
As someone who writes an open source OpenXML decoder, I find it way easier than COM. The OpenXML specification docs are comprehensive: when there's a problem (usually caught by OpenOffice's parser, since Word is very forgiving), the specs have a specific answer.
There are good development tools too, including a Firebug-style XPath app for Word.
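For a sense of what XPath over a Word document looks like, here's a minimal sketch using Python's standard library rather than any particular tool. The `SAMPLE` markup and the `paragraph_texts` helper are invented for the example; the namespace URI is the usual WordprocessingML one, and in a real package this XML would be the `word/document.xml` part.

```python
import xml.etree.ElementTree as ET

# Namespace map for the XPath-style queries below.
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hand-written stand-in for the contents of word/document.xml (illustrative only).
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the plain text of each w:p paragraph by joining its w:t runs."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.findall(".//w:p", W_NS):
        runs = paragraph.findall(".//w:t", W_NS)
        texts.append("".join(run.text or "" for run in runs))
    return texts


print(paragraph_texts(SAMPLE))
```

ElementTree only supports a subset of XPath, so anything fancier than descendant and child lookups would need a fuller XPath engine.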
The main issue is that MS Office 2007 and 2010 generate the legacy OpenXML formats by default, which include a world of possible features, including quite a few from Lotus 1-2-3. There are too many edge cases to handle, and this isn't the right format for docs made this year.