| HN Mirror



	by endtime 5918 days ago
	.docx isn't hard to decode. ;) Though it is hard to render across browsers.

1 comments

blasdel 5918 days ago

OOXML is no easier to decode than the old memory-dump and COM-based DOC formats, which were comprehensively documented first by third parties and then by Microsoft. It's just like how SWF was trivial to decode long before it was 'opened'. Container formats are fucking easy.
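
To illustrate the "container formats are easy" point, here is a minimal Python sketch, not anyone's actual decoder. It relies on two facts about the format: a .docx is a ZIP archive, and the main body lives in `word/document.xml` under the WordprocessingML namespace. The helper names `extract_paragraphs` and `make_sample_docx` are made up for this example, and the sample package it builds is a stripped-down stand-in that Word itself would not accept.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes):
    """Return the plain text of each w:p paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across w:t elements inside its runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def make_sample_docx(texts):
    """Hypothetical helper: build a tiny stand-in package for the demo.

    It contains only word/document.xml, so it is NOT a complete .docx.
    """
    body = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(extract_paragraphs(make_sample_docx(["Hello", "World"])))
```

Getting the text out really is this short. Everything the sketch ignores (styles, numbering, fields, embedded objects, and how the application behaves around them) is where the work is.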

What's always been insanely difficult is duplicating the API that the content interfaces with, bug for bug.


nailer 5918 days ago

As someone who writes an open source OpenXML decoder, I find it way easier than COM. The OpenXML specification docs are comprehensive: when there's a problem (usually detected by OpenOffice's parser, as Word is very forgiving), the specs have a specific answer.

There are good development tools too, including a Firebug-style XPath app for Word.
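
For a sense of what XPath queries against Word's XML look like, here is a rough Python sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. It is not the tool mentioned above. The `SAMPLE` fragment and the `bold_run_texts` helper are invented for illustration, and the sketch assumes any `w:b` element means bold, ignoring the case where its `w:val` attribute switches bold off.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree's XPath subset.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hand-written fragment shaped like word/document.xml (illustrative only).
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Bold</w:t></w:r>"
    "<w:r><w:t> plain</w:t></w:r>"
    "</w:p></w:body></w:document>"
)


def bold_run_texts(document_xml):
    """Return the text of every run whose run properties contain w:b."""
    root = ET.fromstring(document_xml)
    texts = []
    for run in root.findall(".//w:p/w:r", NS):
        # Assumption: presence of w:b means bold; w:val="0" is not checked.
        if run.find("w:rPr/w:b", NS) is not None:
            texts.append("".join(t.text or "" for t in run.findall("w:t", NS)))
    return texts


if __name__ == "__main__":
    print(bold_run_texts(SAMPLE))
```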

The main issue is MS Office 2007 and 2010 generating the legacy OpenXML formats by default, which include a world of possible features, including quite a few from Lotus 1-2-3. There are too many edge cases to handle, and this isn't the right format for docs made this year.


blasdel 5918 days ago

'Decode' was probably the wrong word to use there. How about 'consume'?
