Ask HN: Why did Microsoft not use HTML instead of .doc as Word doc format?
6 points by bluecat22 2807 days ago
If one were to write a word processor in 2018 from scratch, should they use HTML as the document format over .doc or anything else? Considering each browser is a free document viewer? In other words, are there any big technical differences which make .doc better than HTML as a document format or vice versa?
8 comments

Because when you design something, you design for different priorities, implicit or explicit.

.doc was supported alongside, and co-existed with, .rtf and .txt at the beginning of Word.

Designing for interoperability, .txt and .rtf were good enough. Mac as the other major OS had an MS Office suite. Interoperability/importing WordPerfect was important.

Designing for backwards compatibility was a necessity for Microsoft, which supported different OSs and legacy personal and business users.

Designing for file size was very important. Most people who use Word don't use styles, so your HTML would be filled with inline CSS (and that didn't exist at the time), and file size would definitely be impacted.
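
To make the file-size point concrete, here is a rough sketch in modern HTML terms (not anything Word itself did). The paragraph count, the style string, and the function names are all made up for illustration; it only compares repeating the same formatting inline on every paragraph against declaring it once.

```python
# Hypothetical illustration: 200 paragraphs that all carry the same manual formatting.
paragraphs = [f"Paragraph {i}" for i in range(200)]
style = "font-family:Arial;font-size:11pt;margin:0 0 8pt 0"  # assumed example style


def render_inline(paragraphs, style):
    """Repeat the full style on every paragraph, as unstyled manual formatting would."""
    return "".join(f'<p style="{style}">{text}</p>' for text in paragraphs)


def render_with_class(paragraphs, style):
    """Declare the style once and reference it by class name."""
    rule = f"<style>.body{{{style}}}</style>"
    return rule + "".join(f'<p class="body">{text}</p>' for text in paragraphs)


inline_size = len(render_inline(paragraphs, style).encode("utf-8"))
class_size = len(render_with_class(paragraphs, style).encode("utf-8"))
print(f"inline: {inline_size} bytes, shared class: {class_size} bytes")
```

The gap grows with every paragraph, since the inline version pays for the whole style string each time.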

Designing for the page was important. Word was about printed / print-like documents. HTML struggles to do today what Word and other word processors have done for decades and continue to iterate on.

Designing for user experience really catapulted Word above the competition. I used and liked WordPerfect but it was a blue screen DOS application. When I saw Word, I had WYSIWYG! And I had copy-paste. There was a time when copy-paste was a new thing for many users not in the *NIX/OS/2/Amiga/Atari/BeOS world. (A year later I discovered Unix, and that's key also: design for discoverable features and platforms.)

So, what are you designing for? Figure that out, then pick your implementation method.

And the same question 14 days ago. https://news.ycombinator.com/item?id=18046382
One aspect where .doc beats .html is in the ability to rapidly write small changes to disk. Open a document, edit a few characters near the start, and save. .html would have to write the full document out to disk; .doc could write less than a kilobyte.

That was extremely useful in a time when programs crashed all the time, necessitating frequent saving, and floppy disks, with write speeds on the order of 50 kilobytes per second, were the main storage medium.

This feature lives on in Word as “fast saving”, but can be disabled.
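
A toy sketch of the idea, for anyone who wants to see the difference in bytes written. This is not the real .doc layout: the length-prefixed file format, the JSON edit records, and the names full_save, fast_save, and load are all invented for illustration. It only shows a full rewrite versus appending a small change record to the end of the file.

```python
import json
import os
import tempfile


def full_save(path, text):
    """Rewrite the whole document; returns the number of body bytes written."""
    body = text.encode("utf-8")
    with open(path, "wb") as f:
        f.write(b"%d\n" % len(body))  # hypothetical header: byte length of the base text
        f.write(body)
    return len(body)


def fast_save(path, offset, deleted, inserted):
    """Append one edit record to the end of the file; returns bytes written."""
    record = json.dumps({"at": offset, "del": deleted, "ins": inserted})
    data = record.encode("utf-8") + b"\n"
    with open(path, "ab") as f:
        f.write(data)
    return len(data)


def load(path):
    """Read the base text, then replay the appended edits in order."""
    with open(path, "rb") as f:
        size = int(f.readline())
        text = f.read(size).decode("utf-8")
        for line in f:
            edit = json.loads(line)
            start = edit["at"]
            text = text[:start] + edit["ins"] + text[start + edit["del"]:]
    return text


path = os.path.join(tempfile.mkdtemp(), "toy.doc")
original = "Hello world. " * 5000
full_bytes = full_save(path, original)           # tens of kilobytes
delta_bytes = fast_save(path, 0, 5, "Howdy")     # a few dozen bytes
restored = load(path)
print(full_bytes, delta_bytes, restored[:12])
```

The trade-off is that the file keeps growing and loading has to replay the edits, which is roughly why an occasional full save is still needed.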

The doc format predates HTML by a decade or thereabouts.
Is it technically superior/inferior to HTML as a document format for a word processor?
As long as you know how to access the data you store, it is mostly irrelevant what format it is in. Bytes are bytes, but fewer bytes means smaller files.
Microsoft did develop an HTML office format in Office 2000 for Word, Excel, and PowerPoint, designed to be a replacement for the binary formats. It included extensive embedded XML. Users and companies still used the original. With Word HTML, people wanted all the Office-specific code stripped out. Internet Explorer was necessary if you wanted accurate rendering.

I was a developer in the Excel group in 2000.

The same now-outdated HTML export is still in Office today, mostly unchanged since Office 2003.
So there is no technical limitation of HTML itself compared to XML (doc format)? Would HTML be just as good a replacement for XML (.doc format) if someone tried to make a word processor for it?
The HTML document alternative was a massive technical hack. It was a flawed mandate from executives in order to maintain relevance in a web-based world. Embedded XML and custom styles were used to implement the format. Also, the IE team worked closely with Office to support richer text formatting. HTML was not enough. Even then, the format still could not support all the features in the product--features like versioning, simultaneous editing, OLE embedding, programmability, etc.
Also, what you are saying doesn't make any sense. Word itself is the document editor. The format just stores the metadata and actual data. Word takes that and turns it into the document you see. It almost seems like you are asking each doc file to be a standalone document and editor in one file.
I meant: if someone were to create a word processor in 2018, with the least amount of work, shouldn't they use HTML as the format of the document? Assuming no need for backwards compatibility with any other doc formats.
For one thing, it was created as a proprietary format that predates HTML.
XML...