Hacker News
by punnerud 1571 days ago
Not from document to block, but from XML-based to database-based.

Try opening a Word document with a zip program: all you will see is a lot of folders containing XML files and binary (blob) images.
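Because a .docx file is a ZIP archive, Python's standard `zipfile` module shows the same thing a zip program would. This is a minimal sketch: the helper names are hypothetical, and the `word/media/` path follows the usual Office Open XML layout rather than anything stated in this thread.

```python
import io
import zipfile


def list_docx_parts(source):
    """Return the names of the parts inside a .docx, which is a ZIP archive.

    `source` may be a filesystem path or the raw bytes of the file.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


def split_parts(names):
    """Separate XML parts (including .rels relationship files) from embedded media."""
    xml_parts = [n for n in names if n.endswith((".xml", ".rels"))]
    media = [n for n in names if n.startswith("word/media/")]
    return xml_parts, media
```

For a real file, `split_parts(list_docx_parts("report.docx"))` works with any Word document you have at hand (`report.docx` is just a placeholder name).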

LaTeX and Word are XML. Notion is a database.

The benefits of a database: history, better scaling, multiple users, simpler diff-based merging of text, and more.


Surely once you've got a block inside a block you're back to the XML model again?
XML is a document. A relational database is a relational database. Both can be used to create a tree structure. Notion does it with a "block" table, each block having a parent block ID and a list of child block IDs, allowing tree traversal in both directions.
Once you're into a relational model, you can start treating your forest of trees as one big graph if you want to (though you don't have to). And you can edit nodes individually without having to iterate over the entire document.
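A minimal sketch of such a block table, using SQLite from Python. The table and column names (`block`, `parent_id`, `children`) are hypothetical, not Notion's actual schema; the child list is stored as a JSON array in a text column so sibling order is kept. It shows traversal in both directions and an edit that touches a single row.

```python
import json
import sqlite3

# Hypothetical schema: each block points up to its parent and stores an
# ordered JSON list of its children, so the tree can be walked both ways.
SCHEMA = """
CREATE TABLE block (
    id        TEXT PRIMARY KEY,
    parent_id TEXT REFERENCES block(id),
    children  TEXT NOT NULL DEFAULT '[]',
    text      TEXT NOT NULL DEFAULT ''
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _parent_of(conn, block_id):
    return conn.execute(
        "SELECT parent_id FROM block WHERE id = ?", (block_id,)
    ).fetchone()[0]


def get_children(conn, block_id):
    """Downward step: read the ordered list of child ids."""
    row = conn.execute("SELECT children FROM block WHERE id = ?", (block_id,)).fetchone()
    return json.loads(row[0])


def get_ancestors(conn, block_id):
    """Upward walk: follow parent_id pointers until reaching a root."""
    chain = []
    parent = _parent_of(conn, block_id)
    while parent is not None:
        chain.append(parent)
        parent = _parent_of(conn, parent)
    return chain


def add_block(conn, block_id, parent_id=None, text=""):
    """Insert a block and append it to its parent's child list in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO block (id, parent_id, text) VALUES (?, ?, ?)",
            (block_id, parent_id, text),
        )
        if parent_id is not None:
            kids = get_children(conn, parent_id)
            kids.append(block_id)
            conn.execute(
                "UPDATE block SET children = ? WHERE id = ?",
                (json.dumps(kids), parent_id),
            )


def set_text(conn, block_id, text):
    """Edit a single block; no other row is read or rewritten."""
    with conn:
        conn.execute("UPDATE block SET text = ? WHERE id = ?", (text, block_id))
```

Storing both pointers costs an extra write per insert, but it lets you list a block's children in order without a query over the whole table.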

But assuming you're trying to maintain the tree structure, you still have many of the same issues. Each node needs to carry the context of its parent, which means you'll need things like transitive closures to know whether a change to a parent affects its children (e.g. deletion) or a change to a child affects its parent (e.g. re-rendering the tree). If you move a node, do you have to re-create the pointers below it? Tracking history could also get complicated, because a change might span both the content of a node and the tree-structure metadata (e.g. can you undo, as one step, a change where text was made bold and a block was moved?). And where do you put transactions?
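To make the transitive-closure and move questions concrete, here is a sketch against a simplified adjacency-list table (a hypothetical `block` table with only a `parent_id` pointer, no child list). A recursive CTE computes a block's subtree, deleting a subtree happens in one transaction, and moving a block rewrites a single `parent_id`, so nothing below it needs new pointers. With a Notion-style child list, a move would also have to update the old and new parents' lists inside the same transaction.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE block (id TEXT PRIMARY KEY, parent_id TEXT, text TEXT)")
    return conn


# Recursive CTE: start at one block, then repeatedly add rows whose
# parent is already in the set (the downward transitive closure).
SUBTREE = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM block WHERE id = ?
    UNION ALL
    SELECT b.id FROM block AS b JOIN subtree AS s ON b.parent_id = s.id
)
SELECT id FROM subtree
"""


def subtree_ids(conn, root_id):
    """The block itself plus all of its descendants."""
    return [row[0] for row in conn.execute(SUBTREE, (root_id,))]


def delete_subtree(conn, root_id):
    """Deleting a parent also deletes every descendant, atomically."""
    ids = subtree_ids(conn, root_id)
    with conn:
        conn.executemany("DELETE FROM block WHERE id = ?", [(i,) for i in ids])
    return ids


def move_block(conn, block_id, new_parent_id):
    """Re-parent a block: one row changes, descendants keep their pointers."""
    # In this single-connection sketch the cycle check runs before the write;
    # concurrent writers would need the check and the update in one transaction.
    if new_parent_id in subtree_ids(conn, block_id):
        raise ValueError("cannot move a block under its own descendant")
    with conn:
        conn.execute(
            "UPDATE block SET parent_id = ? WHERE id = ?", (new_parent_id, block_id)
        )
```

The undo question is not addressed here; one option would be to log each transaction's row changes, but that is beyond this sketch.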

I'm not saying this is the same as XML, just that you can't magically escape all of the downsides. It's a fun problem to solve!

XML is a format, not a document. XML can be used to express whatever data structure you want. For the user, it makes little difference whether the backend uses XML, JSON, or a SQL or NoSQL database; the interface and workflows hide it all away.
How is LaTeX XML?
It's isomorphic to XML. It's markup, not structured data.
Pretty sure database is also isomorphic to XML, in that sense. I agree that Notion-ish documents are more structured than Word-ish, though.
A SQL database, with indexing configured correctly, allows you to look up a row in O(log(n)).
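One way to see this in practice is SQLite's `EXPLAIN QUERY PLAN`: without an index, an equality lookup is a full table scan; after adding a B-tree index on the column, it becomes an index search. A sketch with a hypothetical table; the exact plan wording varies between SQLite versions (e.g. `SCAN block` vs `SCAN TABLE block`).

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan lines for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the text description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE block (id INTEGER PRIMARY KEY, slug TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO block (slug, text) VALUES (?, ?)",
    ((f"block-{i}", "lorem") for i in range(10_000)),
)

lookup = "SELECT text FROM block WHERE slug = ?"
before = query_plan(conn, lookup, ("block-4242",))  # no index: full scan, O(n)
conn.execute("CREATE INDEX block_slug ON block (slug)")
after = query_plan(conn, lookup, ("block-4242",))   # B-tree index search, O(log n)
print(before)
print(after)
```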

A bag of XML bytes doesn't give you that. It takes, at best, a SAX parser to do an O(n) scan through the whole document to find stuff. Most DOM implementations give you O(1) indexing by ID, but they require you to parse it first, and that's going to take O(n).
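Both XML strategies can be sketched with Python's standard library: a SAX handler that streams through the bytes and stops at the first matching `id`, and an ElementTree parse that builds an id-to-element dictionary up front. The element and attribute names are assumptions for illustration; ElementTree has no built-in `getElementById`, so the dictionary plays that role.

```python
import xml.etree.ElementTree as ET
import xml.sax


class _Found(Exception):
    """Raised to stop the SAX parser as soon as the target element is seen."""


class _FindById(xml.sax.ContentHandler):
    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if attrs.get("id") == self.wanted:
            raise _Found(name)


def sax_find(xml_bytes, wanted):
    """Streaming scan: no tree is built, but a miss still reads all n elements."""
    handler = _FindById(wanted)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except _Found as hit:
        return hit.args[0], handler.elements_seen
    return None, handler.elements_seen


def build_id_index(xml_bytes):
    """DOM-style: parse everything once (O(n)); afterwards each lookup is O(1)."""
    root = ET.fromstring(xml_bytes)
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}
```

Either way, the O(n) pass has to be repeated every time the bytes are loaded again.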

Creating a database is >= O(n).

While creating and editing a database, it is SOP to create, maintain, and save data structures that provide fast access later.

Is there some reason why you couldn't do the same for XML?

The problem isn't creating the XML file. The problem is querying it later: after you've dumped it from RAM to disk, you have to load the entire thing back into RAM in order to rebuild the DOM.

A database like SQLite allows you to perform structured queries at faster-than-O(n) speed straight off the disk.
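A sketch of that access pattern, assuming a hypothetical `block` table: write rows and an index to a file, close the connection, then reopen the file and fetch one row. SQLite reads database pages on demand, so an indexed lookup descends the B-tree and touches a handful of pages rather than the whole file. This script does not measure page reads; it only shows the workflow.

```python
import os
import sqlite3
import tempfile


def build_db(path, n):
    """Write n rows and an index on slug to a database file, then close it."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE block (slug TEXT, text TEXT)")
        conn.executemany(
            "INSERT INTO block VALUES (?, ?)",
            ((f"block-{i}", "x" * 200) for i in range(n)),
        )
        conn.execute("CREATE INDEX block_slug ON block (slug)")
    conn.close()


def lookup(path, slug):
    """Reopen the file from disk and fetch one row through the index."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT text FROM block WHERE slug = ?", (slug,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


def demo(n=5000):
    """Build a throwaway database on disk, then query it from fresh connections."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blocks.db")
        build_db(path, n)
        return lookup(path, "block-4321"), lookup(path, "no-such-block")
```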

TeX is a Turing-complete programming language. It’s nothing but data and calls to subprograms.
Well, by GP's logic, a C program is isomorphic to XML, because it can be parsed and the parse tree serialized as XML.
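To illustrate the reductio: any parse tree can be dumped as XML. A C parser would make the point directly; to keep the sketch runnable with only the standard library, it uses Python's `ast` module on a one-line Python statement as a stand-in. The element naming (one element per node type, scalar fields as attributes) is an arbitrary choice.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node):
    """Serialize a Python AST node and its children as an XML element."""
    el = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # Single child node: wrap it in an element named after the field.
            ET.SubElement(el, field).append(to_xml(value))
        elif isinstance(value, list):
            # List of child nodes; non-node items (e.g. names in a `global`
            # statement) are skipped in this sketch.
            container = ET.SubElement(el, field)
            for item in value:
                if isinstance(item, ast.AST):
                    container.append(to_xml(item))
        elif value is not None:
            # Scalar fields (identifiers, constants) become attributes.
            el.set(field, str(value))
    return el


tree = ast.parse("total = price * 2")
print(ET.tostring(to_xml(tree), encoding="unicode"))
```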