I’ve spent a lot of time lately under the hood of ProseMirror, Lexical, and Slate, and I’m starting to feel like we’re all suffering from a collective "DOM-induced" Stockholm Syndrome. Most modern editors treat the document as a strict tree hierarchy. It’s the model we inherited from HTML, and it works fine for linear documents. But the moment you try to build something more complex (multi-column layouts, callouts, or AI-driven structural transforms), the model starts to feel incredibly brittle.
The Friction of "Everything is a Tree"
When your state is a tree, layout is coupled to semantics. This leads to a few specific headaches:
Structural Surgery: Moving a paragraph from a callout to a column isn’t a "move" operation; it’s a destructive deletion and re-insertion that forces you to recalculate the hierarchy.
The AI Mapping Problem: AI models don't naturally "think" in nested JSON block trees. Mapping an LLM’s intent back to a specific, deeply nested path in a Slate schema is a recipe for state corruption.
Layout Coupling: If I want to represent a two-column layout, the "columns" become parent nodes that own the content. If I want to change the layout, I have to restructure the content itself.
An Alternative: IR-First, Region-Based Editing
I’ve been exploring a direction that decouples the Content from the Spatial Mapping. Instead of a tree being the source of truth, what if the document were an Intermediate Representation (IR) of flat, typed nodes, and the "Editor" were just a projection of those nodes into regions?
The Concept:
Content Pool: A flat-ish collection of nodes (Paragraph A, Image B, List C).
Layout Regions: A separate schema that defines spatial "slots" (e.g., Header, LeftCol, RightCol).
Spatial Mapping: Regions reference Content IDs.
In this world, moving a paragraph between columns doesn’t change the paragraph or its relationship to the document; it just updates a pointer in the layout map.
Why this might be a terrible idea (or a great one)
I’m trying to sanity-check this before I fall too far down the rabbit hole. If we move away from the tree-as-truth model, I have a few nagging concerns:
Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both? Are we forced to implement a completely custom virtualized selection model?
The CRDT/OT Tax: Tree-based merging is well understood (Automerge/Yjs). If the layout and content are decoupled, does syncing become an order of magnitude more complex?
Ownership Invariants: In a tree, if a parent is deleted, its children go with it. In a relational/region model, you have to manage "orphaned" content. Is that a feature or a bug?
Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?
I'd love to hear from anyone who has pushed past the standard block-tree model. Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?
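To make the proposal and three of its concerns concrete, here is a minimal TypeScript sketch of a flat content pool with a separate region map. Every name in it (`Doc`, `moveNode`, `findOrphans`, `selectRange`, the `readingOrder` field) is hypothetical. It assumes each content ID is placed in at most one region and that regions have a fixed reading order; that is one possible answer to the cross-region selection question, not the only one.

```typescript
// Hypothetical sketch of the IR-first model: a flat content pool plus a
// separate layout map. None of these names come from an existing library.
type NodeId = string;

interface ContentNode {
  id: NodeId;
  type: "paragraph" | "image" | "list";
  data: string;
}

interface Doc {
  pool: Map<NodeId, ContentNode>; // content, with no reference to its container
  regions: Map<string, NodeId[]>; // spatial slots -> ordered content IDs
  readingOrder: string[];         // assumption: a total order over regions
}

// Moving content between regions rewrites only the layout map;
// the ContentNode object in the pool is left untouched.
function moveNode(doc: Doc, id: NodeId, from: string, to: string, index: number): void {
  const src = doc.regions.get(from);
  const dst = doc.regions.get(to);
  if (src === undefined || dst === undefined) throw new Error("unknown region");
  const pos = src.indexOf(id);
  if (pos === -1) throw new Error(`${id} is not in ${from}`);
  src.splice(pos, 1);
  dst.splice(Math.min(index, dst.length), 0, id);
}

// Ownership invariant: content that no region references. Whether these are
// garbage-collected or kept as "unplaced" content is a policy choice.
function findOrphans(doc: Doc): NodeId[] {
  const placed = new Set<NodeId>();
  doc.regions.forEach((ids) => ids.forEach((id) => placed.add(id)));
  return Array.from(doc.pool.keys()).filter((id) => !placed.has(id));
}

// Cross-region selection: linearize regions by reading order, then treat a
// drag from anchor to focus as a contiguous range of that sequence.
// Assumes each ID is placed in at most one region.
function selectRange(doc: Doc, anchor: NodeId, focus: NodeId): NodeId[] {
  const seq: NodeId[] = [];
  for (const name of doc.readingOrder) seq.push(...(doc.regions.get(name) ?? []));
  const a = seq.indexOf(anchor);
  const f = seq.indexOf(focus);
  if (a === -1 || f === -1) throw new Error("selection endpoint is not placed");
  return seq.slice(Math.min(a, f), Math.max(a, f) + 1);
}
```

`moveNode` never touches the pool, which is the pointer-update property described in the post, while `findOrphans` shows the ownership invariant turning into an explicit query instead of something the tree enforces for free.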
If you take that as a solved problem, do your concerns change?
> Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both?
You could render them in the DOM as an old-fashioned tree, while internally manipulating your "flat" IR, to make selections work nicely.
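A minimal sketch of that split, assuming paragraphs only: the IR is the source of truth, and a DOM-shaped tree is rebuilt from it for rendering. `FlatIR`, `ViewNode`, `project`, and `nodeIdAtPath` are hypothetical names, and `ViewNode` is a plain-object stand-in for real DOM elements so the example runs outside a browser. In a browser you would more likely tag elements with a `data-*` attribute and call `closest()` on the element containing the selection's anchor to find the owning IR node.

```typescript
// Hypothetical sketch: the flat IR stays the source of truth, and a
// DOM-shaped tree is derived from it on every update purely for rendering.
type NodeId = string;

interface FlatIR {
  text: Map<NodeId, string>;      // content pool (paragraph text only, for brevity)
  regions: Map<string, NodeId[]>; // region name -> ordered content IDs
  readingOrder: string[];         // order in which regions appear in the tree
}

// Stand-in for a DOM element. `nodeId` plays the role of a data-* attribute
// that lets a position in the tree be mapped back to an IR node.
interface ViewNode {
  tag: string;
  region?: string;
  nodeId?: NodeId;
  children: (ViewNode | string)[];
}

// IR -> tree: one <section> per region, one <p> per referenced node.
function project(ir: FlatIR): ViewNode {
  const root: ViewNode = { tag: "div", children: [] };
  for (const region of ir.readingOrder) {
    const section: ViewNode = { tag: "section", region, children: [] };
    for (const id of ir.regions.get(region) ?? []) {
      section.children.push({ tag: "p", nodeId: id, children: [ir.text.get(id) ?? ""] });
    }
    root.children.push(section);
  }
  return root;
}

// Tree position -> IR: here a position is a child-index path from the root.
// Walk it and return the deepest nodeId encountered.
function nodeIdAtPath(root: ViewNode, path: number[]): NodeId | undefined {
  let current = root;
  let found: NodeId | undefined;
  for (const i of path) {
    const child = current.children[i];
    if (child === undefined || typeof child === "string") break;
    current = child;
    if (current.nodeId !== undefined) found = current.nodeId;
  }
  return found;
}
```

Because the tree is thrown away and rebuilt, native browser selection can span both `section` elements even though the IR has no parent/child link between the two regions.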
This is not too different from how Yjs-ProseMirror works already: Yjs has its own representation of the state as a CRDT tree, which it converts to a separate ProseMirror tree on each update (& it uses a diff algorithm to map local user edits in the other direction).
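For the other direction, one of the simplest ways to turn an edited paragraph back into an IR operation is a common-prefix/common-suffix diff between the IR's copy of the text and what is now in the rendered tree. This is a sketch under that assumption; it is not necessarily the algorithm Yjs or its ProseMirror binding uses, and `TextEdit`, `simpleTextDiff`, and `applyEdit` are hypothetical names.

```typescript
// Hypothetical sketch: recover a single edit from old/new paragraph text
// using only the common prefix and common suffix.
interface TextEdit {
  index: number;  // position in the old text where the change starts
  remove: number; // number of characters deleted
  insert: string; // text inserted at `index`
}

function simpleTextDiff(oldText: string, newText: string): TextEdit {
  const maxPrefix = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < maxPrefix && oldText[prefix] === newText[prefix]) prefix++;
  // The suffix must not overlap the prefix in either string.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    index: prefix,
    remove: oldText.length - prefix - suffix,
    insert: newText.slice(prefix, newText.length - suffix),
  };
}

// Apply the recovered edit to the IR's copy of the text.
function applyEdit(text: string, e: TextEdit): string {
  return text.slice(0, e.index) + e.insert + text.slice(e.index + e.remove);
}
```

For example, diffing `"Hello world"` against `"Hello, world"` yields `{ index: 5, remove: 0, insert: "," }`. When a change sits inside a run of repeated characters, this diff may place the edit at a different but equivalent position, which is usually fine when the goal is an IR update rather than a human-readable diff.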
> Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?
This might be how DatoCMS works? https://www.datocms.com/docs/content-modelling (I say this based on five minutes spent watching someone else use it.)
> Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?
For lists specifically, I would argue the latter. It's natural to think of a list as a flat sequence of list items, in parallel to any surrounding paragraphs; forcing you to wrap your list items in a UL or OL is (to me) a browser quirk.
I made some progress fighting this in Tiptap: https://github.com/commoncurriculum/tiptap-extension-flat-li... Quill.js already models lists in this "flat" way.
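The flat-list idea can be sketched as a one-way conversion: the document stores list items as a flat run of blocks, each with its own list kind and indent depth, and the nested `ul`/`ol` shape is built only when rendering. This is loosely modeled on the way Quill stores each list item as a line carrying list and indent attributes, but the `Block`/`ListNode` types and `toTree` are hypothetical; they are neither Quill's Delta format nor the Tiptap extension linked above.

```typescript
// Hypothetical flat representation: list items sit in the same sequence as
// paragraphs, each carrying its own list kind and indent depth.
type Block =
  | { type: "paragraph"; text: string }
  | { type: "item"; list: "bullet" | "ordered"; indent: number; text: string };

// Nested, HTML-shaped tree that exists only for rendering.
interface ListItem { text: string; sublists: ListNode[] }
interface ListNode { tag: "ul" | "ol"; items: ListItem[] }
type TreeBlock = { type: "paragraph"; text: string } | ListNode;

function toTree(blocks: Block[]): TreeBlock[] {
  const out: TreeBlock[] = [];
  const stack: ListNode[] = []; // stack[d] is the list currently open at depth d
  for (const b of blocks) {
    if (b.type === "paragraph") {
      stack.length = 0; // a paragraph closes every open list
      out.push(b);
      continue;
    }
    const tag = b.list === "ordered" ? "ol" : "ul";
    // An item can be at most one level deeper than the deepest open list.
    const depth = Math.min(b.indent, stack.length);
    stack.length = Math.min(stack.length, depth + 1); // close deeper lists
    let list = stack[depth];
    if (list === undefined || list.tag !== tag) {
      // Start a new list: top-level lists go to the output, nested ones hang
      // off the last item of the list one level up.
      list = { tag, items: [] };
      if (depth === 0) {
        out.push(list);
      } else {
        const parentItems = stack[depth - 1].items;
        parentItems[parentItems.length - 1].sublists.push(list);
      }
      stack[depth] = list;
    }
    list.items.push({ text: b.text, sublists: [] });
  }
  return out;
}
```

Because nesting is derived, indenting or outdenting an item changes one integer on one block instead of moving it between parent nodes, and an item indented more than one level past its predecessor is simply clamped.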