Is the nested block/tree model the wrong foundation for modern rich text editors | HN Mirror

2 points by Farooq1 138 days ago

I’ve spent a lot of time lately under the hood of ProseMirror, Lexical, and Slate, and I’m starting to feel like we’re all suffering from a collective "DOM-induced" Stockholm Syndrome.

Most modern editors treat the document as a strict tree hierarchy. It’s the model we inherited from HTML, and it works fine for linear documents. But the moment you try to build something more complex—multi-column layouts, callouts, or AI-driven structural transforms—the model starts to feel incredibly brittle.

The Friction of "Everything is a Tree"

When your state is a tree, layout is coupled to semantics. This leads to a few specific headaches:

Structural Surgery: Moving a paragraph from a callout to a column isn’t a "move" operation; it’s a destructive deletion and re-insertion that necessitates re-calculating the hierarchy.

The AI Mapping Problem: AI models don't naturally "think" in nested JSON block trees. Mapping an LLM’s intent back to a specific deeply nested path in a Slate schema is a recipe for state corruption.

Layout Coupling: If I want to represent a two-column layout, the "columns" become parent nodes that own the content. If I want to change the layout, I have to restructure the content itself.
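
To make the "structural surgery" and "layout coupling" points concrete, here is a minimal sketch of a move in a plain nested tree. The `TreeNode` type, index-based `Path` addressing, and `moveNode` helper are hypothetical and are not the ProseMirror, Lexical, or Slate APIs; the sketch only shows that a move is a splice-out followed by a splice-in, addressed by index paths that the removal itself can invalidate.

```typescript
// Hypothetical nested document model (not any real editor's schema).
interface TreeNode {
  type: string;
  text?: string;
  children?: TreeNode[];
}

// A path is a list of child indices from the root.
type Path = number[];

function childrenAt(root: TreeNode, path: Path): TreeNode[] {
  let node = root;
  for (const i of path) {
    const next = node.children ? node.children[i] : undefined;
    if (!next) throw new Error(`invalid path segment ${i}`);
    node = next;
  }
  if (!node.children) throw new Error(`node "${node.type}" cannot have children`);
  return node.children;
}

// A "move" is a delete at one path plus an insert at another. If the source
// precedes one of the destination's ancestors under a shared parent, the
// removal shifts that ancestor's index, so `toParent` must be recomputed.
function moveNode(root: TreeNode, from: Path, toParent: Path, toIndex: number): void {
  if (from.length === 0) throw new Error("cannot move the root");
  const siblings = childrenAt(root, from.slice(0, -1));
  const [node] = siblings.splice(from[from.length - 1], 1);
  if (!node) throw new Error("nothing at source path");
  childrenAt(root, toParent).splice(toIndex, 0, node);
}

// Usage: move paragraph "A" out of a callout into the right-hand column.
const doc: TreeNode = {
  type: "doc",
  children: [
    { type: "callout", children: [{ type: "paragraph", text: "A" }] },
    {
      type: "columns",
      children: [
        { type: "column", children: [{ type: "paragraph", text: "B" }] },
        { type: "column", children: [] },
      ],
    },
  ],
};
moveNode(doc, [0, 0], [1, 1], 0);
```

Had the source been a top-level paragraph at `[0]` and the destination `[2, 1]`, removing it first would shift the columns node to index 1, so `[2, 1]` would no longer point at the intended column.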

An Alternative: IR-First, Region-Based Editing

I’ve been exploring a direction that decouples the Content from the Spatial Mapping. Instead of a tree being the source of truth, what if the document was an Intermediate Representation (IR) of flat, typed nodes, and the "Editor" was just a projection of those nodes into regions?

The Concept:

Content Pool: A flat-ish collection of nodes (Paragraph A, Image B, List C).

Layout Regions: A separate schema that defines spatial "slots" (e.g., Header, LeftCol, RightCol).

Spatial Mapping: Regions reference Content IDs.

In this world, moving a paragraph between columns doesn't change the paragraph or its relationship to the document; it just updates a pointer in the layout map.
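
A minimal sketch of the content pool plus layout map described above, assuming string IDs and named regions. `Block`, `Doc`, and `moveBlock` are hypothetical names, and real blocks would carry rich text rather than a plain string.

```typescript
type BlockId = string;

// Content pool entries: typed nodes with no positional information.
type Block =
  | { id: BlockId; type: "paragraph"; text: string }
  | { id: BlockId; type: "image"; src: string }
  | { id: BlockId; type: "list"; items: string[] };

interface Doc {
  pool: Map<BlockId, Block>;       // content, keyed by stable ID
  regions: Map<string, BlockId[]>; // layout: region name -> ordered IDs
}

// Moving a block only edits the region ID lists; the Block object and its
// ID are untouched. `toIndex` is applied after removal and clamped.
function moveBlock(doc: Doc, id: BlockId, toRegion: string, toIndex: number): void {
  if (!doc.pool.has(id)) throw new Error(`unknown block ${id}`);
  const target = doc.regions.get(toRegion);
  if (!target) throw new Error(`unknown region ${toRegion}`);
  doc.regions.forEach((ids) => {
    const i = ids.indexOf(id);
    if (i !== -1) ids.splice(i, 1);
  });
  target.splice(Math.max(0, Math.min(toIndex, target.length)), 0, id);
}

const doc: Doc = {
  pool: new Map<BlockId, Block>([
    ["pA", { id: "pA", type: "paragraph", text: "Paragraph A" }],
    ["iB", { id: "iB", type: "image", src: "b.png" }],
    ["lC", { id: "lC", type: "list", items: ["one", "two"] }],
  ]),
  regions: new Map<string, BlockId[]>([
    ["Header", ["pA"]],
    ["LeftCol", ["iB"]],
    ["RightCol", ["lC"]],
  ]),
};

const before = doc.pool.get("pA");
moveBlock(doc, "pA", "RightCol", 0);
// Header is now empty, RightCol is ["pA", "lC"], and the pool entry for
// "pA" is the very same object as before the move.
```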

Why this might be a terrible idea (or a great one)

I’m trying to sanity-check this before I fall too far down the rabbit hole. If we move away from the tree-as-truth model, I have a few nagging concerns:

Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both? Are we forced to implement a completely custom virtualized selection model?

The CRDT/OT Tax: Tree-based merging is well-understood (Automerge/Yjs). If the layout and content are decoupled, does syncing become an order of magnitude more complex?

Ownership Invariants: In a tree, if a parent is deleted, the children go with it. In a relational/region model, you have to manage "orphaned" content. Is that a feature or a bug?

Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?
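
The ownership question can at least be made explicit. Here is a minimal sketch, assuming the same hypothetical pool-plus-regions shape (payloads elided as `unknown`), of an integrity check that reports orphans (content no region references) and dangling references (region entries with no content). Whether orphans are garbage-collected, kept in an "unplaced" tray, or treated as errors then becomes a policy the application chooses, rather than something the data structure decides implicitly as a tree does.

```typescript
type BlockId = string;

interface Doc {
  pool: Map<BlockId, unknown>;     // content payloads elided for brevity
  regions: Map<string, BlockId[]>; // layout map
}

interface IntegrityReport {
  orphans: BlockId[];  // in the pool but placed in no region
  dangling: BlockId[]; // placed in a region but missing from the pool
}

function checkIntegrity(doc: Doc): IntegrityReport {
  const placed = new Set<BlockId>();
  const dangling: BlockId[] = [];
  doc.regions.forEach((ids) => {
    for (const id of ids) {
      placed.add(id);
      if (!doc.pool.has(id)) dangling.push(id);
    }
  });
  const orphans = Array.from(doc.pool.keys()).filter((id) => !placed.has(id));
  return { orphans, dangling };
}

// Usage: "b" and "c" were removed from the layout, "x" was never created.
const report = checkIntegrity({
  pool: new Map<BlockId, unknown>([["a", {}], ["b", {}], ["c", {}]]),
  regions: new Map<string, BlockId[]>([["Main", ["a", "x"]]]),
});
// report.orphans is ["b", "c"]; report.dangling is ["x"]
```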

I'd love to hear from anyone who has pushed past the standard block-tree model. Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?

1 comments

mweidner 137 days ago

Managing "a flat-ish collection of nodes" that can be moved around (without merely deleting and re-inserting nodes) is tricky because of how paragraphs can be split and merged. Notion tackled this for their offline mode: https://www.youtube.com/watch?v=AKDcWRkbjYs

If you take that as a solved problem, do your concerns change?

> Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both?

You could render them in the DOM as an old-fashioned tree, while internally manipulating your "flat" IR, to make selections work nicely.

This is not too different from how Yjs-ProseMirror works already: Yjs has its own representation of the state as a CRDT tree, which it converts to a separate ProseMirror tree on each update (& it uses a diff algorithm to map local user edits in the other direction).
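
A minimal sketch of that "render a tree, keep the IR flat" split, assuming a hypothetical pool-plus-regions IR and emitting an HTML string rather than real DOM nodes. It is not a description of the Yjs-ProseMirror binding; it only shows the direction of the projection. The `data-block-id` and `data-region` attributes are an assumption about how DOM selections could later be mapped back to IR IDs.

```typescript
type BlockId = string;

interface Block {
  id: BlockId;
  type: "paragraph" | "heading";
  text: string;
}

interface Doc {
  pool: Map<BlockId, Block>;
  regions: Map<string, BlockId[]>;
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Derives an ordinary nested tree (region wrappers containing blocks) from
// the flat IR. The output is rebuilt from the IR and never edited directly.
function renderView(doc: Doc, regionOrder: string[]): string {
  return regionOrder
    .map((region) => {
      const inner = (doc.regions.get(region) ?? [])
        .map((id) => doc.pool.get(id))
        .filter((b): b is Block => b !== undefined) // skip dangling IDs
        .map((b) => {
          const tag = b.type === "heading" ? "h2" : "p";
          return `<${tag} data-block-id="${escapeHtml(b.id)}">${escapeHtml(b.text)}</${tag}>`;
        })
        .join("");
      return `<div data-region="${escapeHtml(region)}">${inner}</div>`;
    })
    .join("");
}

const html = renderView(
  {
    pool: new Map<BlockId, Block>([
      ["h1", { id: "h1", type: "heading", text: "Title" }],
      ["p1", { id: "p1", type: "paragraph", text: "a < b" }],
    ]),
    regions: new Map<string, BlockId[]>([["Header", ["h1"]], ["Body", ["p1"]]]),
  },
  ["Header", "Body"],
);
// html === '<div data-region="Header"><h2 data-block-id="h1">Title</h2></div>' +
//          '<div data-region="Body"><p data-block-id="p1">a &lt; b</p></div>'
```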

> Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?

This might be how Dato CMS works? https://www.datocms.com/docs/content-modelling (I say this based off of 5 minutes spent watching someone else use it.)

> Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?

For lists specifically, I would argue the latter. It's natural to think of a list as a flat sequence of list items, in parallel to any surrounding paragraphs; forcing you to wrap your list items in a UL or OL is (to me) a browser quirk.

I made some progress fighting this in Tiptap: https://github.com/commoncurriculum/tiptap-extension-flat-li...

Quill.js already models lists in this "flat" way.
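
A sketch of keeping list items flat in the data and producing nested `<ul>`/`<ol>` markup only at render time. The `FlatItem` shape (a list type and an indent level per item) is a hypothetical model, loosely in the spirit of Quill's per-line list and indent attributes rather than its actual Delta format.

```typescript
type ListType = "bullet" | "ordered";

// Hypothetical flat list model: no wrapping UL/OL node is stored.
interface FlatItem {
  text: string;
  listType: ListType;
  indent: number; // 0 = top level
}

const tagFor = (t: ListType): string => (t === "ordered" ? "ol" : "ul");

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Projects a flat run of items into nested list markup for the DOM.
function renderList(items: FlatItem[]): string {
  let html = "";
  const open: ListType[] = []; // stack of list elements currently open
  const closeTop = (): void => {
    const t = open.pop();
    if (t !== undefined) html += `</li></${tagFor(t)}>`;
  };
  for (const item of items) {
    // An item can be at most one level deeper than its predecessor.
    const depth = Math.min(Math.max(item.indent, 0), open.length);
    while (open.length > depth + 1) closeTop();
    if (open.length === depth + 1) {
      if (open[open.length - 1] === item.listType) html += "</li>";
      else closeTop(); // same depth, but the list type changed
    }
    if (open.length === depth) {
      html += `<${tagFor(item.listType)}>`;
      open.push(item.listType);
    }
    html += `<li>${escapeHtml(item.text)}`;
  }
  while (open.length > 0) closeTop();
  return html;
}

const html = renderList([
  { text: "A", listType: "bullet", indent: 0 },
  { text: "B", listType: "bullet", indent: 1 },
  { text: "C", listType: "bullet", indent: 0 },
]);
// html === "<ul><li>A<ul><li>B</li></ul></li><li>C</li></ul>"
```

The nesting exists only in the output string; reordering or re-indenting an item is an edit to one flat record.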

Farooq1 134 days ago

Your reply hits the real tension: a flat model simplifies layout changes, but it shifts complexity into how you map edits and selections. That trade‑off feels worth it if the goal is “safe structure changes” and AI‑driven transforms.
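
One way to cash out "safe structure changes" for AI-driven transforms is to have the model emit operations that name blocks by stable ID, and to validate the whole batch before mutating anything. A minimal sketch; the `Op` vocabulary, `validateOps`, and `applyOps` are hypothetical, and a real system would also validate content payloads.

```typescript
type BlockId = string;

interface Doc {
  pool: Map<BlockId, { text: string }>;
  regions: Map<string, BlockId[]>;
}

// Operations an LLM might be asked to emit: addressed by ID, never by path.
type Op =
  | { kind: "move"; id: BlockId; toRegion: string; toIndex: number }
  | { kind: "setText"; id: BlockId; text: string };

// Checks every op against the current document; returns the problems found.
function validateOps(doc: Doc, ops: Op[]): string[] {
  const errors: string[] = [];
  ops.forEach((op, i) => {
    if (!doc.pool.has(op.id)) errors.push(`op ${i}: unknown block ${op.id}`);
    if (op.kind === "move") {
      if (!doc.regions.has(op.toRegion)) errors.push(`op ${i}: unknown region ${op.toRegion}`);
      if (!Number.isInteger(op.toIndex) || op.toIndex < 0) errors.push(`op ${i}: bad index ${op.toIndex}`);
    }
  });
  return errors;
}

// All-or-nothing: a single invalid op rejects the batch before any mutation.
function applyOps(doc: Doc, ops: Op[]): void {
  const errors = validateOps(doc, ops);
  if (errors.length > 0) throw new Error(errors.join("; "));
  for (const op of ops) {
    if (op.kind === "setText") {
      doc.pool.set(op.id, { text: op.text });
      continue;
    }
    doc.regions.forEach((ids) => {
      const i = ids.indexOf(op.id);
      if (i !== -1) ids.splice(i, 1);
    });
    const target = doc.regions.get(op.toRegion)!; // checked by validateOps
    target.splice(Math.min(op.toIndex, target.length), 0, op.id);
  }
}

const doc: Doc = {
  pool: new Map([["p1", { text: "Intro" }], ["p2", { text: "Details" }]]),
  regions: new Map<string, BlockId[]>([["LeftCol", ["p1"]], ["RightCol", ["p2"]]]),
};

// A hallucinated ID and region are caught instead of corrupting state.
const rejected = validateOps(doc, [{ kind: "move", id: "p9", toRegion: "Footer", toIndex: 0 }]);

applyOps(doc, [
  { kind: "move", id: "p1", toRegion: "RightCol", toIndex: 1 },
  { kind: "setText", id: "p1", text: "Revised intro" },
]);
// LeftCol is now [], RightCol is ["p2", "p1"]
```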

On the split/merge issue: in a flat model, the split/merge doesn’t have to be a structural operation at all. It can live entirely inside the block’s text content. The block keeps the same ID, and only its content changes. That avoids the “delete/reinsert” problem and keeps a stable identity for AI or history.
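
A minimal sketch of what "split/merge lives inside the block's text" could mean: the block's text may contain `"\n"` paragraph breaks (similar in spirit to how Quill treats newlines as content), so splitting and merging only rewrite `text` while `id` never changes. `TextBlock`, `splitAt`, `mergeAt`, and `paragraphs` are hypothetical names.

```typescript
// A block whose text may contain "\n" paragraph breaks.
interface TextBlock {
  readonly id: string;
  readonly text: string;
}

// Split: insert a break at `offset`. Same ID, new text.
function splitAt(block: TextBlock, offset: number): TextBlock {
  if (offset < 0 || offset > block.text.length) throw new RangeError(`bad offset ${offset}`);
  return { id: block.id, text: block.text.slice(0, offset) + "\n" + block.text.slice(offset) };
}

// Merge: remove the break at `offset`. Same ID, new text.
function mergeAt(block: TextBlock, offset: number): TextBlock {
  if (block.text.charAt(offset) !== "\n") throw new Error(`no paragraph break at ${offset}`);
  return { id: block.id, text: block.text.slice(0, offset) + block.text.slice(offset + 1) };
}

// The paragraphs a view renders are derived from the text, not stored.
function paragraphs(block: TextBlock): string[] {
  return block.text.split("\n");
}

const original: TextBlock = { id: "b1", text: "Hello world" };
const split = splitAt(original, 5); // text "Hello\n world", id "b1"
const rendered = paragraphs(split); // ["Hello", " world"]
const merged = mergeAt(split, 5);   // text "Hello world", id "b1"
```

The cost of this choice is that paragraphs inside one block have no IDs of their own, so moving only the second one into another region still means turning it into a new block, which is where the Notion-style split/merge handling mentioned earlier in the thread comes back in.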

On selection: the cleanest route is to render a normal DOM tree for interaction and treat the flat IR as the truth. So the DOM is just a projection. That buys you native selection and IME behavior without building a custom cursor engine. The only hard part is deciding a consistent reading order (left‑to‑right, top‑to‑bottom, region order), so selection feels predictable even when layout is spatial.
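
A minimal sketch of the reading-order idea, assuming the region order is given explicitly (for example Header, then LeftCol, then RightCol) and that DOM selection endpoints have already been mapped back to a block ID plus offset. `readingOrder`, `Point`, and `normalizeSelection` are hypothetical names.

```typescript
type BlockId = string;

interface Point {
  blockId: BlockId;
  offset: number; // character offset within the block
}

// Flatten the regions into one linear order that selections are measured in.
function readingOrder(regions: Map<string, BlockId[]>, regionOrder: string[]): BlockId[] {
  const order: BlockId[] = [];
  for (const region of regionOrder) order.push(...(regions.get(region) ?? []));
  return order;
}

// Turns a possibly backwards drag (anchor after focus) into a forward
// range and lists every block it covers, even across regions.
function normalizeSelection(order: BlockId[], anchor: Point, focus: Point) {
  const a = order.indexOf(anchor.blockId);
  const f = order.indexOf(focus.blockId);
  if (a === -1 || f === -1) throw new Error("selection endpoint not in reading order");
  const forward = a < f || (a === f && anchor.offset <= focus.offset);
  const start = forward ? anchor : focus;
  const end = forward ? focus : anchor;
  const blocks = order.slice(Math.min(a, f), Math.max(a, f) + 1);
  return { start, end, blocks };
}

const order = readingOrder(
  new Map<string, BlockId[]>([["Header", ["h"]], ["LeftCol", ["a", "b"]], ["RightCol", ["c"]]]),
  ["Header", "LeftCol", "RightCol"],
);
// order is ["h", "a", "b", "c"]

// A user drags from offset 2 in "c" (right column) back to the start of "a".
const sel = normalizeSelection(order, { blockId: "c", offset: 2 }, { blockId: "a", offset: 0 });
// sel.start is { blockId: "a", offset: 0 }, sel.end is { blockId: "c", offset: 2 },
// sel.blocks is ["a", "b", "c"]
```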

On syncing/CRDT: a flat model can be simpler in a different way. You’re syncing text inside blocks plus lists of IDs in regions. That’s two clear problems instead of one giant nested tree. It doesn’t remove the complexity, but it makes it easier to reason about where conflicts live (content vs layout).
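
A toy sketch of the "two clear problems" split, assuming layout is stored as one last-writer-wins placement register per block (a region plus a fractional position) and that text is synced by a separate text CRDT that does not appear here at all. This is a toy, not a description of how Yjs or Automerge work; `Placement`, `mergeLayout`, and `regionsOf` are hypothetical. Because each block has exactly one placement register, concurrent moves of the same block resolve to one winner instead of duplicating it.

```typescript
type BlockId = string;

// Where a block is placed. Text edits never touch this structure.
interface Placement {
  region: string;
  position: number; // fractional sort key within the region
  ts: number;       // logical (Lamport-style) timestamp of the move
  replica: string;  // tie-breaker when timestamps are equal
}

type Layout = Map<BlockId, Placement>;

function wins(a: Placement, b: Placement): boolean {
  return a.ts !== b.ts ? a.ts > b.ts : a.replica > b.replica;
}

// Per-block last-writer-wins merge. Assuming each (ts, replica) pair names a
// single move, the result does not depend on the order replicas sync in.
function mergeLayout(x: Layout, y: Layout): Layout {
  const out: Layout = new Map(x);
  y.forEach((p, id) => {
    const current = out.get(id);
    if (current === undefined || wins(p, current)) out.set(id, p);
  });
  return out;
}

// Derive the ordered region lists a view would render.
function regionsOf(layout: Layout): Map<string, BlockId[]> {
  const entries = Array.from(layout.entries()).sort(
    (a, b) => a[1].position - b[1].position || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0),
  );
  const regions = new Map<string, BlockId[]>();
  for (const [id, p] of entries) {
    const list = regions.get(p.region) ?? [];
    list.push(id);
    regions.set(p.region, list);
  }
  return regions;
}

const base: Layout = new Map<BlockId, Placement>([
  ["p1", { region: "LeftCol", position: 1, ts: 1, replica: "A" }],
  ["p2", { region: "LeftCol", position: 2, ts: 1, replica: "A" }],
]);
// Replica A moves p1 to the right column; replica B concurrently moves p1
// to the footer. Any text edits to p1 live elsewhere and cannot conflict.
const onA: Layout = new Map(base).set("p1", { region: "RightCol", position: 1, ts: 2, replica: "A" });
const onB: Layout = new Map(base).set("p1", { region: "Footer", position: 1, ts: 2, replica: "B" });
const ab = regionsOf(mergeLayout(onA, onB));
const ba = regionsOf(mergeLayout(onB, onA));
// Both orders agree: Footer is ["p1"], LeftCol is ["p2"] (B wins the tie on replica ID).
```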

On lists: a flat list of items is closer to how people think. UL/OL is a browser artifact. Quill’s model already shows this is workable, and it makes the “content pool + layout map” idea more consistent.

Using Tiptap/ProseMirror as the editing surface (selection, IME, rich text behavior) while keeping a separate IR is a reasonable split: the view stays tree‑shaped, the data stays flat.

So overall: this approach looks less like “throw away trees” and more like “trees become a rendering tool, not the canonical structure.” That’s a meaningful shift, especially if AI or layout transforms are first‑class.