| Your reply hits the real tension: a flat model simplifies layout changes, but it shifts complexity into how you map edits and selections. That trade‑off feels worth it if the goal is “safe structure changes” and AI‑driven transforms. On the split/merge issue: in a flat model, a split or merge doesn’t have to be a structural operation at all. It can live entirely inside the block’s text content. The block keeps the same ID, and only its content changes. That avoids the “delete/reinsert” problem and keeps a stable identity for AI or history. On selection: the cleanest route is to render a normal DOM tree for interaction and treat the flat IR as the source of truth, so the DOM is just a projection. That buys you native selection and IME behavior without building a custom cursor engine. The only hard part is deciding on a consistent reading order (left‑to‑right, top‑to‑bottom, region order), so selection feels predictable even when layout is spatial. On syncing/CRDT: a flat model can be simpler in a different way, because you’re syncing text inside blocks plus ordered lists of IDs in regions. That’s two clear problems instead of one giant nested tree. It doesn’t remove the complexity, but it makes it easier to reason about where conflicts live (content vs layout). On lists: a flat list of items is closer to how people think. UL/OL is a browser artifact. Quill’s Delta format, which stores the list type as a per‑line attribute rather than as a nested container, already shows this is workable, and it makes the “content pool + layout map” idea more consistent. Using TipTap/ProseMirror as the editing surface (selection, IME, rich text behavior) while keeping a separate IR is a reasonable split: the view stays tree‑shaped, the data stays flat. So overall: this approach looks less like “throw away trees” and more like “trees become a rendering tool, not the canonical structure.” That’s a meaningful shift, especially if AI or layout transforms are first‑class. |
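
To make the “content pool + layout map” idea a bit more concrete, here is a rough TypeScript sketch of what such a flat IR might look like. Every name in it (`FlatDoc`, `splitInBlock`, `mergeInBlock`, `moveBlock`, `readingOrder`) is made up for illustration, and using `"\n"` as the in‑block break marker is an assumption, not something TipTap/ProseMirror or Quill prescribes. It only shows the shape of the idea: a split or merge touches a block’s text, a layout change touches only ID lists, and reading order is derived from region order.

```typescript
// Hypothetical flat IR: a content pool keyed by block ID plus a layout map.
type BlockId = string;

interface Block {
  readonly id: BlockId;
  readonly kind: "paragraph" | "listItem"; // list items are flat, no UL/OL wrapper
  readonly text: string; // may contain BREAK as an in-block split marker
}

interface Region {
  readonly name: string;
  readonly order: number; // position of the region in reading order
  readonly blockIds: readonly BlockId[]; // ordered IDs only, no nesting
}

interface FlatDoc {
  readonly blocks: Readonly<Record<BlockId, Block>>;
  readonly regions: readonly Region[];
}

// Assumption: a newline inside the text stands for a split point.
const BREAK = "\n";

// "Split" as a pure content edit: same block ID, layout untouched.
function splitInBlock(doc: FlatDoc, id: BlockId, offset: number): FlatDoc {
  const block = doc.blocks[id];
  if (!block) throw new Error(`unknown block: ${id}`);
  const at = Math.max(0, Math.min(offset, block.text.length));
  const text = block.text.slice(0, at) + BREAK + block.text.slice(at);
  return { ...doc, blocks: { ...doc.blocks, [id]: { ...block, text } } };
}

// "Merge" removes the break at the given offset, if there is one.
function mergeInBlock(doc: FlatDoc, id: BlockId, breakOffset: number): FlatDoc {
  const block = doc.blocks[id];
  if (!block) throw new Error(`unknown block: ${id}`);
  if (block.text[breakOffset] !== BREAK) return doc; // nothing to merge here
  const text =
    block.text.slice(0, breakOffset) + block.text.slice(breakOffset + 1);
  return { ...doc, blocks: { ...doc.blocks, [id]: { ...block, text } } };
}

// Layout change: only the ID lists move, block content is untouched.
function moveBlock(
  doc: FlatDoc,
  id: BlockId,
  toRegion: string,
  index: number
): FlatDoc {
  if (!doc.regions.some((r) => r.name === toRegion)) {
    throw new Error(`unknown region: ${toRegion}`);
  }
  const regions = doc.regions.map((r) => {
    const rest = r.blockIds.filter((b) => b !== id);
    if (r.name !== toRegion) return { ...r, blockIds: rest };
    const at = Math.max(0, Math.min(index, rest.length));
    return { ...r, blockIds: rest.slice(0, at).concat(id, rest.slice(at)) };
  });
  return { ...doc, regions };
}

// One possible reading order: regions sorted by `order`, then list order.
function readingOrder(doc: FlatDoc): BlockId[] {
  const sorted = doc.regions.slice().sort((a, b) => a.order - b.order);
  const ids: BlockId[] = [];
  for (const region of sorted) {
    for (const blockId of region.blockIds) ids.push(blockId);
  }
  return ids;
}

// Usage: a split changes content only; a move changes layout only.
const doc: FlatDoc = {
  blocks: {
    a: { id: "a", kind: "paragraph", text: "Hello world" },
    b: { id: "b", kind: "listItem", text: "First item" },
  },
  regions: [
    { name: "sidebar", order: 1, blockIds: ["b"] },
    { name: "main", order: 0, blockIds: ["a"] },
  ],
};

const afterSplit = splitInBlock(doc, "a", 5);
const afterMove = moveBlock(afterSplit, "b", "main", 0);
```

In this sketch the two sync problems stay separate: `splitInBlock` and `mergeInBlock` never touch `regions`, and `moveBlock` never touches `blocks`, so a conflict is either a text conflict inside one block or an ordering conflict inside one ID list.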