by badsectoracula 1105 days ago
I think unicode, fonts, etc would be an issue even with a plain text editor anyway.

The "styling a range of text" approach is something i thought about, but you still need to somehow associate the text with the range - and vice versa - and this doesn't handle things like inserting images and other types of objects, since these aren't text.

You could have a document be a series of "paragraphs", each being a series of "elements" with each "element" being something like "text" (with a style), "image", etc. But then once tables enter the picture, you need to expand paragraphs to be of "table" type and each table cell is itself a self-contained "series of paragraphs" - and then start thinking about nested tables or images in tables!

Generalize that enough to avoid special cases inside special cases and you end up with more of a tree-like structure representing a DOM and less with a linear structure with range-based styling.
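
As a rough illustration of where that generalization ends up, here is a minimal sketch of such a model. All class names (`Text`, `Image`, `Paragraph`, `Cell`, `Table`, `Document`) and the `table_depth` helper are hypothetical, made up for this example and not taken from any real editor.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    content: str
    style: str = "normal"  # hypothetical style name


@dataclass
class Image:
    src: str


@dataclass
class Paragraph:
    # A paragraph is a series of inline elements.
    elements: List[Union[Text, Image]] = field(default_factory=list)


@dataclass
class Cell:
    # A cell is itself a self-contained series of blocks,
    # which is what makes the structure recursive.
    blocks: List["Block"] = field(default_factory=list)


@dataclass
class Table:
    rows: List[List[Cell]] = field(default_factory=list)


Block = Union[Paragraph, Table]


@dataclass
class Document:
    blocks: List[Block] = field(default_factory=list)


def table_depth(block) -> int:
    """How many levels of tables are nested at or below this block."""
    if isinstance(block, Table):
        inner = [table_depth(b) for row in block.rows for cell in row for b in cell.blocks]
        return 1 + max(inner, default=0)
    return 0
```

Once `Cell` holds blocks and a block can be a `Table`, nested tables and images in tables need no further special cases, but the document is no longer a flat sequence.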

(of course, then again, i don't remember Write for Windows 3.1 having tables in the first place :-P but i'm interested if there are alternative approaches anyway)

EDIT: one thing i forgot to mention - and why i am curious about non-DOM-based approaches - is that one problem with the DOM approach is the selection: with a linear/range-based structure the selection is just one or two indices inside the range, but with the DOM the selection can start from a node with node-specific subrange (e.g. character in a text node) and end with another node and both being very unrelated to each other (i.e. only having some distant common ancestor and not necessarily at the same level).
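
To make the selection problem concrete, here is a small sketch, assuming a hypothetical `Node` class with parent pointers: each end of the selection is a (node, offset) pair, and the only thing the two ends are guaranteed to share is some common ancestor.

```python
class Node:
    """Hypothetical tree node with a parent pointer."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Path from the node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def common_ancestor(a, b):
    """Closest node that contains both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    return None


# A selection is then two (node, offset) pairs, e.g. from character 3
# of a text node in one table to an image in another table.
text = Node("text")
image = Node("image")
root = Node("body", [
    Node("table1", [Node("cell1", [text])]),
    Node("p", [Node("text2")]),
    Node("table2", [Node("cell2", [image])]),
])
selection = ((text, 3), (image, 0))
```

Here `common_ancestor(text, image)` is the `body` node, several levels above either end, which is the "distant common ancestor" case.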

3 comments

I might have a way to simplify this?

A plaintext document is an array of chars; a rich text document is a tree, which may or may not be well-formed.

Think about someone trying to bold half of a sentence. MS FrontPage was made by smart people; it’s just really hard.

The most interesting thing lately is the HTML attribute `contenteditable`, and how it almost just kinda works! You still have to be full-stack to make something good, but that was an amazing improvement to the browser.

Yeah, that trying to bold half of a sentence - or even better, the middle of a sentence - is why i was wondering about simpler alternatives to DOM. Some time ago i toyed around with an HTML editor[0] (that one had to use a DOM anyway, but my question is for rich text editing in general - BTW the rectangles in the shot show a selection that goes across nodes) and doing something like that involved traversing all the nodes (going both down and up the node tree, starting from the cursor's starting position), finding the closest common ancestors under the selection, creating "B" siblings to them and then reparenting them under these new "B" nodes.

You can move a lot of that stuff to reusable methods but personally i find the whole "editing" aspect to be more involved than the "drawing" side - and also the one more likely to be different than a plain text editor - when dealing with DOM-like structures. Hence why i am interested to see what alternatives there are.

[0] https://i.imgur.com/jLlyNSS.png
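
For illustration only, here is a sketch of just the last step described above (creating a "B" node and reparenting under it), assuming the nodes to wrap are already adjacent siblings under one parent. The `Node` class and `wrap_siblings` are hypothetical, and the harder part (walking up and down the tree and splitting text nodes at the selection boundaries) is left out.

```python
class Node:
    """Hypothetical DOM-like node."""

    def __init__(self, tag, children=(), text=""):
        self.tag = tag
        self.text = text
        self.parent = None
        self.children = []
        for child in children:
            self.append(child)

    def append(self, child):
        child.parent = self
        self.children.append(child)


def wrap_siblings(parent, start, end, tag="B"):
    """Move parent.children[start:end] under a new node placed at the same position."""
    wrapper = Node(tag)
    for child in parent.children[start:end]:
        wrapper.append(child)  # reparent each selected sibling
    parent.children[start:end] = [wrapper]
    wrapper.parent = parent
    return wrapper


p = Node("P", [
    Node("#text", text="one "),
    Node("#text", text="two "),
    Node("#text", text="three"),
])
bold = wrap_siblings(p, 1, 2)  # bold only the middle run
```

For a selection that crosses several parents, something like this would have to run once per parent that contains part of the selection.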

Nice. I’ve searched far and wide, and most people still end up starting a whole company that only does a text editor.

Quill works but has been basically dead since 2017. Almost anything FOSS is in a similarly ambiguous boat.

And then people like us, defeated, eventually buy something when we actually need it.

There are just so many ways that users try to use it. It’s a tough problem!

Text is usually stored as a tree in an editor either way, so a DOM-like approach might work well on top of the usual data structures.

> with the DOM the selection can start from a node with node-specific subrange (e.g. character in a text node) and end with another node and both being very unrelated to each other

I'd just store the range as character indices; using those, the right nodes in the tree can be accessed pretty quickly as needed.

I don't think character indices are enough. What if your selection begins in the middle of a table cell and ends on an image that is the only child of a cell in a completely different table (no text involved at all, except some text in the cells in between)? If you want to, e.g., delete those, how do you find which nodes are to be deleted and updated (e.g. for merging the two tables if there are cells after the one that contains the image)?

Images and table cells are just nodes within the tree holding the text, assuming all styling etc. is represented in a plain text syntax similar to Markdown, of course. Looking up the nodes from char indices is quick if each node stores how many chars it contains.
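
A minimal sketch of that lookup, assuming text lives only in leaf nodes and every node caches the character count of its subtree. The `Node` class and `locate` function are hypothetical names for this example, and the image is represented here by a single placeholder character.

```python
class Node:
    """Hypothetical node: either a leaf holding text or an inner node with children."""

    def __init__(self, text="", children=()):
        self.text = text
        self.children = list(children)
        # Cached number of characters in this subtree.
        self.length = len(text) + sum(c.length for c in self.children)


def locate(root, index):
    """Map a document-wide character index to (leaf, offset within that leaf)."""
    if not 0 <= index < root.length:
        raise IndexError(index)
    node = root
    while node.children:
        for child in node.children:
            if index < child.length:
                node = child  # the index falls inside this child
                break
            index -= child.length  # skip the whole subtree
    return node, index


image = Node("\ufffc")  # one placeholder character standing in for an image
doc = Node(children=[
    Node("Hello "),
    Node(children=[Node("bold")]),
    image,
    Node("!"),
])
leaf, offset = locate(doc, 7)
```

Each step skips whole subtrees using the cached counts, so the cost depends on the depth and fan-out of the tree, not on the total amount of text.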

Other approaches would probably require the selection to be a tree of its own; I can't really say whether that's simpler overall or not.

The syntax shouldn't matter (you may not even be using a plain text syntax - or any syntax - anyway); you could treat an image or whatever as a single "special" character. Or just assign a linearly increasing ID (increasing in the order the text, images, etc. flow) to each node.

Though that is basically another way to represent what i wrote above with having a pair of node pointers and a subrange (well, an index actually, the other end of the subrange is implicit if the node pointers are different). This is basically what the old HTML editing control Microsoft had back in the 90s used and that worked with the DOM tree (also what i used in a test editor i wrote some time ago). And yeah it isn't simple.
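
The single "special" character idea above can be sketched roughly like this. `FlatDoc` is a hypothetical class, and using U+FFFC (the Unicode object replacement character) as the placeholder is an assumption of this example.

```python
OBJ = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER, used here as the placeholder


class FlatDoc:
    """Hypothetical flat document: one string plus a side table of embedded objects."""

    def __init__(self):
        self.text = ""
        self.objects = {}  # character index -> payload (e.g. an image path)

    def append_text(self, s):
        self.text += s

    def append_object(self, payload):
        self.objects[len(self.text)] = payload
        self.text += OBJ

    def delete(self, start, end):
        """Delete the half-open range [start, end) and re-index the objects."""
        removed = end - start
        self.text = self.text[:start] + self.text[end:]
        shifted = {}
        for index, payload in self.objects.items():
            if index < start:
                shifted[index] = payload
            elif index >= end:
                shifted[index - removed] = payload
            # objects inside the deleted range are dropped
        self.objects = shifted


doc = FlatDoc()
doc.append_text("ab")
doc.append_object("cat.png")
doc.append_text("cd")
```

With this, a selection is just two indices again and deleting across an image needs no special handling, but the flat model says nothing about structure such as table cells.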

Start and end of a selection would be nodes in the tree. The algorithm to find all the selected nodes is already part of the renderer, as the renderer has to implement traversal of nodes in text order as well.
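
A minimal sketch of that traversal, assuming a hypothetical `Node` class and that the start node comes before the end node in text order:

```python
class Node:
    """Hypothetical tree node."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def in_text_order(node):
    """Pre-order walk: the same order a renderer would lay the nodes out in."""
    yield node
    for child in node.children:
        yield from in_text_order(child)


def selected_nodes(root, start, end):
    """All nodes visited from start to end, inclusive, in text order."""
    result = []
    inside = False
    for node in in_text_order(root):
        if node is start:
            inside = True
        if inside:
            result.append(node)
        if node is end:
            break
    return result


t1, t2, t3 = Node("t1"), Node("t2"), Node("t3")
img = Node("img")
root = Node("root", [
    Node("p1", [t1, t2]),
    Node("table", [Node("cell", [img])]),
    Node("p2", [t3]),
])
names = [n.name for n in selected_nodes(root, t2, img)]
```

Here `names` is `["t2", "table", "cell", "img"]`: containers that open inside the selection are included, while `p1`, which opened before it, is not. Deciding what to do with such partially covered containers is still up to the editing code.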