Hacker News
by egeozcan 1465 days ago
I'm quite disappointed that even in the year 2022 the best solution HTML offers for rich text editing is the infamous contenteditable attribute.

I can never expect a WYSIWYG editor from a library to work 100% without friction on every platform.

I know it's already near-impossible to implement a web browser, and adding more to the pile of standards/components to implement sounds counter-intuitive. On the other hand, a basic browser that can render just basic HTML and basic CSS is comparatively easy, and I would rather have them be able to render a rich-text field on a low-capability device than be able to do all sorts of JS/CSS magic.

I'm not even against React and all the amazing ecosystem surrounding it - given that your use-case justifies it. I just think auto-completers and rich-text editing and other stuff that nearly every website on this planet re-implements should be standard.

5 comments

What we need is for the web standards groups to come together and add a rich text input element as a native browser control. Implementing a rich text editing component in javascript is insanely difficult because you need to support:

- Every platform (windows/macos/ios/android/linux, firefox/safari/chrome). There are all sorts of platform-specific details - like windows using \r\n instead of \n. Your rich text editing component needs to feel native everywhere - even though that means something different on every platform.

- Undo / redo support

- Support for international inputs - to support Korean, Japanese and Chinese characters.

- Detailed canonicalized editing events for integration with collaborative editing libraries

Years ago I played around with the rich text editing components in Cocoa (macOS) and coming from the web, it's an utter delight. That code is all there on your machine - it's just not exposed through the web browser. So we're stuck with janky half-working "rich text" editors in reddit, slack and everywhere else online.

Decades of half-baked attempts have shown that custom javascript isn't good enough to solve this.

It's a web standards problem. Rich text editing should be built in to the web browser.

well, calling preventDefault() on the beforeinput event got close to making the problems in contentEditable tractable... but when it comes to CJK it just shrugs and says "eh, good luck with composition".

On the subject of Cocoa, I can't even find the docs anymore for the low-level text shaping layers of the system. I've looked a couple of times because I want to re-implement some parts of those APIs. Everything seemed so well-designed, but now it's lost to the sands of time as Apple archives documents and Google search deteriorates. Alas.

EDIT: blessings! I found some of the docs for Cocoa's text layout system. Just look at this. If only we had this for the web, as a standard...

https://developer.apple.com/library/archive/documentation/Co...

EDIT 2: Oh, Apple has a new text layout engine "TextKit 2". I haven't found any of the really nice technical illustrations or long-form guides for it. Instead, I'm watching this video: https://developer.apple.com/videos/play/wwdc2021/10061/

This is all true. Other things that make it difficult to implement a rich text editor are accessibility (keyboard shortcuts!), dark mode (theming) and customizability.

Having a rich text editor in the web standards would be nice for prototyping. However, all the form elements (select, date picker, progress bar etc.) are also web standards, but no one uses them because they are ugly and not customizable.

So the chances that we will get a standard rich text editor on the web that web devs would actually use are slim.

Yeah good catch - OS-native keyboard shortcuts are really important & hard to reimplement well!

Nobody uses the built in date picker because it’s ugly, and making your own date picker is relatively easy. (Well, at least on desktop). Rich text editing is the inverse - it’s basically impossible to make your own. But theming a built in rich text editor should be really easy.

All the browser should implement is a rich text area itself. Leave it up to us web developers to add our own buttons for bold / italics / etc - since people will want to style that stuff themselves anyway. And add a clean, simple event API for “onchange” or something which gets called before any change is applied to the text area - initiated by a user or by the system. The event should tell you exactly what the change is (including styling) and let you tweak the change before it gets applied to the input element.
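
To make that event idea concrete, here is a minimal sketch in Python rather than a real browser API. Every name in it (Change, RichTextArea, propose, the handlers) is hypothetical and only illustrates "a handler sees each change before it is applied and may tweak or veto it". It models plain text only; the style field is carried on the event but not stored.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Change:
    """One proposed edit: replace text[start:end] with `inserted`."""
    start: int
    end: int
    inserted: str
    style: frozenset = frozenset()  # e.g. frozenset({"bold"}); informational only here
    source: str = "user"            # "user" or "system"


# A handler returns the change (possibly a tweaked copy), or None to veto it.
Handler = Callable[[Change], Optional[Change]]


class RichTextArea:
    """Hypothetical stand-in for the proposed control (not a real API)."""

    def __init__(self) -> None:
        self.text = ""
        self.handlers: List[Handler] = []

    def propose(self, change: Change) -> bool:
        """Run every handler before applying; return False if one vetoes."""
        for handler in self.handlers:
            result = handler(change)
            if result is None:
                return False
            change = result
        self.text = self.text[:change.start] + change.inserted + self.text[change.end:]
        return True


def normalize_newlines(change: Change) -> Change:
    """Example tweak: store \n even when the platform sends \r\n."""
    return replace(change, inserted=change.inserted.replace("\r\n", "\n"))


def block_system_edits(change: Change) -> Optional[Change]:
    """Example veto: refuse edits that were not initiated by the user."""
    return None if change.source == "system" else change
```

With normalize_newlines registered, proposing Change(0, 0, "a\r\nb") leaves the text as "a\nb". A collaborative editing library could hook in the same way and forward each accepted Change to its peers.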

I think web devs would go bananas for something like that if it was designed well. I’d use it for sure.

You risk ending up with something like the date component which is so ugly nobody uses it.
clients should only ever input plain text

it's the application that should decide how it's to be displayed in different contexts

you're fighting the web if you're trying to do it otherwise, but then again, everybody is nowadays...

The web has HTML and parsing/sanitizing it is a solved problem. I fail to see the need to go back to plain text not to be "fighting the web".

The world wide web is a free platform where you can publish plain text or HTML, but denying that it also became an application delivery platform is fighting against reality.

I'm an early-internet kid, and even if we are going back to discover the "spirit of the internet", those times are full of contenteditable hacks and custom applet HTML editors.

Being able to edit HTML visually is so common, be it in a simple personal homepage, an editor for a corporate blog, or a product management tool.

Much of the complexity of contentEditable comes from using the same DOM nodes both as an output device to render a semantic data model, AND as an input device, to interpret input-related events into changes to the semantic data model. If you could separate the two concerns, it would be much easier to, e.g., accept CJK text and keep output looking correct, without breaking input.
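
A rough sketch of that separation, in Python for brevity. The Run model and both functions are made up for illustration and do not come from any real editor: input handling only mutates the model, and rendering is a pure function of the model, so neither has to parse the other's DOM.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Run:
    """A stretch of text with uniform formatting (the semantic model)."""
    text: str
    bold: bool = False
    italic: bool = False


def insert_text(doc: List[Run], run_index: int, offset: int, text: str) -> None:
    """Input side: apply an insertion to the model, never to the output."""
    run = doc[run_index]
    run.text = run.text[:offset] + text + run.text[offset:]


def render_html(doc: List[Run]) -> str:
    """Output side: derive HTML from the model; nothing reads it back."""
    parts = []
    for run in doc:
        html = escape(run.text)
        if run.italic:
            html = f"<em>{html}</em>"
        if run.bold:
            html = f"<strong>{html}</strong>"
        parts.append(html)
    return "".join(parts)
```

In this toy version an IME could hand over a finished composition as one insert_text call, and the rendered output would simply be regenerated from the model afterwards.
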
100% this - but good luck fixing that, since the people who'd have expected contenteditable to work are the ones that want to unify input and output... so it's an intractable mess

you basically want a UI for inputting and validating some schema-enforced JSON, but that's overwhelming and overkill for most users and most apps, so you script into existence a simplified version of that (akin to "everyone who doesn't learn lisp is cursed to reinvent it" or smth like that but for frontend devs).

> 100% this - but good luck fixing that

It's fixable. We just need to add a new, better-designed rich text input element to the web.

Plenty of editors do separate the input and output into different DOM nodes. Google Docs does this, as does Coda.
sure, but even if you want people to just be able to italicize or bold their input and nothing else, you're either stuck with the bludgeon that's contenteditable, or you can have fun reimplementing half of the countless nuances of font rendering on canvas. there really should be some sort of better option.
...or add instructions to "have text between _underscores for italics_ and **double stars for bold**" - just adopt a subset of markdown and be done with it - we need to stop catering to humanity's lowest common denominator.
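
For scale, that two-rule subset is about this much code. This is a sketch in Python with a made-up function name, not a CommonMark implementation: it ignores nesting, escaping with backslashes and everything else real markdown has to deal with.

```python
import re
from html import escape

# **double stars** for bold; non-greedy so two bold spans on one line stay separate.
BOLD = re.compile(r"\*\*(.+?)\*\*")
# _underscores_ for italics; the lookarounds keep snake_case_names untouched.
ITALIC = re.compile(r"(?<!\w)_(.+?)_(?!\w)")


def tiny_markdown(text: str) -> str:
    """Convert the two-rule subset to HTML, escaping any raw HTML first."""
    html = escape(text, quote=False)
    html = BOLD.sub(r"<strong>\1</strong>", html)
    html = ITALIC.sub(r"<em>\1</em>", html)
    return html
```

tiny_markdown("_hi_ and **yo**") gives "<em>hi</em> and <strong>yo</strong>", and any raw tags typed by the user come out escaped.
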
Rich text isn't just bold and italics. It also (often) supports tables, embedded images, links, fonts + font sizes and colors. Markdown supports a poor subset of that stuff.

And it's not just for "humanity's lowest common denominator". I love being able to copy+paste a table into an email and have it show up correctly. Today's convention where I have to screenshot something to show it to you is horrible.

Markdown isn't the right tool for every job. Would you use markdown to typeset your resume? To write an academic paper?

Lots of popular applications - like Notion or Google Docs - need something more powerful than markdown. And the current javascript APIs are failing them.

At some point you need more than what markdown gives you. I know, encountered it recently.

The formatting equivalent of... "640k ought to be enough for everybody".

sure but that sucks, especially on mobile. doubly so if you try to do anything mildly more advanced like embed images.
My prediction is that in 2-3 years we will see WASM being used to run desktop class rich text editors in a canvas (much like Google Docs now does), and an open source framework will appear that enables it.

I would even go as far as suggesting WebKit compiled to WASM would be better than trying to make a rich text editor based on contenteditable work everywhere: once you have it working in one version it runs “everywhere”. Accessibility is going to be the main disadvantage of such a system, though.

The other downside is that text editing interacts with a lot of native OS controls. For example, iOS needs to use the OS's keyboard (which is configured outside the web browser). And the user should be able to tap-and-hold to select text just like they can in other webpages and apps.

This sort of behavior is basically impossible to reimplement in javascript for every platform in a way that feels "native".

Yes, maybe my 2-3 years is optimistic. I think ultimately it’s JS APIs for the native controls that need to be exposed rather than better contenteditable.
Accessibility APIs exposed via javascript? Sure. But all the other APIs you'd need would be a mess.

For example, what would an API look like to support international character combining? Or drag-select on mobile? Or custom keyboard inputs? How do you respect the user's OS level configuration - Like system-wide emacs shortcuts in macos, weird keyboards on samsung phones or overridden fonts for people with reduced vision?

Adding javascript APIs for all this stuff would hurt privacy on the web, because this data is full of entropy for anyone doing fingerprinting.

And it wouldn't even solve the problem. You'd still need user code (JS/WASM) to correctly implement platform-specific behaviour for everything to feel native. That's a janky ride every time.

I'd much rather web browsers just expose each platform's native rich text editing controls. They work great already. They already do accessibility, and have full support for the platform's native text editing controls and behaviour. And there's a precedent for this sort of thing - web browsers have had INPUT elements forever.

The thing is, letting the user format some text as bold isn't really enough for many use cases. You need custom block and inline elements, such as tables, highlighted code or formulas.
It won't be simple or easy, but this sounds like a solvable technical problem. It's solvable today if you're willing to use hacky javascript.

Inline blocks should probably be embedded DOM nodes or something - just like contentEditable does today. Then they can be themed & styled like everything else - using CSS.

There are some complex UX patterns to manage - like deciding what should happen if the user hits backspace when their cursor is right after an inline block element. If the user uses the keyboard to move the cursor "past" an inline block element, should the cursor enter the element or skip it?

This sounds like exactly the sort of problem that the web standards committee is designed for.

I'm dreading Canvas + WASM. Sites that intercept scroll and touch events with Javascript are already rage-inducing to use, especially on mobile. I can't imagine that taking even more of the UI interaction away from the browser will improve anything.
So am I, however it's really going to shine with things like SQLite for client side data storage, and for complex graphics client side by using desktop class graphics libraries.

Skia has a WASM port, I could see it being an amazing base for a graphics/rich-text product: https://skia.org/docs/user/modules/canvaskit/

I think the difference is WASM will be really good for content creation (see Figma), but we really don't want it for content consumption - unless it's seriously clever visuals or background data processing.

We already have egui which is pretty awesome.

Not great for accessibility though (yet)

https://github.com/emilk/egui

That's because there's no standard for rich text at all, everything is either MS Word, some other bespoke proprietary format, or a kludge on a kludge on a kludge.

In part that's due to the shortcomings of HTML itself.

And no, markdown a.k.a. whatever the interpreter accepts is decent enough at what it does but it's not what we are looking for here.

The agents can implement any input mechanism. If I'm writing a text-based browser, I should be able to accept mark-down. Chromium can do whatever fancy thing they please.

For output (the value of the field), HTML is good enough already.

I don't see the problem.

Office Open XML (e.g., *.docx) is an ECMA and ISO standard format.

https://en.wikipedia.org/wiki/Office_Open_XML

How well does that render on webpages?
Check out office 365 on the web opening office file formats like docx and pptx. I'd say it renders pretty frigging amazingly. Every other week I am amazed and want to buy more msft shares.
So many rich text formats? Ridiculous! We need to develop one universal standard that covers everyone's use cases.

[j] https://xkcd.com/927/

I never said anything about text formats though. Go crazy as the input, give me HTML as the output. I can sanitize it any way I want on the server-side anyway.
Sanitize when rendering the HTML, all other paths lead to hell.

I agree with granddaddy, the web just didn't cater for this with all the XSS, XSRF etc shenanigans.

We're left with everyone implementing hacks, or in some cases, getting it right. Mud pie. Slap on an extra dollop.

> Sanitize when rendering the HTML, all other paths lead to hell

I didn't mean mangle user input when storing. I mean you can do that if you want to parse it and store it as a semantic subset to deliver to the devices that can't render HTML (yes they exist), but I digress.

You can sanitize any piece of HTML to a meaningful subset when rendering (well, before render, if you are doing it on the server side) with virtually any language by choosing among many solid libraries.
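
To show the shape of "sanitize to a meaningful subset", here is a toy allowlist sanitizer using only Python's standard html.parser. It is a sketch under my own assumptions (the tag list, the http/https-only link rule and the names are all mine), and it is not hardened or audited; for anything real, use one of those maintained libraries instead.

```python
from html import escape
from html.parser import HTMLParser

# Assumed "meaningful subset" for this sketch; pick your own.
ALLOWED_TAGS = {"p", "em", "strong", "ul", "ol", "li", "a"}
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags; all attributes are dropped except safe hrefs."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().startswith(("http://", "https://")):
            self.out.append(f'<a href="{escape(href, quote=True)}">')
        else:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that it does not rebalance unclosed tags, which a proper library would also take care of.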

The problem with asking browsers to implement it is that every browser will implement it differently, and almost everyone will need some extra functionality not covered by browser implementations, so almost everyone reimplements it from scratch (or uses one of those reimplementations) anyway. It’s the same story with <audio controls>, <video controls>, <select>, etc.
that's because "rich text" is at odds with everything the web is supposed to be... like separating structure and semantics from styling with html and css etc.

sure, the "semantic web" failed, but that's the history

you were not supposed to be able to control how content looks on the web, this was supposed to be the choice of the browser customized by the user (eg. a user could set his browser to render sci-fi topics web pages text with font X and financial news with font Y min-size Z etc.)

"web design" is something that shouldn't have ever existed or had any reason to even be a concept...

it all went off the rails in the real world ofc, but this was the real context and dream of the web and its technologies as they were designed and standardized

> that's because "rich text" is at odds with everything the web is supposed to be

I respectfully disagree. Rich text editing is more about being able to add semantics than making giant red blinking text. I want to explicitly add paragraphs, emphasis, links etc. without learning a new input method for every website.

A paragraph is a different text box. Sure, a standard method to work with links and emphasis would help, but that's about where we should stop.

A bulleted list is [add new element] -> [bulleted list] -> a container with text fields for each bullet (+ add/remove buttons) appears.

Same for table, a proper grid with separate inputs for all cells (see modern Confluence).

So you are suggesting to implement more stuff with scripting (otherwise that would be horrible to use with page reloads in between, no?). Why go to all the trouble when HTML can store all the semantics just fine?

I also use Confluence via plain HTML because their editor sucks, especially when copy/pasting. They have something even worse than MS Frontpage but implemented in JS, and my subconscious keeps me wanting to switch to CoffeeCup.

I suggest accepting that we need a way to capture arbitrary structured text (lists/trees/graphs of text fields) that will then be renderable either as paragraphs or as tables or as collapsible trees or even navigable graph maps or whatever, depending on context and device and never under the full control of the content producer.

And realistically speaking, we'll never be able to standardize on inputting lists/tables/trees/graphs-of-text-cells anytime soon (ever?), so yeah, scripting is the only way.

This is why the web evolved to what it is, because the problem always turns out to be a different and more complicated problem than we thought it was, and then people never agree on solutions, so we have to say "anything goes", so it's scripting and web apps for anything about creating or inputting content.

Sure, you can have nicer standard ways for displaying content, but they'd always need to be decoupled from the producing of content and be outside the control of the content producer ...because if you couple them you just get the heavy scripting spilling out to the display side too, and that's how you get a horribly slow and heavy javascript-only SPA frontend.

Accept heavy scripting on the content creation / input side, leave the display separate and accept that you cannot control it if you want it to stay nice and clean.

P.S. Confluence had some upgrades recently, and their newish version is quite good, especially table editing, which is a dream imo :P (compared to the horror of their prev version) ...it's not Notion, but it's productively usable now :)

Confluence puts all of that in a single text box, AFAIK.

Speaking of which, anyone know of a library that allows that kind of control and extensibility? Specifically thinking custom render items that aren't boxes, tables and more text. Something like @mention but more complex.