Hacker News
by codedokode 684 days ago
Please let me rant a bit about X11 input APIs.

One of the major flaws in X11 is its poorly designed keyboard input system. When a key is pressed, the keypress event sends a "keycode" - an 8-bit number that references the current layout. This means you're limited to injecting characters that are present in the current layout.
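
To make the limitation concrete, here is a toy model in Python. It is not the real X11 API: the layout tables and the helper names `translate` and `keycode_for` are made up for illustration, and the keycode numbers are only examples. The point is that the event carries nothing but a keycode, so a character missing from the active table simply cannot be expressed.

```python
# Toy model, NOT the real X11 API. A "layout" maps a keycode to the
# characters the key produces at (unshifted, shifted) level.
# Keycode numbers and table contents are illustrative assumptions.
US_LAYOUT = {38: ("a", "A"), 39: ("s", "S"), 40: ("d", "D")}
RU_LAYOUT = {38: ("ф", "Ф"), 39: ("ы", "Ы"), 40: ("в", "В")}


def translate(layout, keycode, shift=False):
    """Client-side lookup: the key event only carries the keycode."""
    entry = layout.get(keycode)
    if entry is None:
        return None
    return entry[1] if shift else entry[0]


def keycode_for(layout, char):
    """Return (keycode, needs_shift) producing char, or None if the
    active layout has no key for it."""
    for keycode, levels in layout.items():
        if char in levels:
            return keycode, levels.index(char) == 1
    return None


# The same keycode means different text under different layouts.
print(translate(US_LAYOUT, 38), translate(RU_LAYOUT, 38))
# A sender that wants "ы" has nothing to send while US is active.
print(keycode_for(US_LAYOUT, "ы"))
```

This is also the VNC problem in miniature: the sender picks a keycode using its own table, and the receiver interprets it using a different one.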

The implications of this design are frustrating. For instance, if you're connecting to a remote system via VNC and the client and server have different keyboard layouts, you'll run into all sorts of issues. Similarly, how do you create an on-screen keyboard that can inject keypresses for characters not available in the current layout? And what if you want to programmatically send some text, but the user has the wrong layout active? It's a mess.

I suspect this is why most characters on the on-screen keyboard in Debian didn't work (they "fixed" it by showing only the characters present in the active layouts instead of fixing the root issue).

The common workaround is an ugly hack: you modify the keyboard layout, find unused slots, add the desired characters, send the key press event, and then restore the original layout. See the code: [2]
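
The steps of that hack can be sketched roughly as follows. This is a toy model in Python, not the code from [2] and not a real X11 binding: `ToyKeymap`, `inject_char` and the `send_key` callback are hypothetical names, and the only X11 fact assumed is that keycodes lie in the range 8..255.

```python
class ToyKeymap:
    """Hypothetical stand-in for the server keymap (keycode -> character)."""

    def __init__(self, mapping, min_keycode=8, max_keycode=255):
        self.mapping = dict(mapping)
        self.min_keycode = min_keycode
        self.max_keycode = max_keycode

    def find_unused(self):
        """Return the first keycode that has nothing mapped to it."""
        for keycode in range(self.min_keycode, self.max_keycode + 1):
            if keycode not in self.mapping:
                return keycode
        raise RuntimeError("no free keycode in the keymap")


def inject_char(keymap, char, send_key):
    """Send char, temporarily patching the keymap if it has no key for it.

    send_key is a callback standing in for "send a key press event".
    Returns the keycode that was used.
    """
    for keycode, mapped in keymap.mapping.items():
        if mapped == char:
            send_key(keycode)          # already typeable, no hack needed
            return keycode
    spare = keymap.find_unused()       # 1. find an unused slot
    keymap.mapping[spare] = char       # 2. add the desired character
    try:
        send_key(spare)                # 3. send the key press
    finally:
        del keymap.mapping[spare]      # 4. restore the original layout
    return spare
```

In the real thing, step 4 cannot safely happen right after step 3, because other clients read the keymap asynchronously.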

Also, Wayland, which was supposed to get rid of legacy problems, seems to have inherited this ugly design. Also, there seems to be no sane API for managing layouts or switching them programmatically, or subscribing to a layout change event. Also, you cannot use modifiers like Ctrl to switch layouts, because then combinations like Ctrl + C stop working. Keyboard APIs on Linux have been broken in the worst way possible since the beginning, probably because most developers use only ASCII and have no experience with multiple layouts.

A better idea would be to allow sending arbitrary Unicode strings, and maybe to integrate regular input with IME input (the input system used for typing Asian characters).

[1] https://www.x.org/releases/X11R7.6/doc/xproto/x11protocol.ht...

[2] https://github.com/Zirias/xmoji/blob/master/src/bin/xmoji/ke...

4 comments

> Also, there seems to be no sane API for managing layouts or switching them programmatically, or subscribing to a layout change event

X11 has XkbMapNotify/XkbStateNotify.

Wayland has a wl_keyboard.keymap event.

> A better idea would be to allow to send arbitrary Unicode strings and maybe integrate regular input and IME input (input system for typing Asian characters).

Not particularly, the difficulty here is that some clients want text input, and some clients really do want key events (e.g. think games where holding W does not really have much to do with the Unicode code point 'w'). This was discussed for a long time, and the current design was decided as the best option.

IME systems do exist and already work just fine; they are integrated client-side. IME systems cannot really be integrated into the protocol since many of them involve custom UI.

That said, there's a proposal for an "input-method" extension which lets you commit text directly, but I don't think anybody is actively championing it. https://gitlab.freedesktop.org/wayland/wayland-protocols/-/b...

> Not particularly, the difficulty here is that some clients want text input, and some clients really do want key events

I understand this point. For this case it makes sense to send both a key code (which character the key maps to in a Latin layout) and a translated code (which characters will be printed when the key is pressed in the current layout). That seems easier than broadcasting large structures to every client and letting each one have its own implementation for translating codes.
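
Roughly what I have in mind, as a Python sketch. This is a hypothetical event shape, not something Wayland or X11 defines: `KeyEvent`, `physical_key`, `text` and both handlers are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical event carrying both views of one key press."""
    physical_key: str  # layout-independent name, taken from the Latin layout
    text: str          # what the current layout would print ("" if nothing)
    pressed: bool      # True for press, False for release


def game_handler(event):
    """A game only cares which key is held, not what it would print."""
    if event.pressed and event.physical_key == "W":
        return "move_forward"
    return None


def text_handler(event):
    """A text field only cares about the translated text."""
    return event.text if event.pressed else ""


# The W key pressed while a Russian layout is active.
event = KeyEvent(physical_key="W", text="ц", pressed=True)
print(game_handler(event), text_handler(event))
```

Neither handler has to load a keymap or know which layout is active, and an injected event could carry text that no layout contains.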

Regarding IME, I didn't mean integrating an IME client into the Wayland server. I meant unifying the API and events that an IME uses to insert text with the API and events used to report regular keypresses.

> X11 has XkbMapNotify/XkbStateNotify.

Yes, it seems I missed XkbStateNotify and XkbLockGroup that can be used to switch layouts (which X11 calls "groups").

Hear, hear.

I hit all those Wayland issues while working on Squeekboard. https://gitlab.gnome.org/World/Phosh/squeekboard

> Similarly, how do you create an on-screen keyboard that can inject keypresses for characters not available in the current layout?

I switched the keyboard layout on the fly, on key press, if needed. That works... mostly. Chromium and Chromium-based apps know better what layout I am using, so they will misinterpret some inputs despite having a key map already. And then you realize that you can't use a physical keyboard at the same time, because key maps go out of sync while keys are pressed on both. I talked to a Wayland dev about having separate keyboards with separate layouts, but the answer was basically "it's an incompatible change, and it's too late to fix this" (it was in an issue tracker, but no link). So the only way to have a non-input-method on-screen keyboard is to limit yourself artificially to the current layout. Which, of course, is an oft requested feature I will never implement.

> A better idea would be to allow to send arbitrary Unicode strings and maybe integrate regular input and IME input (input system for typing Asian characters).

Doesn't macOS do something like that? I agree this is the way to go. But the stumbling block is, again, that applications like Chromium won't implement this. I created the text-input-v3 protocol some 4 years ago, and it's still basically only used in GNOME.

But with new funding from NLNet I'm gathering a special ops team to push input methods again this year :)

> most developers use only ASCII and do not have experience using multiple layouts.

I'm getting that impression as well after discussing the topic of internationalization on Mastodon: using languages other than English is undervalued by open source devs. I mean, how often do you find variables named in Spanish or Russian in open source software? It's a very anglocentric bubble.

> I'm getting that impression as well after discussing the topic of internationalization on Mastodon: using languages other than English is undervalued by open source devs. I mean, how often do you find variables named in Spanish or Russian in open source software? It's a very anglocentric bubble.

And that's a good thing.

Why is that a good thing? I get the idea that a common language is beneficial, but the flip side is the knowledge and effort of the people who know another language. That's lost due to never being opened (I guess that's more of an indictment of the open source community being not interlinguistic).

So, let's look at this group of people who are knowledgeable in programming et al. but not in English. Sure, this group exists. It's relatively tiny, though. English is the language of any science (any modern one, in other words, except where it hasn't replaced Latin yet), including CS. The vast majority of other on-topic literature is in English. It's hard to learn a decent amount about programming without knowing English.

You can't randomly mix languages in source code, and any choice of common language other than English would exclude a LOT more people from the project.

All of this isn't related to i18n/l10n of your application at all. People not knowing English is a much more relevant factor when talking about user interfaces. I actually plan to localize my Xmoji tool eventually, it's just postponed until more important stuff is done (I mean, I assume most people can muster enough English to use an emoji keyboard after all, but of course, searching especially would benefit from l10n).

I guess my initial reaction was wrong: not having code in non-English languages doesn't accurately represent developer sentiments. There are a lot of translation efforts in open source, but again, this is not a good proxy for the sentiment, because we don't know how many translators (who care about non-English) set project direction and design protocols.

Still, an anglocentric bubble diminishes internationalization, and I disagree that it's a good thing.

The Elinks code is in some Eastern European language.

Well good news, literally just did a week ago: https://chromium-review.googlesource.com/c/chromium/src/+/57...

Oh wow, the beginning of a new era.

For Wayland, keyboard input is "out of scope". (edit: Not entirely, just verified it does forward very basic events, but it's a thing wayland doesn't want to handle, while X11 originally had its own keyboard/mouse/etc drivers) What's typically used is XKB (inherited from X11) with a different backend ([edit: raw events]). So yes, in practice, you'll have the same broken design.

> Also, there seems to be no sane API for managing layouts or switching them programmatically

Layouts only exist as "data" as far as the X server is concerned; clients must fetch them and do the mapping from key codes themselves. Libraries like xkbcommon (or even grandfather Xlib) do the job for you. That said, there are APIs to modify the mapping (by publishing messages/events) as you wish. The ugliness is, apart from the fact that you're forced to fiddle with the mapping at all, that you can't guarantee another client will process everything in sequence. It might apply a new mapping before processing all its queued key press events. That's why my code adds delays between fiddling with the mapping and sending the events.
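
The ordering problem can be shown with a small simulation. This is a toy model in Python, not Xlib and not my actual code: `ToyClient` and `inject` are hypothetical, and the `wait` flag stands in for the delay, on the assumption that the delay is long enough for the client to drain its queue.

```python
from collections import deque


class ToyClient:
    """Hypothetical client that always translates with its newest keymap."""

    def __init__(self, keymap):
        self.keymap = dict(keymap)
        self.queue = deque()   # key presses received but not yet processed
        self.typed = []        # what the client ended up "typing"

    def on_keymap_changed(self, keymap):
        # Applied at once, ahead of key presses still waiting in the queue.
        self.keymap = dict(keymap)

    def on_key(self, keycode):
        self.queue.append(keycode)

    def drain(self):
        """Process all queued key presses with the current keymap."""
        while self.queue:
            self.typed.append(self.keymap.get(self.queue.popleft()))


def inject(client, base, spare, char, wait):
    """Patch the keymap, send the key, restore the keymap."""
    client.on_keymap_changed({**base, spare: char})
    client.on_key(spare)
    if wait:
        client.drain()                 # the delay: let the client catch up
    client.on_keymap_changed(base)     # restore the original mapping
    client.drain()


base = {38: "a"}
hasty, patient = ToyClient(base), ToyClient(base)
inject(hasty, base, 8, "é", wait=False)
inject(patient, base, 8, "é", wait=True)
print(hasty.typed, patient.typed)
```

Without the wait, the client looks the key up only after the mapping has been restored, so the press resolves to nothing. A real delay gives no guarantee either; it just makes the bad ordering unlikely.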

Yes, it's extremely ugly. Still, at least for me, it works. Try it out ;)

And yes, Windows is doing better here, there's a Unicode-flavor of keyboard events available.
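
As far as I know, the mechanism meant here is `SendInput` with the `KEYEVENTF_UNICODE` flag, where each event carries one UTF-16 code unit in place of a scan code. The Python sketch below only builds the list of events and does not call any Windows API; `utf16_units` and `unicode_key_events` are made-up helper names, and the tuple layout is an assumption for illustration.

```python
import struct

# Flag values as documented for the Win32 KEYBDINPUT structure.
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004


def utf16_units(text):
    """Split text into UTF-16 code units (surrogate pairs become two)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def unicode_key_events(text):
    """Return (code_unit, flags) pairs: a press and a release per unit.

    Hypothetical helper; a real caller would put each pair into a
    KEYBDINPUT with the virtual-key field set to 0.
    """
    events = []
    for unit in utf16_units(text):
        events.append((unit, KEYEVENTF_UNICODE))
        events.append((unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))
    return events


for unit, flags in unicode_key_events("é"):
    print(hex(unit), hex(flags))
```

No keymap is consulted anywhere, which is exactly what the X11 side lacks.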

I feel like this falls into libinput's scope, but it's going to be a bunch of work.

edit: It does, libinput exposes keyboard events via evdev.