Hacker News new | ask | show | jobs
by Const-me 1957 days ago
I used and implemented immediate mode GUIs, and I think it’s only good for simple stuff.

Accessibility is hard. The OS needs to know visual tree to be able to conclude things like “this rectangle is a clickable button with Hello World text”.

Complex layouts are hard. Desktop software have been doing fluid layouts decades before the term was coined for web apps, because different screen resolutions, and because some windows are user-resizable. Layout can be expensive (one reason is measuring pixel size of text), you want to reuse the data between frames.

Animations are hard. Users expect them because smartphones introduced them back in 2007: https://streamable.com/okvhl Note the kinetic scrolling, and soft “bumps” when trying to scroll to the end.

Drag and drop is very hard.

2 comments

Apart from accessibility, isn't this mostly caused by the "immediate mode API" versus "immediate mode implementation" confusion I described in another sub-thread?

There's nothing in the idea that forbids immediate mode UI frameworks from keeping any amount of internal state between frames to keep track of changes over time (like animations or drag'n'drop actions), the difference to traditional UI frameworks is just that this persistent state tree is hidden from the outside world.

Layout problems can be solved by running several passes over the UI description before rendering happens.

For accessibility, the ball is mainly in the court of the operating system and browser vendors. There need to be accessibility APIs which let user interface frameworks connect to screen readers and other accessibility features (this is not a problem limited to immediate mode UIs, but custom UIs in general).

> There's nothing in the idea that forbids immediate mode UI frameworks from keeping any amount of internal state between frames to keep track of changes over time

When you call ImGui::Button("Hello World") it has no way of telling if it’s the same button as on the previous frame, or a new one. You’re simply not providing that data to the framework.

It’s often good enough for rendering, they don’t really care if it’s an old or new button as long as the GPU renders identical pixels. When you need state however (animations certainly do), the approach is no longer useful.

A framework might use heuristics like match text of the button or relative order of things, but none of them is reliable enough. Buttons may change text, they may be dynamically shown or hidden, things may be reordered in runtime.

> Layout problems can be solved by running several passes over the UI description before rendering happens.

You’re correct that it’s technically solvable. It’s just becomes computationally expensive for complex GUIs. You can’t reliably attach state to visual elements, gonna have to re-measure them every frame.

> There need to be accessibility APIs which let user interface frameworks connect to screen readers

Windows has it for decades now. For Win32 everything is automatic and implemented by the OS. Good custom-painted GUI frameworks like WPF or UWP handle WM_GETOBJECT message and return IAccessible COM interface. That COM interface implements reflection of the complete GUI, exposing an objects hierarchy. I’m not saying it’s impossible to do with immediate rendering, just very hard without explicit visual tree that persists through the lifetime of the window.

> When you call ImGui::Button("Hello World") it has no way of telling if it’s the same button as on the previous frame

ImGui does have stable widget ids, otherwise many features which require keeping state across frames wouldn't work. In case of your button example, the widget id is hashed from the label string "Hello World":

https://github.com/ocornut/imgui/blob/61b19489f1ba35934d9114...

This is just the default behaviour, there's also a convention to define the id separately from the visible label string behind a '##':

https://github.com/ocornut/imgui/blob/61b19489f1ba35934d9114...

Of course other immediate mode UIs might have a different way to define widget ids.

> When you call ImGui::Button("Hello World") it has no way of telling if it’s the same button as on the previous frame, or a new one.

Can you solve this by adding stable IDs to the API, so you'd call `ImGui::Button("my-button-id", button_text)`?

To consistently assign these IDs you gonna have a tree on your side of the API. You’ll then have two visual trees, one on your side on the API, another one on the framework’s side of the API. And then you gonna be debugging things.

Who and when should destroy backend visual nodes? If you won’t use an ID in some frame, does it indicate the backend should drop the related state? What if an animation is still playing on the element in question? What if the missing control re-appears later i.e. was only hidden temporarily? What should backend do if you reuse same ID for different things?

One can implement simple stuff using any approach, immediate GUI is often the simplest of them. It’s when use cases are complicated you need a full-blown retrained mode OO framework. Win32, HTML DOM, MS XAML, QT, UIKit are all implemented that way, that’s not a coincidence.

these APIs are UI Automation on windows and NSAccessibility on macos
When building more complicated UIs, an architectural pattern I have often found useful is to have two distinct levels of state/data behind the rendering code, so my hierarchy looks something like this:

1. Application data

2. Presentation data

3. Presentation rendering

The first is the single source of truth for your application state, what we often call a “model” or “store”. It’s where you represent the data from your problem domain, which is usually also the data that needs to be persistent.

The second is where you collect any additional data needed for a particular way of presenting some or all of your application data. This can come from the current state of the view (list sort order, current page for a paginated table, position and zoom level over a map, etc.) or the underlying application data (building a sorted version of a list, laying out a diagram, etc.) or any combination of the two (finding the top 10 best sellers from the application data, according to the user’s current sort order set in the UI). This is often a relatively simple part of the system, but there is no reason it has to be: it could just as well include setting up a complicated scene for rendering, or co-ordinating a complicated animation.

The final part is the rendering code, which is a translation from the application and presentation data into whatever presentation is required. There isn’t any complicated logic here, and usually no state is being maintained at this level either. The data required for rendering all comes ready-prepared from the lower layers. Any interactions that should change the view or application state are immediately passed down to the lower layers to be handled.

The important idea is that everything you would do to keep things organised around the application data also applies to the presentation data. Each can expose its current state relatively directly for reading by any part of the system that needs to know. Each can require changes of state to be made through a defined interface, which might be some sort of command/action handler pattern to keep the design open and flexible. Each can be observable, so other parts of the system can react to changes in its state.

It just happens that now, instead of a single cycle where application data gets rendered and changes from the UI get passed back to the application data layer, we have two cycles. One goes from application data through presentation data to rendering, with changes going back to the application data layer. The other just goes from the presentation data to the rendering and back.

I have found this kind of architecture “plays nicely” with almost any other requirements. For example, data sync with a server if our GUI is the front end of a web app can easily be handled in a separate module elsewhere in the system. It can observe the application data to know when to upload changes. It can update the application data according to any downloaded information via the usual interface for changing the state.

I have also found this kind of architecture to be very flexible and to fit with almost any choice of libraries for things like state modelling, server comms, rendering the actual view, etc. Or, if your requirements are either too simple to justify bringing in those dependencies or too complicated for a ready-made library to do what you need, you have a systematic overall structure in place to implement whatever you need directly, using all the same software design and scaling techniques you normally would.

What you talking about is very similar to MVVM as implemented in various MS XAML frameworks: https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93vie... IMO that’s very good approach to the problem.

> laying out a diagram

I think layout belongs to the views (you call them presentation rendering).

View models provide data to be visualized, but it’s rarely a good idea to compute pixel positions, or handle low-level input there. These things are coupled with views too much.

Similar to animations. Unless the app being developed is a video editor or similar where animation is the core functionality, animations don’t affect models nor view models, they merely prettify GUI state transitions or provide progress indication. No need to wire them all the way into view models. Some frameworks even run them on a separate thread called “compositor thread” so they play fine even if the main thread is blocked or starved for CPU time.

What you talking about is very similar to MVVM

At first when I adopted this architectural pattern, it was a development of classic MVC so already used the terms “model” and “view”. I actually did adopt the term “view-model” for what I’m referring to here as the presentation data layer. After all, it was the connection between the view and model, so what else should it be called? :-)

However, while there certainly are similarities between the architecture I described above and MVVM — the VM can collect data from the Model and reformat it for the View, and the VM might manage some aspects of the view state — they also differ in some important ways.

Most significantly, MVVM has a linear V–VM–M relationship between the components. The V and M never communicate directly in either direction.

In the architecture I described, the rendering code might depend on both state layers, accessing both application state directly and derived or view-specific data from the presentation state. This avoids any need for boilerplate in the middle layer if nothing special needs to be done with the application data before it’s used for rendering.

Likewise, in response to events triggered from something in the rendered output, messages might be sent to the application and/or presentation state layers so they could update accordingly. The distinction is always clear: any event that can affect application data gets sent there, while anything that affects view state goes to the presentation data layer. In practice, it’s unusual for a single UI event to affect both, so usually only one message is needed. (There is an implicit assumption being made here that the starting point for handling events triggered by user interactions is defined as part of the rendering, which is not necessarily always the case, but let’s gloss over those details for now or this comment will get insanely long.)

In MVVM, there may also be direct two-way data-binding between the V and VM. The architecture I described never assumes that kind of 1:1 relationship between something in the rendered output and the underlying data.

I think layout belongs to the views (you call them presentation rendering). View models provide data to be visualized, but it’s rarely a good idea to compute pixel positions, or handle low-level input there. These things are coupled with views too much.

This is another place where I think the responsibilities in the architecture I described are allocated differently. I want any non-trivial calculations to be in the presentation data layer, and I want the presentation rendering layer to be as dumb as possible. This separates data processing from I/O, which is something I favour as a general principle for organising programs.

In this case, it has the specific advantage that you might have very different strategies for testing them. The presentation data layer is pure computation and can be easily unit tested. The I/O is all about rendering and, in some cases, simulating actions on the rendered output that trigger some sort of message to be sent, so for this you might want some sort of snapshot-based testing or simulated UI environment.

You might also want to incorporate totally different third party libraries or depend on different platform APIs to implement each step. For example, switching out one UI rendering library for another or updating to integrate with some new-style platform API probably shouldn’t affect any of the data (including presentation data like diagram layouts or animation logic) unless we’re specifically shifting responsibility for something like handling animations to a dependency rather than our own code or vice versa.

Some frameworks even run them on a separate thread called “compositor thread” so they play fine even if the main thread is blocked or starved for CPU time.

Sure. One of the nice things about the architecture I described is that it’s entirely neutral about these things. If your requirements so dictate, there is nothing to stop you implementing a more elaborate design at any level of the system. For example, that might include running time-consuming parts of the presentation data logic in separate threads, having changes in the presentation data triggered by timers or other system events, delegating some aspects to hardware acceleration where available, or using any sort of sophisticated caching or scheduling implementation for better performance.

Much of this would add unnecessary complexity in a simple CRUD application GUI. If you’re building a complex, real-time visualisation of an incoming data stream or you’re implementing a game engine that needs to draw part of a large game world with many moving parts, presumably you might have a different perspective on how much extra complexity in the design is acceptable if it gets you the performance you need.

> The V and M never communicate directly in either direction.

There’s nothing wrong in dropping a layer there.

That intermediate VM layer makes sense when the data being presented is not precisely what’s stored in the model. Good place to implement filters or pagination. Good place to validate user input. However, in other cases one doesn’t need any of that, in which case there’s nothing wrong with data binding directly to models.

> the rendering code might depend on both state layers, accessing both application state directly and derived or view-specific data from the presentation state.

In .NET with MVVM that’s easily doable, expose a property on VM that returns the model, and this will work.

> In practice, it’s unusual for a single UI event to affect both

In the apps I work on this happens all the time. User clicks “do something” button, the model needs to start doing that, in the meantime the view needs to disable that button and show “doing something, please wait…” with a progress bar.

For this reason, I normally handle such higher-level events in VMs and call models from there.

> The presentation data layer is pure computation and can be easily unit tested.

I think strongly typed languages handle 90% of what people usually unit test for, with better developer’s UX. If you change something that broke 33 places of other code, a compiler will tell you precisely which 33 places you gonna need to fix.

> switching out one UI rendering library for another or updating to integrate with some new-style platform API probably shouldn’t affect any of the data

I think that only works when switching to something extremely similar. I did when porting iPhone code to iPad, or WPF XAML to UWP XAML, but these were very similar platforms. UI libraries are complicated; replacing them with something else is huge amount of work.

> including presentation data like diagram layouts or animation logic

I don’t believe these pieces are portable in practice. Both coupled with UI libraries too much. For the diagram, half of the 2D rendering libraries are immediate mode other half are retained mode. Animations are totally different as well, CSS animations in HTML don’t have much in common with visual state manager in XAML.

No doubt there are many different architectures that can work well in this sort of situation. I’m just arguing that, contrary to your earlier comment that I first replied to, an immediate mode UI can be one of them even when the requirements are not so simple.

In the apps I work on this happens all the time. User clicks “do something” button, the model needs to start doing that, in the meantime the view needs to disable that button and show “doing something, please wait…” with a progress bar. For this reason, I normally handle such higher-level events in VMs and call models from there.

I used too strong a word before. As you point out, situations where you’d want to update both the application and presentation data aren’t really unusual. You gave one good example. Another might be controlling a modal UI, like closing a dialog box or moving to the next step of a wizard, when maybe you also want to commit data user has entered in a temporary form to the permanent application state. In these cases, handling a UI event would indeed need to get the relevant information to both of the other layers one way or another. This still easily fits within the architecture pattern I was describing, though.

I think strongly typed languages handle 90% of what people usually unit test for, with better developer’s UX.

I am a big fan of using expressive type systems to prevent defects, particularly as an alternative to huge unit test suites full of boilerplate that still don’t do as good a job.

However, no mainstream type system will verify that for a given list in your application data and a given sorting option chosen in the UI, a function in your presentation data layer has generated the correct top 10 items in order. (No unit test can ever verify that property in general either, of course, but at least we can test some representative example cases.)

I think that only works when switching to something extremely similar. I did when porting iPhone code to iPad, or WPF XAML to UWP XAML, but these were very similar platforms. UI libraries are complicated; replacing them with something else is huge amount of work.

I suppose that depends on what you consider extremely similar. Around 10–15 years ago, when what we call “single page applications” today were starting to gain serious traction, if you wanted an interactive visual representation of your data, you probably used one of the proprietary embedded technologies like Flash, Java or ActiveX. But if you were drawing, say, an org chart, you were still ultimately just putting boxes and lines and text in certain places within the allocated area on the page. Today we have tools like SVG, canvas and WebGL available handle the drawing part, but everything we’d want to display in that drawing might still be the same.

I don’t believe these pieces are portable in practice. Both coupled with UI libraries too much. For the diagram, half of the 2D rendering libraries are immediate mode other half are retained mode. Animations are totally different as well, CSS animations in HTML don’t have much in common with visual state manager in XAML.

I think perhaps we have very different types of animation in mind. From the above, I’m guessing you’re thinking of things like pulsing a button when it’s pressed or a little progress spinner while downloading some data? I’m thinking of things like smoothly moving all of the boxes on that org chart to their new positions, including animating the shapes of the paths between them, when the user does something that moves the chart around. Another example might be animating weather data overlaid over a map graphic through the next few hours when the user presses a “play” button. That is, I’m talking about animations where the data to be shown at each step needs to be determined according to complex requirements, not just playing a simple, predetermined effect that a UI library or platform API might handle for you almost for free anyway.

> immediate mode UI can be one of them even when the requirements are not so simple

Just because you can doesn't mean you should.

> a function in your presentation data layer has generated the correct top 10 items in order

There're too many things which can go wrong with visuals. Incorrect order has smaller probability than a bug in a style somewhere which makes data invisible. For the 2nd reason you want to test that thing somehow else anyway: unit tests don't catch rendering bugs. Some other automated tests can in theory, but in practice maintaining these tests wastes a lot of time i.e. expensive.

> Around 10–15 years ago, when what we call “single page applications” today were starting to gain serious traction

Traction is overrated. Around 20 years ago Outlook Web Access (a server-side component of Exchange server) already was a modern single page application, written in HTML and JavaScript. All the required functionality was already there in IE, and even documented nicely on MSDN.

> everything we’d want to display in that drawing might still be the same.

Totally different. SVG renders vector 2D graphics, WebGL is a 3D GPU API. Try to render something slightly more complicated than rectangle of solid color, and you'll see. One example is rectangle with rounded corners: trivially simple in SVG, relatively hard on GPU. A Mandelbrot set is opposite example.

> animations where the data to be shown at each step needs to be determined according to complex requirements

Doesn't matter if the data is static or not. Once you know the data, you do not want to run your code at 60 Hz (or 240Hz if you're really unlucky) computing functions of time and updating things. It's almost always a good idea to use the GUI framework for playing animations. They do better with rendering and power saving.