by snickell 456 days ago
What I'm currently doing is caveman: I ask the LLM to attach a unique id= to every element, and I gave it an attribute (data-use-cached) it can use to mark "the contents of this element should be loaded from the previous frame": https://github.com/snickell/universal/blob/47c5b5920db5b2082...

For example, this specifies that #my-div should be replaced with the value from the previous frame (which itself might have been cached): <div id="my-div" data-use-cached></div>
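
A rough sketch of how the data-use-cached substitution could be resolved on the receiving side, in case it helps make the idea concrete. This is not the code from the linked repo: the names (Node, TreeBuilder, render_frame, etc.) are made up for illustration, and it assumes the previous frame has already been fully resolved and that ids are unique within a frame.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag or children.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose text must be emitted verbatim (no entity escaping).
RAW_TEXT = {"script", "style"}


class Node:
    """Hypothetical minimal DOM node: tag, attributes, child nodes/strings."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a small tree from an HTML fragment (illustrative, not a full parser)."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def index_by_id(node, index=None):
    """Map every id in the tree to its element."""
    index = {} if index is None else index
    for child in node.children:
        if isinstance(child, Node):
            if "id" in child.attrs:
                index[child.attrs["id"]] = child
            index_by_id(child, index)
    return index


def resolve(node, cache):
    """Swap each data-use-cached placeholder for the cached element with the same id."""
    for i, child in enumerate(node.children):
        if not isinstance(child, Node):
            continue
        if "data-use-cached" in child.attrs:
            cached = cache.get(child.attrs.get("id"))
            if cached is not None:
                node.children[i] = cached
            # Unknown id: the placeholder is left in place as-is.
        else:
            resolve(child, cache)


def serialize(node):
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child if node.tag in RAW_TEXT else escape(child, quote=False))
            continue
        attrs = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"'
            for k, v in child.attrs.items()
        )
        parts.append(f"<{child.tag}{attrs}>")
        if child.tag not in VOID:
            parts.append(serialize(child) + f"</{child.tag}>")
    return "".join(parts)


def render_frame(partial_html, previous_html):
    """Resolve a partial frame from the LLM against the previous resolved frame."""
    cache = index_by_id(parse(previous_html))
    tree = parse(partial_html)
    resolve(tree, cache)
    return serialize(tree)
```

Since each resolved frame becomes the cache for the next one, an element can stay cached across many frames without the LLM ever re-emitting its contents, e.g. render_frame('<div id="my-div" data-use-cached></div>', previous_frame).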

This lowers the render time /substantially/: for simple changes like "clicked here, pop open a menu" it can finish in 10s, vs a full frame render, which might take 2 minutes (obviously this varies with how much is on the screen!).

I think using HAML etc is an interesting idea, thanks for suggesting it, that might be something I'll experiment with.

The challenge I'm finding is that "fancy" also has a way of confusing the LLM. E.g. I originally had the LLM produce literal unified diffs between frames. I reasoned it had seen plenty of diffs of HTML in its training data set. It could actually do this, BUT image quality and intelligence were notably affected.
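
For anyone who hasn't looked at one lately, this is roughly the kind of output the unified-diff approach asks the model to write by hand. It's only an illustration using Python's difflib; the frame contents and the frame_diff helper are made up, and this is not how the diffs were produced or applied in my setup.

```python
import difflib


def frame_diff(previous_html, next_html):
    """Return a unified diff (as one string) between two HTML frames."""
    return "".join(difflib.unified_diff(
        previous_html.splitlines(keepends=True),
        next_html.splitlines(keepends=True),
        fromfile="frame_n",
        tofile="frame_n_plus_1",
    ))


# Hypothetical frames: the only change is that the menu becomes visible.
prev_frame = '<ul id="menu" hidden>\n  <li>Open</li>\n</ul>\n'
next_frame = '<ul id="menu">\n  <li>Open</li>\n</ul>\n'

patch = frame_diff(prev_frame, next_frame)
print(patch)
```

The model has to get the hunk headers and context lines exactly right for the patch to apply, which is a lot of bookkeeping compared with emitting an empty element with data-use-cached on it.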

Part of the problem is that at the moment (well, 1mo ago when I last benchmarked), only Claude is "past the bar" for being able to do this particular task, for whatever reason. Gemini Flash is the second closest. Everything else (including 4o, 4.5, o1, deepseek, etc) is a total wipeout.

What would be really amazing is if, say, Llama 4 turns out to be good in the visual domain the way Claude is, and you can run it on one of the LLM-on-silicon vendors (Cerebras, Groq, etc) to get 10x the token rate.

LMK if you have other ideas, thanks for thinking about this and taking a look!