I've built several programmatic video generation systems for work over the past few years (ads, social videos, automated clips, etc.), and I kept running into the same frustration:
> Every project ended up reinventing a slightly different DSL for timelines, layers, animations, and transitions.
Despite very different use cases, the code always converged on the same patterns:
- layout + timing
- repeated elements over time
- imperative glue code to manage state and sequencing
Meanwhile, web developers already have decades of experience solving similar problems with HTML, CSS, and the DOM — just not over time.
So as an experiment, I started building htmlv:
> an HTML-inspired markup language for video, where the DOM is laid out along a timeline instead of down an infinite vertical scroll.
# Core idea
- Time-based layout instead of vertical layout
- A temporal DOM where repeated elements extend the duration, not the height
- Reuse familiar concepts: HTML structure, CSS styling, JavaScript-driven DOM updates
- Fixed viewport (aspect-ratio aware), closer to a video frame than to a scrolling document
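To make the "temporal DOM" idea concrete, here is a minimal TypeScript sketch of time-based layout. It assumes a tree of three hypothetical node kinds: a `clip` with an intrinsic duration, a `seq` whose children play one after another (the analogue of normal block flow), and a `par` whose children overlap as layers. The `seq`/`par` names are borrowed from the W3C SMIL time containers; `TNode`, `duration`, and `layout` are illustrative assumptions, not htmlv's actual syntax or API.

```typescript
// Hypothetical node types for a "temporal DOM" sketch (not htmlv's real API).
type TNode =
  | { kind: "clip"; id: string; duration: number } // leaf with intrinsic duration (seconds)
  | { kind: "seq"; children: TNode[] }             // children play one after another
  | { kind: "par"; children: TNode[] };            // children overlap as layers

interface Placed {
  id: string;
  start: number;
  end: number;
}

// Intrinsic duration of a node, analogous to intrinsic height in block layout.
function duration(node: TNode): number {
  switch (node.kind) {
    case "clip":
      return node.duration;
    case "seq":
      return node.children.reduce((sum, c) => sum + duration(c), 0);
    case "par":
      return node.children.reduce((max, c) => Math.max(max, duration(c)), 0);
  }
}

// Lay out the tree along the time axis, the way block layout stacks boxes
// vertically: each child of a "seq" starts where its previous sibling ended,
// while all children of a "par" start at the same time.
function layout(node: TNode, start = 0, out: Placed[] = []): Placed[] {
  switch (node.kind) {
    case "clip":
      out.push({ id: node.id, start, end: start + node.duration });
      break;
    case "seq": {
      let cursor = start;
      for (const child of node.children) {
        layout(child, cursor, out);
        cursor += duration(child);
      }
      break;
    }
    case "par":
      for (const child of node.children) layout(child, start, out);
      break;
  }
  return out;
}

// Data-driven repetition: mapping over data appends clips to the timeline,
// extending the total duration instead of the page height.
const products = ["shoes", "hat", "scarf"];
const video: TNode = {
  kind: "par",
  children: [
    { kind: "clip", id: "background-music", duration: 9 },
    {
      kind: "seq",
      children: [
        { kind: "clip", id: "intro", duration: 2 },
        ...products.map((p): TNode => ({ kind: "clip", id: `card-${p}`, duration: 2 })),
        { kind: "clip", id: "outro", duration: 1 },
      ],
    },
  ],
};

for (const p of layout(video)) {
  console.log(`${p.id}: ${p.start}s -> ${p.end}s`);
}
```

In this sketch the three product cards land at 2s, 4s, and 6s and the outro starts at 8s; adding another product pushes the outro later rather than making anything taller, which is the "repetition extends time" behavior described above.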
This is not meant to replace video editors or After Effects.
The target is code-first video generation where:
- content is data-driven
- layouts are reusable
- engineers (not motion designers) own the pipeline
# Why I'm posting
Before investing more time, I'd really like feedback from people who've:
- built video pipelines
- designed DSLs
- worked on media tooling
- or have strong opinions about why this is a terrible idea
Questions I’m wrestling with:
- Is HTML a fundamentally bad mental model for time-based media?
- Does this become unmaintainable at scale?
- Am I underestimating how different “time” is from “layout”?
- Are there existing tools or standards I should study more closely?
I’m not looking for validation — criticism is very welcome.
If this is doomed, I’d much rather know why early.
Thanks in advance for any thoughts, advice, or brutal feedback.