Is HTML-like markup a bad idea for programmatic video generation?

Hi HN,

I've built several programmatic video generation systems for work over the past few years (ads, social videos, automated clips, etc.), and I kept running into the same frustration:

> Every project ended up reinventing a slightly different DSL for timelines, layers, animations, and transitions.

Despite very different use cases, the code always converged into the same patterns:

- layout + timing
- repeated elements over time
- imperative glue code to manage state and sequencing

Meanwhile, web developers already have decades of experience solving similar problems with HTML, CSS, and the DOM, just not over time.

So as an experiment, I started building htmlv:

> an HTML-inspired markup language for video, where the DOM exists along a timeline instead of an infinite vertical scroll.

GitHub: https://github.com/xxatsushixx/htmlv

# Core idea

- Time-based layout instead of vertical layout
- A temporal DOM where repeating elements extend time, not height
- Reuse familiar concepts: HTML structure, CSS styling, JavaScript-driven DOM updates
- Fixed viewport (aspect-ratio aware), closer to video than documents

This is not meant to replace video editors or After Effects. The target is code-first video generation where:

- content is data-driven
- layouts are reusable
- engineers (not motion designers) own the pipeline
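
To make "repeating elements extend time, not height" a bit more concrete, here is a rough Python sketch of what a temporal block layout could look like. It is not htmlv's actual implementation: the `Node` class and the `duration` and `layout` names are hypothetical, and it assumes the children of a node play back to back, the way block elements stack vertically in a document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A hypothetical temporal DOM node (illustration only, not htmlv's API)."""
    name: str
    own_duration: Optional[float] = None  # explicit length in seconds, if any
    children: list["Node"] = field(default_factory=list)

    def duration(self) -> float:
        # An explicit length wins, much like an explicit CSS height.
        if self.own_duration is not None:
            return self.own_duration
        # Otherwise the node lasts as long as its children laid end to end,
        # like a block container whose height is the sum of its children.
        return sum(child.duration() for child in self.children)


def layout(node: Node, start: float = 0.0) -> list[tuple[str, float, float]]:
    """Return (name, start, end) in seconds for every node, depth first."""
    placed = [(node.name, start, start + node.duration())]
    cursor = start
    for child in node.children:
        placed.extend(layout(child, cursor))
        cursor += child.duration()
    return placed


# Data-driven repetition: one 3-second card per item extends the timeline.
items = ["intro", "feature-a", "feature-b"]
body = Node("body", children=[Node(item, own_duration=3.0) for item in items])
timeline = layout(body)
```

Adding a fourth item to `items` makes `body` three seconds longer without touching any other code, which is the time-axis equivalent of a list growing taller on a web page.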

# Example Structure

    <!DOCTYPE htmlv>
    <html>
      <head>
        <title>Sample Video</title>
        <link rel="stylesheet" href="styles.css">
        <script src="script.js"></script>
        <meta name="seed" content="12345">
        <meta name="framerate" content="30fps">
        <meta name="compile-mode" content="precompile">
      </head>
      <body>
        <scene style="time-length: 10s; scene-transition: fade 2s;">
          <text class="title">Welcome to htmlv</text>
        </scene>
        <scene style="time-length: 15s;">
          <video src="background-loop.mp4"></video>
          <scene>
            <text class="subtitle">Creating videos with code</text>
          </scene>
        </scene>
      </body>
    </html>
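
As a rough illustration of what a compiler might do with the example above, the Python sketch below turns the two top-level `time-length` values and the `framerate` meta value into frame ranges. It is not taken from the htmlv repository: `parse_seconds`, `parse_fps`, and `frame_ranges` are hypothetical helpers, and it assumes that top-level scenes play sequentially and that `scene-transition` does not change the total length. The post specifies neither of those.

```python
import re


def parse_seconds(style: str) -> float:
    """Extract the `time-length` value (in seconds) from an inline style string."""
    match = re.search(r"time-length:\s*([\d.]+)s", style)
    if match is None:
        raise ValueError(f"no time-length in {style!r}")
    return float(match.group(1))


def parse_fps(content: str) -> int:
    """Turn a framerate meta value such as '30fps' into an integer."""
    match = re.fullmatch(r"(\d+)fps", content.strip())
    if match is None:
        raise ValueError(f"unrecognised framerate {content!r}")
    return int(match.group(1))


def frame_ranges(styles: list[str], fps: int) -> list[tuple[int, int]]:
    """Assign each scene an inclusive (first_frame, last_frame) range.

    Assumption: scenes are laid end to end and transitions are ignored.
    """
    ranges = []
    start = 0
    for style in styles:
        length = round(parse_seconds(style) * fps)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Inline styles of the two top-level <scene> elements in the example.
scene_styles = [
    "time-length: 10s; scene-transition: fade 2s;",
    "time-length: 15s;",
]
fps = parse_fps("30fps")
ranges = frame_ranges(scene_styles, fps)
total_frames = ranges[-1][1] + 1
```

Under these assumptions the first scene covers frames 0 to 299 and the second covers frames 300 to 749, for 750 frames (25 seconds) in total. If the 2-second fade were meant to overlap the two scenes, the total would be shorter; that is one of the places where "time" seems to need rules that "layout" does not.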

# Why I'm posting

Before investing more time, I'd really like feedback from people who've:

- built video pipelines
- designed DSLs
- worked on media tooling
- or have strong opinions about why this is a terrible idea

Questions I’m wrestling with:

- Is HTML a fundamentally bad mental model for time-based media?
- Does this become unmaintainable at scale?
- Am I underestimating how different “time” is from “layout”?
- Are there existing tools or standards I should study more closely?

I’m not looking for validation; criticism is very welcome. If this is doomed, I’d much rather know why early.

Thanks in advance for any thoughts, advice, or brutal feedback.