Hi HN, I’ve been experimenting with letting LLMs generate and then continuously modify small business applications (CRUD, dashboards, workflows). The first generation usually works; the problems start on the second or third iteration.

Some recurring failure modes I keep seeing:
• schema drift that silently breaks dashboards
• metrics changing meaning across iterations
• UI components querying data in incompatible ways
• AI fixing something locally while violating global invariants

What’s striking is that most AI app builders treat generation as a one-shot problem, while real applications are long-lived systems that need to evolve safely.

The direction I’m exploring is treating the application as a runtime model rather than generated code:
• the app is defined by a structured, versioned JSON/DSL (entities, relationships, metrics, workflows)
• every AI-proposed change is validated by the backend before execution
• UI components bind to semantic concepts (metrics, datasets), not raw queries
• AI proposes structure; the runtime enforces consistency

Conceptually this feels closer to how Kubernetes treats infrastructure, or how semantic layers work in analytics, but applied to full applications rather than reporting.

I’m curious:
• Has anyone here explored similar patterns?
• Are there established approaches to controlling AI-driven schema evolution?
• Do you think semantic layers belong inside the application runtime, or should they remain analytics-only?

Not pitching anything, just genuinely trying to understand how others are approaching AI + long-lived application state. Thanks.
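To make the "backend validates every AI-proposed change" idea concrete, here is a rough sketch, not an actual implementation. Everything in it is hypothetical and invented for illustration: the `AppModel` shape, the `drop_field` operation, and the `validate_change` / `apply_change` helpers. It assumes each metric depends on exactly one entity field, which a real model would not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppModel:
    """Hypothetical versioned app definition (a stand-in for the JSON/DSL)."""
    version: int
    entities: dict[str, set[str]]        # entity name -> field names
    metrics: dict[str, tuple[str, str]]  # metric name -> (entity, field) it reads


def validate_change(model: AppModel, change: dict) -> list[str]:
    """Return a list of violations; an empty list means the change is safe."""
    errors = []
    if change.get("op") != "drop_field":
        # Only one operation is modelled in this sketch.
        return [f"unsupported op {change.get('op')!r}"]
    entity, fld = change["entity"], change["field"]
    if fld not in model.entities.get(entity, set()):
        errors.append(f"unknown field {entity}.{fld}")
    # Global invariant: no metric may reference a field that is going away.
    for name, dep in model.metrics.items():
        if dep == (entity, fld):
            errors.append(f"metric '{name}' depends on {entity}.{fld}")
    return errors


def apply_change(model: AppModel, change: dict) -> AppModel:
    """Apply a validated change and return a new model version."""
    errors = validate_change(model, change)
    if errors:
        raise ValueError("; ".join(errors))
    # Copy instead of mutating, so the previous version stays intact.
    entities = {name: set(fields) for name, fields in model.entities.items()}
    entities[change["entity"]].discard(change["field"])
    return AppModel(model.version + 1, entities, dict(model.metrics))
```

The point of the shape: the LLM only ever emits `change` dicts, and a proposal that would orphan a metric is rejected before it reaches the running app, rather than being discovered later as a broken dashboard.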
Real example: in one LLM-powered support tool, a minor prompt tweak changed the tone of the output and broke downstream parsers. We fixed it by adding contract tests (expected fields + phrasing constraints) and running batch replays before deploy. Think of LLMs as nondeterministic services: you need observability, evals, and guardrails, not just “better prompts.”
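A contract test plus batch replay of that kind might look roughly like the sketch below. The field names (`intent`, `reply`), the forbidden-phrase rule, and the `check_contract` / `replay` helpers are all assumptions made up for illustration, not the setup from the tool described above. It also assumes the model is asked to return a JSON object.

```python
import json
import re

# Hypothetical contract: the fields downstream parsers rely on, and their types.
REQUIRED_FIELDS = {"intent": str, "reply": str}
# Hypothetical phrasing constraint: wording that must never appear in a reply.
FORBIDDEN_PHRASES = [re.compile(r"as an ai", re.IGNORECASE)]


def check_contract(raw: str) -> list[str]:
    """Return the contract violations for one raw model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            problems.append(f"missing or mistyped field '{name}'")
    reply = data.get("reply")
    if isinstance(reply, str):
        for pattern in FORBIDDEN_PHRASES:
            if pattern.search(reply):
                problems.append(f"forbidden phrasing: {pattern.pattern}")
    return problems


def replay(recorded_outputs: list[str]) -> dict[int, list[str]]:
    """Check a batch of recorded outputs; map failing indexes to their problems."""
    failures = {}
    for index, raw in enumerate(recorded_outputs):
        problems = check_contract(raw)
        if problems:
            failures[index] = problems
    return failures
```

In a pre-deploy step you would regenerate outputs for a fixed set of recorded inputs using the new prompt, pass them to `replay`, and block the deploy if the result is non-empty.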