Ask HN: Is a JVM/CDP based browser agent stack fundamentally a bad idea?
1 points by galaxyeye 190 days ago
Hi HN,

We built a very early prototype: a Browser-Agent/browser-automation runtime using Kotlin/JVM and raw CDP. Before investing further, we’d like advice from anyone who has worked on browser agents, AI browsers, large-scale automation, crawling, browser farms, or who has deep knowledge of Chromium/CDP.

We ourselves suspect many of our design assumptions may be flawed, so sharp criticism is very welcome.

---

TL;DR

We’re building an open-source runtime:

• AI planning/reasoning/logic lives on the JVM

• Browser actions are driven via raw CDP

• High concurrency via Kotlin coroutines

• A small ML agent learns page structure

But we’re not sure any of this is actually meaningful. Feedback—especially negative feedback—is appreciated.

---

1. JVM + CDP: possibly the wrong abstraction layer

AI planning, reasoning, and logic run on the JVM; browser actions are sent through CDP.

Some doubts we cannot resolve internally:

• Is the JVM too heavy for this domain? Will GC and scheduling cause tail latency?

• Is CDP inherently unsuitable for high-throughput automation?

• Does anyone actually need a JVM-native browser agent?

• Would Go/Node/Python be more sensible choices?

If the answer is “no, this is the wrong direction,” we’d really like to hear it.

---

2. High-concurrency runtime: likely to fall apart in real workloads

We’re trying to push single-machine throughput on real, complex pages by relying on:

• Kotlin coroutines

• Minimizing DevTools round-trips

• Raw CDP with multi-tab concurrency
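
For readers who want something concrete to criticize, here is a minimal sketch of the "coroutines plus multi-tab concurrency" idea. It is an illustration under stated assumptions, not the prototype's code: `visit`, `visitAll`, and `maxTabs` are hypothetical names, `delay` stands in for real CDP I/O, and the block assumes kotlinx-coroutines-core is on the classpath.

```kotlin
// Requires kotlinx-coroutines-core on the classpath.
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Hypothetical stand-in for "open a tab, drive it over CDP, return a result".
suspend fun visit(url: String): String {
    delay(10) // simulates suspending while waiting on CDP I/O
    return "visited $url"
}

// Starts one coroutine per URL but lets at most `maxTabs` run `visit` at once.
// Results come back in the same order as the input list.
suspend fun visitAll(urls: List<String>, maxTabs: Int): List<String> = coroutineScope {
    val permits = Semaphore(maxTabs)
    urls.map { url ->
        async { permits.withPermit { visit(url) } }
    }.awaitAll()
}

fun main() = runBlocking {
    val urls = (1..20).map { "https://example.com/page/$it" }
    val results = visitAll(urls, maxTabs = 4)
    println(results.size) // prints 20
}
```

The semaphore caps how many tabs are active, which is the knob that would have to be tuned against Chromium's own limits; the coroutines themselves are cheap, so the cap, not the coroutine count, decides the load on the browser.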

But our doubts are even larger:

• Can Chromium realistically survive this scale (render-process contention, GPU-thread limits, compositor stalls, etc.)?

• Are multi-tab workloads doomed to event interference, reordering, and deadlocks?

• Will CDP scheduling become the true bottleneck?

• Is raw CDP unavoidably more brittle than Playwright?

If you’ve seen similar attempts fail, we’d especially like to know how they failed.

---

3. Non-LLM page-structure learning: probably not generalizable

We built a small ML module to avoid calling an LLM every time we parse HTML.

It works well on e-commerce pages, but we strongly suspect it will break elsewhere.

Concerns:

• Will it fail outright on news, forums, SaaS dashboards, and other domains?

• Has anyone built DOM-structure-learning systems and then abandoned them? Why?

• Is the long tail of the web fundamentally hostile to non-LLM approaches?

Failure stories are particularly valuable.

---

4. Some questions we have zero confidence about

• Does the world actually need yet another browser-automation stack?

• Do “Browser Agents” have long-term practical value at all?

• Do coroutine-style concurrency models provide real benefits under heavy CDP I/O?

• Should we drop the “agent” layer entirely and just build a runtime?

• What fatal issues exist around resource isolation, multi-tenancy, event storms, or long-tail page behaviors?

• Do all high-concurrency browser runtimes eventually die for the same reasons?

If the answer is “yes, stop now,” we’d prefer to know early.

---

Prototype status

We’ll open-source a very early version (missing docs, missing examples, and possibly flawed designs).

Known issues include:

• Deadlocks on certain complex sites that are hard to reproduce

• CDP event reordering under high concurrency

• Worse-than-expected memory behavior

• Inaccurate structure learning on non-e-commerce pages
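
To make the reordering item easier to discuss, here is a small sketch of one common way to route CDP traffic. It relies only on the protocol's message shape: a response carries the `id` of the command it answers, while an event carries a `method` and no `id`. Everything else is an assumption for illustration: `CdpMessage`, `CdpRouter`, and the `"browser"` fallback key are hypothetical names, JSON parsing is omitted, and this is not the prototype's implementation.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// An already-parsed CDP message (parsing is out of scope for this sketch).
data class CdpMessage(
    val id: Int? = null,           // set on command responses
    val method: String? = null,    // set on events
    val sessionId: String? = null, // identifies the target (tab) when present
    val payload: String = "{}",
)

class CdpRouter {
    private val nextId = AtomicInteger(1)
    private val pending = ConcurrentHashMap<Int, CompletableFuture<CdpMessage>>()
    private val eventsBySession = ConcurrentHashMap<String, ConcurrentLinkedQueue<CdpMessage>>()

    // Allocates a command id and a future that completes when its response arrives.
    fun register(): Pair<Int, CompletableFuture<CdpMessage>> {
        val id = nextId.getAndIncrement()
        val future = CompletableFuture<CdpMessage>()
        pending[id] = future
        return id to future
    }

    // Intended to be called from a single socket-reader thread, in arrival order.
    fun onMessage(msg: CdpMessage) {
        val id = msg.id
        if (id != null) {
            // Response: match by id, so the order of responses does not matter.
            pending.remove(id)?.complete(msg)
        } else {
            // Event: append to a per-session queue so each tab keeps arrival order.
            val key = msg.sessionId ?: "browser"
            eventsBySession.computeIfAbsent(key) { ConcurrentLinkedQueue<CdpMessage>() }.add(msg)
        }
    }

    fun events(sessionId: String): List<CdpMessage> =
        eventsBySession[sessionId]?.toList() ?: emptyList()
}

fun main() {
    val router = CdpRouter()
    val (id1, f1) = router.register()
    val (id2, f2) = router.register()
    router.onMessage(CdpMessage(method = "Page.loadEventFired", sessionId = "tab-A"))
    router.onMessage(CdpMessage(id = id2)) // responses arrive out of send order
    router.onMessage(CdpMessage(id = id1))
    println(f1.get().id == id1 && f2.get().id == id2) // prints true
    println(router.events("tab-A").map { it.method }) // prints [Page.loadEventFired]
}
```

In a design like this, per-tab ordering holds only as long as events are consumed from each queue by one consumer at a time; fanning them out to concurrent handlers is one place where reordering could be introduced on the client side.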

If you’ve built systems that involve heavy browser interaction, automation, or data extraction, or that treat the browser as a runtime, we’d love to hear about the bottlenecks you hit, so we don’t optimize in the wrong direction.

---

Finally

Any single sentence of criticism may save us months.

— Browser4 Team

1 comment

Open source it and you'll get all the feedback you desire.

We appreciate your interest and look forward to open-sourcing the project in a few days.