Show HN: Rotunda - A browser built for agents with simulated typing (github.com)
13 points by icyfox 40 days ago
Hi HN! Pierce here.

Rotunda is a Firefox fork primarily intended for agent use, which I’ve been hacking on nights and weekends.

There was a [lengthy](https://news.ycombinator.com/item?id=48024859) discussion last week on how expensive computer use models are. The cost is going to drop eventually, but I think on some level computer use is still usually the wrong primitive. The web gives us access to beautiful structured formats, plaintext, etc. Why throw that away if we don't have to?

I realized at some point that for 99% of automations I just want agents to be able to control my Chrome instance. But that’s easier said than done: CDP (the Chrome automation protocol) leaks a ton of state about being programmatically controlled, either by toggling window attributes or by running `page.evaluate()` commands right in the page context. Plus, if you watch an automation run, it's pretty obvious what's happening: the mouse jumps around, fields are filled instantly, etc.

Rotunda tries to fix this. Its standout features:

- Realistic simulation of mouse movements and keyboard commands, powered by an RNN trained on my own timing patterns from the last week. (Still feel weird about opting in to a keylogger, but whatever.)

- Doesn’t lie about its host specs, only fibs about some client-side details. Stealth browsers are too easy to flag statistically when they add noise to canvas pixels or audio pipelines.

- It runs on your local device with a CLI or Playwright API accessible to Claude, Codex, or whatever your harness du jour looks like.

- Patches modern Firefox (150) with an agentic harness to keep this updated over time
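
To make the "fields are filled instantly" problem concrete, here is a minimal sketch of per-key delays drawn from a log-normal distribution. This is not how Rotunda does it (the post says it uses an RNN trained on recorded timings). The function name `keystroke_delays`, the 110 ms median, and the pause multipliers are all assumptions picked for illustration.

```python
import math
import random


def keystroke_delays(text, seed=None, median_ms=110.0, sigma=0.35):
    """Return one inter-key delay in milliseconds per character of `text`.

    Stand-in for a learned timing model: delays are log-normal, so they
    are always positive and occasionally much longer than the median.
    """
    rng = random.Random(seed)
    delays = []
    # Pair each character with the one before it (a space before the first).
    for prev, ch in zip(" " + text, text):
        # lognormvariate(mu, sigma) has median exp(mu).
        delay = rng.lognormvariate(math.log(median_ms), sigma)
        if prev == " ":
            delay *= 1.3  # assumed: small hesitation at the start of a word
        if ch.isupper():
            delay *= 1.5  # assumed: extra time to reach for shift
        delays.append(delay)
    return delays


delays = keystroke_delays("Hello world", seed=1)
```

A driver would sleep for each delay before sending the corresponding key. A fixed distribution like this is presumably easier to fingerprint than a model trained on real timings, which is the motivation the post gives for the RNN.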

MPL-2.0 on GitHub: https://github.com/monkeysee-ai/rotunda

Longer writeup on the design choices: https://pierce.dev/notes/a-browser-for-agents

Also check out the demo on the site! https://www.rotunda.sh/

Pretty excited by how this turned out but we’re still super early. Give it a try and please flag any issues!

2 comments

If you're training this on your own timing patterns, do you worry that eventually captchas will pick up on this and you yourself will no longer be able to prove you are a human?
Not particularly. I'm not yet convinced people's mouse movements are unique enough to be useful as a fingerprint, whereas it's very easy to classify whether something looks Bézier or looks human.
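
As a toy illustration of why a Bézier path is easy to spot: a cubic Bézier sampled at evenly spaced `t` is a cubic polynomial in `t`, so its fourth finite differences vanish, while a jittery path's do not. The sketch below assumes unrounded coordinates and uniform sampling; real pointer events are rounded to pixels and often eased, so an actual classifier would need looser features. The names `cubic_bezier` and `looks_like_cubic` are made up for this example.

```python
import random


def cubic_bezier(p0, p1, p2, p3, n):
    """Sample n points of a cubic Bezier at uniformly spaced t in [0, 1]."""
    pts = []
    for i in range(n):
        t = i / (n - 1)
        u = 1.0 - t
        b0, b1, b2, b3 = u**3, 3 * u * u * t, 3 * u * t * t, t**3
        pts.append((
            b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
        ))
    return pts


def fourth_diff(values):
    """Apply the forward difference operator four times."""
    d = list(values)
    for _ in range(4):
        d = [b - a for a, b in zip(d, d[1:])]
    return d


def looks_like_cubic(points, tol=1e-6):
    """True if both coordinates are (numerically) cubic in the sample index."""
    if len(points) < 5:
        raise ValueError("need at least 5 points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Normalise by the path's extent so the tolerance is scale-free.
    scale = max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    resid = max(abs(v) for v in fourth_diff(xs) + fourth_diff(ys))
    return resid / scale < tol


path = cubic_bezier((0, 0), (100, 300), (400, -50), (500, 200), 50)
rng = random.Random(0)
# Assumed stand-in for human tremor: half a pixel of Gaussian jitter.
noisy = [(x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)) for x, y in path]
```

Here `looks_like_cubic(path)` holds for the clean curve and fails for `noisy`, even though the two paths look almost identical when plotted.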

Eventually I'm hoping to collect enough data here to train a biased decoding model, so you could input some randomized personality vector (which implicitly encodes slow movement, jerky motion, trackpad, mouse, etc.) and have that influence the RNN generation. So in theory there would be infinite combinations from the larger subspace we're sampling from.

I think the way to do it is to think of this as your own browser that can also be used by agents (with granular permissions). I use the browser for 5h today, and my patterns then inform another 12h of agent use.
Is there a way for the browser viewport width to respond to the window resizing (in headed mode)?
Could look into addressing this. What are you trying to achieve?