Show HN: Automated, hosted UI tests with AI test discovery (octomind.dev)



13 points by marcme 1000 days ago

I am Marc, the co-founder of octomind.dev. I am a former chess player turned machine learner turned entrepreneur. I always wanted to develop products based on the latest AI research. I did it at my previous startup with computer vision and now at octomind.dev with LLMs/agents.

Octomind is an AI-powered end-to-end testing platform for web apps. You enter your URL, and we will auto-generate, run, and auto-maintain UI tests.

My co-founder Daniel and I worked together at our previous AI startup. We had never-ending problems with end-to-end testing. We tried many frameworks and tools and even looked into outsourced QA services. Everything we found was a pain for our dev teams to write and maintain, held back releases, or wasn’t reasonably priced.

We have a decent level of experience in baking AI into software products. It made sense to apply it to e2e testing.

Whenever a test target is set, our AI agents discover user flows and interactions leading toward a desired user goal. This serves as the foundation for creating deterministic test cases. We use it for automated test case discovery and are working on expanding it to maintenance so that broken tests can self-heal.

We’ve built Octomind on top of Playwright, which is imho the most advanced framework for end-to-end testing today. We don’t need access to your code, and the generated Playwright code is yours. We run and host the tests. We report the test results in our app or directly in PR comments if you integrate Octomind with your CI pipeline. To help with debugging, we’ve integrated Trace Viewer and created Debugtopus, an open-source tool for local debugging (see https://github.com/OctoMind-dev).
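To make "discovered flow in, deterministic Playwright code out" more concrete, here is a minimal sketch of how a recorded user flow could be rendered into a plain Playwright test file. It is not Octomind's actual generator: the `Step` and `DiscoveredFlow` types, the function names, and the example URL are all hypothetical. Only the Playwright calls in the emitted source (`page.goto`, `getByRole`, `getByLabel`, `getByText`, `toBeVisible`) are real API.

```typescript
// Hypothetical data model for a discovered flow. None of these names come from Octomind.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "click"; role: "button" | "link"; name: string }
  | { kind: "fill"; label: string; value: string }
  | { kind: "expectVisible"; text: string };

interface DiscoveredFlow {
  goal: string; // the user goal the flow leads toward
  steps: Step[]; // ordered interactions found during discovery
}

// JSON.stringify produces a safely quoted string literal for the emitted source.
const q = (s: string): string => JSON.stringify(s);

// Map one recorded step to one line of Playwright code.
function renderStep(step: Step): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${q(step.url)});`;
    case "click":
      return `await page.getByRole(${q(step.role)}, { name: ${q(step.name)} }).click();`;
    case "fill":
      return `await page.getByLabel(${q(step.label)}).fill(${q(step.value)});`;
    case "expectVisible":
      return `await expect(page.getByText(${q(step.text)})).toBeVisible();`;
  }
}

// Render the whole flow as the source text of a standalone Playwright test.
// The same flow always yields the same text, so the resulting test is deterministic.
function renderPlaywrightTest(flow: DiscoveredFlow): string {
  const body = flow.steps.map((s) => "  " + renderStep(s)).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${q(flow.goal)}, async ({ page }) => {`,
    body,
    `});`,
  ].join("\n");
}

// Example flow with made-up values.
const exampleFlow: DiscoveredFlow = {
  goal: "user can sign in",
  steps: [
    { kind: "goto", url: "https://example.com/login" },
    { kind: "fill", label: "Email", value: "demo@example.com" },
    { kind: "click", role: "button", name: "Sign in" },
    { kind: "expectVisible", text: "Welcome" },
  ],
};

console.log(renderPlaywrightTest(exampleFlow));
```

The point of the split is that any AI involvement ends once the flow is recorded. What runs in CI is ordinary Playwright code that you can read, edit, and keep.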

We’re still in early access, so any feedback from the community would be appreciated. Try us out for free. We want to keep the freemium option even as we grow.

2 comments

welder 1000 days ago

This seems like a good enterprise use-case for AI. We had automated visual diffs at my last company, but they were very noisy. They did help confirm my change didn't visually impact some other code I wasn't aware of, but we still had to write intelligent UI unit tests.

Having AI write those UI tests would save a ton of time, and if the AI can determine the intent of a code change to correctly fail/pass tests, that would eliminate all my debug work, leaving just the feature code to me. Hopefully AI doesn't take that away from me too ;)


ma_za 1000 days ago

I think we are still a ways away from AI taking over debugging. From my experience playing with GPT, it quickly starts failing once the questions get as complex as debugging an intricate problem.

However, if this can take some effort away from me having to do manual test plans, I'm onboard.


mirman 1000 days ago

what sorta moat do you have now that openai’s got eyes?


marcme 1000 days ago

We expected OpenAI's announcement of multi-modal models and built our platform to complement it. In the modern LLM/agent space, even OpenAI and Google have trouble defining their moat. So, as a bootstrapped startup, we are in no position to postulate a moat.

Instead, we aim to achieve a growing advantage by having considerable user adoption that exposes the agents and platform to the ever-growing long tail of unexpected edge cases. And boy, do we have weird edge cases in web design worldwide. But this is good and helps build up robustness. We have one agent per critical workflow to accommodate that. Despite this, our sign-in agent struggles with icon depictions. We purposefully didn't debug it or invest in computer vision, as we knew multi-modal was around the corner.

Instead, we invested in traditional engineering and ensured the table stakes and workflows were robust and scaled well. We use it ourselves multiple times per day on every pull request to ensure it feels natural.

But the most important thing is to find the users who help you build the leading tooling. Feedback is our oxygen, and we make sure there is a feedback button two clicks away.

Imho, this accumulation of smaller gains is as close to an advantage as possible, but what's your opinion on it?
