Hacker News
by keepamovin 958 days ago
I totally agree on all points, especially around what AI means for this.

I'm kind of in a happy accident situation because I was working on something for RPA, which then became a layer that was factored as its own product, but now might be able to come full circle as a result of AI.

Essentially, this layer can function as a "delivery medium" for RPA agent creation that you can use on any device without a download. However, as it has many other uses, I've been working on those, but I've been seeking a great reason to get back into RPA.

I have a cool idea to leverage human-guided AI creation of data maps and action tours for RPA, but similar to what you say, unless great care is taken you can end up with a brittle approach. Also, as the market has been quite saturated with many reasonable approaches, I just haven't felt compelled.

Yet now I think the possible merging of GPT-level AIs with browser instrumentation to deliver an augmented way to browse the web makes that incredibly compelling.

So I'm incredibly thrilled that I have this happy accident of BrowserBox^0 (the factored-out layer originally from the RPA work above), which provides a pluggable/iframe-embeddable interface for remotely controlling a headless browser. So now I want to look at unifying BrowserBox with this kind of GPT-driven exploration.

It's even cooler because BB enables co-browsing by default (multiplayer browsing) and turns the browser into a "client-server" architecture. I can see that plugging in GPT-4V as a connecting client, with some kind of minimal API affordance for it to use (like the very cool Vimium keyboard-enabled browsing in the OP), would be such an interesting project to try!

We're open source, so if you want to check us out or get involved in this quest, come say hi!

0: https://github.com/BrowserBox/BrowserBox


I have watched your project for a while as a possible option for embedded browsers in XR applications like WebXR, but the high licensing cost was a factor, and solutions like Hyperbeam or Vuplex in Unity have been possible. Definitely agree that multimodal LLM integration is a huge opportunity, and multiplayer browsing with AI in real time is a super cool idea if you package it right.

Hi jimmySixDOF, thank you for the kind words and the attention on our project! :)

Regarding pricing, we have heard that feedback over time and gradually adjusted our licensing costs. It should now be much more affordable, as it is targeted towards large deployments, with decreasing cost and increasing value at scale.

If you'd like to email cris@dosyago.com with any thoughts on our current prices (listed at https://dosyago.com), I'd highly value it!

Your idea of WebXR and embedding within Unity is very interesting, and I think it could be a fit.