by transistorfan 959 days ago
At my work there is a large contingent of people who essentially do manual data copying between legacy programs (govt), because the tech debt is so large that we can't figure out a way to plug these things together. Excited for tools like this to eventually act as a layer that can run over these sorts of problems, as bizarre a solution as it is from a compute perspective
14 comments

A long, long time ago I worked on a small project for a major multinational grocery chain.

I made them a tool that parses an Excel file with a specific structure and calls some endpoints in their internal system to submit the data.

I was curious, so I asked how they are doing it currently. They led me to a computer at the back of their office. The wallpaper had two rectangles, one of them said MS EXCEL and the other said INTERNET EXPLORER. Then the person opened these apps, carefully positioned both windows exactly into those rectangles and ran some auto-clicker - the kind cheaters would use in RuneScape – which moved the cursor and copied and pasted the values from the Excel into the various forms on the website.

Amazing.
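
For anyone curious what the replacement tool in the story above might look like, here is a minimal sketch, not the actual tool. It assumes the sheet has been exported to CSV (so only the standard library is needed), and the column names, the `ENDPOINT` URL, and the `rows_to_requests` helper are all hypothetical. It only builds the request bodies; actually sending them is left out.

```python
import csv
import io
import json

# Hypothetical layout and endpoint: the real sheet structure and the
# internal API are not described in the comment above.
REQUIRED_COLUMNS = ("sku", "store_id", "quantity")
ENDPOINT = "https://intranet.example/api/stock"  # placeholder URL


def rows_to_requests(sheet_text):
    """Validate each row and build (url, json_body) pairs without sending them."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    requests = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            quantity = int(row["quantity"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: quantity is not an integer") from None
        body = {
            "sku": (row["sku"] or "").strip(),
            "store_id": (row["store_id"] or "").strip(),
            "quantity": quantity,
        }
        requests.append((ENDPOINT, json.dumps(body, sort_keys=True)))
    return requests
```

Failing loudly on a malformed row is the main thing this buys over an auto-clicker, which will happily paste garbage into the form.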

I worked with a client who used a multi-million dollar system for moving goods automatically into packaging stations. The system was built and maintained by a major European company. All the data was transferred automatically between systems normally, but one day, for some reason, there was an internal communication error inside the machine which caused a lot of packages to be sent without being recorded as such.

Now normally we would just have contacted the company and asked them for a data extraction so we could cross-reference the data. But since it wasn't clear who was at fault, and we knew it would take weeks for that extraction, we looked for an internal solution first.

Now there was a subsystem in the machine that worked only in Internet Explorer, with an old authentication scheme, that we could use to see the information we needed, so I, being the only person in the team without formal analysis training but having made my way there from a clerk job, knew exactly what to do.

I fired up the old IE and Excel, wrote a VBA script in 5 minutes that did exactly what you described (click there, copy that, etc.), and 30 minutes later we had our extraction and resolved the issue completely before the packages were even shipped.

All hail Excel.

For all its flaws as a programming language, VBA made an excellent bodging language and I salute your expedient field hack.
I wonder if it used something like AutoIt[0]. I remember using it at one of my more boring co-op jobs about 20 years ago to automate moving data between a spreadsheet and some obscure database product.

[0] https://en.wikipedia.org/wiki/AutoIt

Funny that you and others on here don't seem to realize that literally everybody who uses the internet has the exact same data entry problem all the time. Blame it on "old software", but how about the entire internet?

copying (or in most cases even worse: re-typing) form data from one location on the screen into yet another webform.

Username, password, email address, physical address, credit card info etc etc.

Some extensions try to help with data entry, but none of them work properly and consistently enough to really help. Even consistently filling just username and pw is too much to ask.

It's my number 1 frustration when using the internet (worse than ads) and I find it mind-blowing that this hasn't been solved yet with or without LLMs.

I would pay a monthly fee for any software that solves this once and for all and it sounds like it's coming (and I'm already paying their monthly fee).

> It's my number 1 frustration when using the internet (worse than ads) and I find it mind-blowing that this hasn't been solved yet with or without LLMs.

Simple: it's because not solving this problem is how our godawful industry makes most of its money. Empowering the user means relinquishing control over their "journey"[0]. Ergonomics means fewer opportunities to upsell or show ads.

I don't have the link handy, but I'm reminded of one of the earliest Windows user interface guidelines documents, back from Windows 95/98 era, which, in a section about theming/visual style, already recognized that they have to allow for full flexibility, because vendors will insist on fucking the experience up for the sake of branding anyway, and resisting it is futile[1].

--

[0] - I'm trying really hard to hold back my contempt towards terms like this, and the whole salesy way of viewing human-computer interactions.

[1] - They put it in much more polite terms, but the feeling of helplessness was already there.

>> because vendors will insist on fucking the experience up for the sake of branding anyway

I see that you too have at some point installed printer driver software.

Ted Nelson’s “intertwingularity” isn’t far off from the data entry problem described. He argues for universal data access where duplication is obsolete. Imagine form data as a single, linkable object across the web, editable in one place, reflected everywhere—no re-typing, just seamless auto-fill. That’s the unrealized potential of hypertext.
Yeah, my dream would be using this to scrape pages, pop the content into my private db, and serve it up in my own format (which is going to be a white page with letters, with inline images and videos that are not ads), with my interactions fed back to the vision model to post in the original. So I never have to see a ‘design’ (heavy, JS-riddled, unreadable crap) again in my life. And so I can, with my own tooling, browse and reuse my history, including content, instead of relying on all the broken stuff bolted on the web.
Bash pipes? The free flow of information through composable tools.

The commercial web? Not the above.

This is just a baseline. I’m sure that an LLM can help the issue but the biggest problem is that these varied HTTP-with-datastores are islands passing messages in bottles back and forth while a bash pipeline is akin to fiber optics.

consistently filling out username and password is all I wanted from my password manager, but it turns out it handles credit card number and other bits of information for me as well.
I've used Bitwarden to fill out job applications faster.
Doesn't chrome out of the box handle all of that?
use a password manager. i havent copy pasted form data twice on a site in a long time
FTL. See NiagraFiles.
The industry buzzword is "Robotic Process Automation", which as a category of products has been focused on using various forms of ML/AI to glue these things together in a common/structured way (in addition to good old fashioned screen scraping).

Up to this point, these products have been quite brittle. The recent explosion of AI tech seems like quite a boon for this space.

I totally agree on all points, especially around what AI means for this.

I'm kind of in a happy accident situation because I was working on something for RPA, which then became a layer that was factored as its own product, but now might be able to come full circle as a result of AI.

Essentially this layer can function as a "delivery medium" for RPA agent creation, that you can use on any device without download. However, as it has many others uses I've been working on those, but I've been seeking a great reason to get back into RPA.

I have a cool idea to leverage human-guided AI creation of data maps and action tours for RPA, but similar to what you say, unless great care is taken you can end up with a brittle approach. Also, as the market has been quite saturated with many reasonable approaches, I just haven't felt compelled.

Yet now I think the possible merging of GPT level AIs with browser instrumentation to deliver an augmented way to browse the web makes that incredibly compelling.

So I'm incredibly thrilled that I have this happy accident of BrowserBox^0 (the factored out layer originally from RPA work above) which provides a pluggable/iframe-embeddable interface for remotely controlling a headless browser. So now I want to look at unifying BrowserBox with this kind of GPT driven exploration.

It's even cooler, because, as BB enables co-browsing by default (multiplayer browsing) and turns the browser into a "client-server" architecture, I can see that plugging in GPT-4V as a connecting client, with some kind of minimal API affordance for it to use (like the very cool Vimium keyboard-enabled browsing in the OP), would be such an interesting project to try!

We're open source so if you want to check us out or get involved in this quest, come say hi, maybe get involved if you're game!

0: https://github.com/BrowserBox/BrowserBox

I have watched your project for a while as a possible option for embedded browsers for XR applications like WebXR but the high licensing cost was a factor and solutions like Hyperbeam or Vueplex in Unity have been possible. Definitely agree that multimodal LLM integration is a huge opportunity and multiplayer browsing with AI in realtime is a super cool idea if you package it right.
Hi jimmySixDOF thank you for the kind words and the attention on our project! :)

Regarding pricing we have heard that feedback over time and gradually adjusted our licensing costs. It should now be much more affordable as it is targeted towards large deployments, with decreasing cost and increasing value at scale.

If you'd like to send an email with any thoughts on our current prices on https://dosyago.com to cris@dosyago.com I'd highly value it!

Your idea of WebXR and embedding within Unity is very interesting, and I think it could be a fit.

In the OP's specific instance when would you reach out for a traditional ETL tool vs an RPA solution?
RPA is for data sources and destinations that are meant for human consumption and entry. So you’d use RPA to take an image of a table and enter every row into a web form.
How much does the involvement of a bank of fax machines complicate things?
A little perhaps, but not much. One can replace a bank of physical fax machines with modems.

It would be an interesting job for sure. Why wasn't it done before? I can imagine only two reasons. One, there isn't that much data to move and it makes no sense to build software for what a few people spend 30 min per day on. Two, the data in the legacy system is images and people are not just moving it between systems, but they also do categorisation, verification etc. In which case an AI model may be useful, but almost always hard-coded rules will be faster.

Whenever I hear about such a thing (people doing legacy system data extraction manually) I wonder if in every case someone got the estimate for the "proper" solution and just decided a bunch of people typing is cheaper?

Integrating things like Chatgpt will still require people who know what they are doing to look at it, and I wouldn't be surprised if the first advice they give is "don't use chatgpt for it".

If the market forces work as they’re supposed to (not a given anymore), then corporations that adopt better tech will see better profits through lower expenses. And then the laggards will have to adapt or die.

Also remember that this is essentially v1 of the software- the Windows 95 of this adoption cycle

I remember years ago thinking it was weird in Ghost in the Shell when a robot had fingers on its fingers to type really fast. Maybe that really won’t happen since they can plug into USB at least, but they will probably use the screen and keyboard input sometimes at least.
Why would a keyboard be required? I think the intent to hit a letter would more easily be sent over a bluetooth HID "device". ;)
USB is an attack vector; if it's not exploiting your USB driver it's connecting your data pins to mains power. Keyboards are an air gap.
Isn't the keyboard connected to the computer via USB?

If I have access to the keyboard, I have access to a USB cable plugged into the computer, right?

Perhaps I misunderstand something....

I meant the reverse; the computer attacking the robot using it
Uhhhhh, thanks. That makes a lot of sense!
The issue with USB is you have to have power protection circuits. Analog interface at least in the show appeared much harder to hack.
I believe that LLMs will automate most of our data entry/copy/transformation work. 80% of the world's data is unstructured and scattered across formats like HTML, PDFs, or images that are hard to access and analyze. Multimodal models can now tap into that data without having to rely on complex OCR technologies or expensive tooling.

If you go to platforms like Upwork, there are thousands of VAs in low-cost labor countries that do nothing else than manual data entry work. IMO that's a complete waste of human capital and I've made it my personal mission to automate such tedious and un-creative data work with https://kadoa.com.

I was thinking what the payoff would be to pose as human for these terrible pay click jobs and then assign them to an LLM en masse. There's an arbitrage there ... it may be a good strategy.

I heard recently "click-work" works out to about $4/hr* If you could do that x50, passively, it's a fine income.

* - see https://journals.sagepub.com/doi/full/10.1177/14614448231183... or listen to https://kpfa.org/episode/against-the-grain-october-30-2023/ ... it's a fascinating study. Terrible pay (way below minimum wage) but surprisingly high worker satisfaction. The users seem to view it as entertainment essentially categorizing it as casual gaming.

The "asshole innovator" in me wonders if one could simply make it more entertaining and forego paying the user entirely.

Interesting. Instead of doing the click work manually, microworkers will just instruct and guide multiple GPTs.
Maybe. A lot of modern clickwork is actually model training, and there is a model-collapse phenomenon (https://arxiv.org/abs/2305.17493), which means that it should be banned for such work. I bet a number of clever people on the platforms are already trying to instrument AI to do the work regardless - it's pretty close to "free money" if you can pull it off and not get caught and at a spigot size where there's no real serious consequences if you do.
Yeah this seems easy to build but would rather work on making tools that improve accessibility 10x
Yup, that's my long term goal. I want an "anything API" that brings structure to anything on the web.
Kinda sci-fi, we're so close to a future where when/if original source code is lost, a mainframe runs in an emulator and the human operating it is also emulated.
It's bizarre computationally, but at this point maybe we have to compare it to the alternative: hiring a person. At least the AI only consumes electricity (which is hopefully green), while a person consumes food (grown with mined fertilizers), or meat (which we know is really bad for the environment).
> a large contingent of people who essentially do manual data copying

Yup.

I was briefly part of a decades-long effort to migrate off a mainframe backend. It was basically a very expensive shared flat file database (e.g. FileMaker Pro). Used by thousands of applications, neither inventoried nor managed. Surely a handful were critical for daily operations, but no one remembered which ones.

And the source data (quality) was filthy.

I suggested we pay some students to manually copy just the bits of data our spiffy "modern" apps needed.

No one was amused.

--

I also suggested we find a suitable COBOL runtime and just forklift the mainframe's "critical" infra into a virtual machine.

No one was amused.

Lastly, I suggested we throttle access to every unidentified mainframe client. Progressively making it slower over time. Surely we'd hear about anything critical breaking.

That suggestion flew like a lead zeppelin.
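
The throttling suggestion above is easy to sketch, for what it's worth. This is only an illustration under made-up assumptions: the `KNOWN_CLIENTS` inventory, the start date, and the ramp constants are all hypothetical, and a real mainframe front end would need this enforced wherever requests actually arrive.

```python
from datetime import date

# All values below are hypothetical placeholders for illustration.
KNOWN_CLIENTS = {"payroll", "billing"}  # clients that have been inventoried
RAMP_START = date(2024, 1, 1)           # day the throttling begins
STEP_SECONDS = 0.5                      # extra delay added per elapsed week
MAX_DELAY_SECONDS = 30.0                # cap so nothing hangs forever


def delay_for(client_id, today):
    """Return the artificial delay (seconds) to apply to one request."""
    if client_id in KNOWN_CLIENTS:
        return 0.0  # identified clients are never slowed down
    weeks_elapsed = max(0, (today - RAMP_START).days // 7)
    return min(MAX_DELAY_SECONDS, weeks_elapsed * STEP_SECONDS)
```

The cap matters: the point is to make an unidentified client's owner complain, not to take the client down outright.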

Working on this layer at https://autotab.com. This sounds like an amazing problem for browser automation to solve, would love to talk with you if you’re interested!
This type of use case is exactly why we are building https://github.com/OpenAdaptAI/OpenAdapt
"Chinese Room Automation"
This has been fruitful ground for RPA offerings like UiPath and Automation Anywhere. Multimodal LLMs open up a chance to disrupt them
Wow. Leaking confidential tax payer data.
I should have been clearer, it's between two apps that we host internally - applications on our own intranet cannot talk to each other. If you want to get any data out of either of these apps to the world, you need to do a manual export and email/usb which would obviously flag
Correct, but chat gpt reads screen data to be able to "click" around. So you would need to expose at least data that is displayed on screen to this external product.