by ajcp 1555 days ago
"Most native RPA automation is GUI based. But let us take a moment so that this sinks in. GUI based automation involves instructing a bot to communicate with another program via the UI. This is analogous to forcing two native speakers to communicate via charades. GUI based automation is always a compromise because there is invariably a more efficient way to perform the same task under the hood. This brings me to my second reason."

The author isn't very clear here and seems unclear themselves on how these RPA technologies actually "see" an application.

Every Robotic Operating Model I've ever seen or worked on has always held firm that "surface automation" (think Citrix Receivers and applications built in Silverlight or Flash) should be outside the scope of any RPA solution.

What's left are browser-based or desktop-based "physical" applications that have an underlying model used to describe and render their GUI. That model is what (most) RPA clients actually use.

While it's correct that the clients interact with a program's UI, depending on the application they actually interpret (or "see") it via the application's COM (Component Object Model) or, in the case of a webpage, the DOM (Document Object Model). Since these models describe what is rendered on screen, usually in more detail than what is actually shown, the RPA client can interact with the application efficiently and reliably.

Take, for example, a webpage with a red button to click:

Automating purely via UI/surface automation:
- Capture a 150px by 150px region at screen coordinates x and y
- Is the captured image red, with the words "Click me" on it?
- Move to x coordinate on screen
- Move to y coordinate on screen
- Send key "Left mouse button click"
- Pray
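
To make the fragility concrete, here is a minimal Python sketch of the surface approach, under stated assumptions: the "screen" is a hypothetical 2D grid of RGB tuples, "red" means a made-up colour threshold, and the "Click me" text check is left out because it would need OCR. `make_screen` and `find_click_point` are illustrative names, not part of any RPA product.

```python
# Sketch of "surface" automation: the bot only sees pixels.
# Assumption (hypothetical): the screen is a 2D list of (r, g, b) tuples.
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Screen = List[List[Pixel]]

def region_is_red(screen: Screen, x: int, y: int, size: int) -> bool:
    """True if every pixel in the size x size region at (x, y) is 'red enough'."""
    for row in range(y, y + size):
        for col in range(x, x + size):
            if row >= len(screen) or col >= len(screen[row]):
                return False  # region falls off the screen
            r, g, b = screen[row][col]
            if not (r > 200 and g < 60 and b < 60):  # made-up threshold
                return False
    return True

def find_click_point(screen: Screen, x: int, y: int, size: int) -> Optional[Tuple[int, int]]:
    """Centre of the region to click, or None when the pixel check fails."""
    if region_is_red(screen, x, y, size):
        return (x + size // 2, y + size // 2)
    return None

def make_screen(width: int, height: int, bx: int, by: int, size: int) -> Screen:
    """Build a white screen with a solid red square at (bx, by)."""
    white, red = (255, 255, 255), (220, 20, 20)
    return [[red if bx <= c < bx + size and by <= r < by + size else white
             for c in range(width)] for r in range(height)]

screen = make_screen(10, 10, bx=2, by=2, size=4)
print(find_click_point(screen, 2, 2, 4))   # (4, 4)
moved = make_screen(10, 10, bx=3, by=3, size=4)  # layout shifted by one pixel
print(find_click_point(moved, 2, 2, 4))    # None
```

Shifting the button by a single pixel is enough to make the check fail, which is the "Pray" step in practice.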

Automating via DOM:
- Attach to process `Chrome titled "My webpage"`
- Is element `<button enabled=true id="superUnique_superDependable" class="clickMe" onClick="navigate(MyOtherWebpage)">Click me</button>` present?
- (to browser client) Send key `Left mouse button click` to element id = "superUnique_superDependable"
- Wait for process `Chrome titled "My other webpage"`
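
For contrast, a minimal Python sketch of the DOM-based check, using only the standard library's `html.parser`. It assumes the page markup is available as a string (a real RPA client or browser driver would query the live DOM instead), and it treats `enabled=true` as the illustrative attribute from the example above rather than standard HTML. `ButtonFinder` and `find_button` are hypothetical helpers.

```python
# Sketch of DOM-based element lookup with the standard library only.
# Assumption: the page HTML is available as a string.
from html.parser import HTMLParser
from typing import Dict, Optional

class ButtonFinder(HTMLParser):
    """Collect the attributes and label of the <button> whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found_attrs: Optional[Dict[str, Optional[str]]] = None
        self.found_label = ""
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so onClick arrives as onclick.
        attr_map = dict(attrs)
        if tag == "button" and attr_map.get("id") == self.target_id:
            self.found_attrs = attr_map
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "button":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.found_label += data

def find_button(html: str, element_id: str) -> Optional[Dict[str, str]]:
    """Return the button's onclick handler and label if present and enabled, else None."""
    finder = ButtonFinder(element_id)
    finder.feed(html)
    finder.close()
    attrs = finder.found_attrs
    if attrs is None or attrs.get("enabled") != "true":
        return None
    return {"onclick": attrs.get("onclick") or "", "label": finder.found_label.strip()}

page = ('<button enabled=true id="superUnique_superDependable" class="clickMe" '
        'onClick="navigate(MyOtherWebpage)">Click me</button>')
print(find_button(page, "superUnique_superDependable"))
```

The lookup keys on `id`, so the button can change colour or move around the page without breaking it; renaming the `id` would.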

OR

Automating via DOM:
- Attach to process `Chrome titled "My webpage"`
- Run function `navigate(MyOtherWebpage)`
- Wait for process `Chrome titled "My other webpage"`
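
This variant skips the button entirely. Below is a toy Python model of that flow; `FakeBrowser`, `run_function`, and `wait_for_title` are hypothetical stand-ins for a real browser session and whatever script-execution hook a given RPA client exposes, not any product's API.

```python
# Toy model of "call the page function directly". Everything here is hypothetical.
import re
import time
from typing import Callable, Dict

class FakeBrowser:
    def __init__(self, title: str) -> None:
        self.title = title
        # Page-level functions the bot may call by name, e.g. navigate(...)
        self.functions: Dict[str, Callable[[str], None]] = {"navigate": self._navigate}

    def _navigate(self, target: str) -> None:
        # Assumption: the page maps the argument to a new window title.
        self.title = {"MyOtherWebpage": "My other webpage"}.get(target, target)

    def run_function(self, call: str) -> None:
        """Parse a call like 'navigate(MyOtherWebpage)' and invoke it, bypassing the UI."""
        match = re.fullmatch(r"(\w+)\((\w*)\)", call.strip())
        if match is None:
            raise ValueError(f"unsupported call: {call!r}")
        name, arg = match.groups()
        self.functions[name](arg)  # KeyError if the page has no such function

def wait_for_title(browser: FakeBrowser, title: str, timeout: float = 5.0) -> bool:
    """Poll until the title matches, mirroring 'Wait for process ...'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if browser.title == title:
            return True
        time.sleep(0.05)
    return False

browser = FakeBrowser("My webpage")  # "attach" to the page
browser.run_function("navigate(MyOtherWebpage)")
print(wait_for_title(browser, "My other webpage"))
```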


Author here. I agree entirely with what you say here. But consider this: I once built an RPA process to create service records for the customer service department. I did this by automating a Java-based UI app, with reliable selectors, as you correctly point out.

However, that app's UI kept changing due to updates from the IT department, over which the customer service department had no control. This is why I call the automation 'fragile'.

But if you are automating with a bot (a software program), why doesn't it have access to the same pool of data as the UI application (some database somewhere)? Wouldn't that be a more robust way to retrieve the data than going through the UI?

If you are automating a business process via the UI (other than for simulating user interactions for testing) there invariably exists a more efficient way to achieve that end without involving the UI. I have found no exception to this rule.
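
A minimal Python sketch of that "under the hood" route for the service-record example, assuming direct database access is available. An in-memory SQLite database stands in for the real one, and the `service_records` table and its columns are hypothetical.

```python
# Sketch: read and create service records via SQL instead of the Java UI.
# The schema is hypothetical; SQLite in memory stands in for the real database.
import sqlite3
from typing import List, Tuple

def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_records (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO service_records (customer, status) VALUES (?, ?)",
        [("ACME", "open"), ("Globex", "closed"), ("Initech", "open")],
    )
    conn.commit()
    return conn

def open_records(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Query by column name: a UI redesign cannot break this, a schema change can."""
    cur = conn.execute(
        "SELECT id, customer FROM service_records WHERE status = ? ORDER BY id", ("open",)
    )
    return cur.fetchall()

def create_record(conn: sqlite3.Connection, customer: str) -> int:
    """Insert a new open service record directly and return its id."""
    cur = conn.execute(
        "INSERT INTO service_records (customer, status) VALUES (?, 'open')", (customer,)
    )
    conn.commit()
    return cur.lastrowid

conn = open_demo_db()
print(open_records(conn))  # [(1, 'ACME'), (3, 'Initech')]
```

The trade-off: writing straight into another application's tables bypasses whatever validation its UI enforces, so where the vendor offers an API, that is usually the safer under-the-hood path.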

True, there is (hopefully) always something like a DB on the other end, containing the data in a form that is much easier to operate on (SQL).

But is that really RPA at that point? Suppose your task is to scrape some data from a website, calculate something, and, if some condition is true, also submit a form.

If you have access to the actual database, you can just use that. If you also have access to the API of the system where you have to "check some box", you might as well write a normal application that runs some SQL, performs some HTTP POSTs, and maybe also provides a nice website with the data it crawled, formatted the way you would otherwise have calculated it in, let's say, Excel. (Some would say you wrote a microservice.) And if all of this is possible, you're evidently a good enough programmer that such a solution is the way to go.
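
Sketched in Python under loudly stated assumptions (a hypothetical `invoices` table, a made-up threshold, and a placeholder endpoint at `example.invalid`), the "normal application" version of the task reads data with SQL, calculates a total, and only builds the form POST when the condition holds. The request is constructed but not sent.

```python
# Sketch: SQL read -> calculation -> conditional form POST.
# Table, threshold, and endpoint are hypothetical.
import sqlite3
import urllib.parse
import urllib.request
from typing import Optional

def total_outstanding(conn: sqlite3.Connection) -> float:
    """Sum of unpaid invoice amounts (0 if there are none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE paid = 0"
    ).fetchone()
    return float(row[0])

def build_submission(total: float, threshold: float) -> Optional[urllib.request.Request]:
    """Return a POST request that 'checks the box' if total exceeds threshold, else None."""
    if total <= threshold:
        return None
    body = urllib.parse.urlencode({"flag_for_review": "on", "total": f"{total:.2f}"}).encode()
    return urllib.request.Request(
        "https://example.invalid/review",  # hypothetical endpoint
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (amount REAL, paid INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(120.0, 0), (80.5, 0), (999.0, 1)])
req = build_submission(total_outstanding(conn), threshold=150.0)
if req is not None:
    print(req.get_method(), req.full_url, req.data)
    # urllib.request.urlopen(req) would actually send it
```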

But if you're in the typical corporate environment where the API doesn't exist, you can't access the database, or you're not allowed to touch the checkbox except through the proprietary application, then you're back to what RPA was supposed to do.

Because otherwise we're down to one question: when is it RPA? When you're just using a fancy UI? When you're not accessing any API? When you don't script? Or is the SQL + API example above also RPA?

To me, RPA sounds like "mouse recorder for dummies" or "ugly hack that breaks depending on how good your visual detection system is". At least, the way you describe discovering that you can just use real scripts instead of UI indirection makes it read that way: a "not invented here" syndrome of the RPA tool, instead of a helper that handles checkbox detection and clicking for you.

Well, the truth is that developers often underestimate how hard it is for non-developers to write scripts. The reason they don't understand why something like RPA exists is known as 'the curse of knowledge'.

I think you're totally right. I could write a lot about the gap between knowing how the technology works and trying to get something done with the limited tools or time you have. But essentially, you're right.

Also, that AutoHotkey script for automatic driving in GT is really impressive. ( https://github.com/ByPrinciple/GT7-Scripts/blob/main/PanAmer... ) So that's something between scripting and RPA: essentially a DSL for exactly this task, which totally destroys my point. It also makes a case for learning to script with a tool that's capable enough to be useful beyond the best case.