| "Most native RPA automation is GUI based. But let us take a moment so that this sinks in. GUI based automation involves instructing a bot to communicate with another program via the UI. This is analogous to forcing two native speakers to communicate via charades. GUI based automation is always a compromise because there is invariably a more efficient way to perform the same task under the hood. This brings me to my second reason."

The author isn't very clear here, and seems unclear themselves on how these RPA technologies actually "see" an application. Every Robotic Operating Model I've ever seen or worked on has held firm that "surface automation" (think Citrix Receivers and applications built in Silverlight or Flash) should be outside the scope of any RPA solution. What's left are browser- or desktop-based "physical" applications that have an underlying model used to describe and render a GUI. That model is what (most) RPA clients actually use.

While it is correct that the clients interact with the UI of a program, depending on the application they actually interpret (or "see") it via the application's COM (Component Object Model) or, in the case of a webpage, the DOM (Document Object Model). Since these describe what is rendered on screen, usually in more detail than what is actually rendered, the RPA client is able to interact with the application efficiently and confidently.

Take a webpage with a red button to click, for example:

Automating purely via UI/surface automation:
- Capture 150px by 150px at screen coordinates x and y
- Is captured image red with the words "Click me" on it?
- Go to x coordinates on screen
- Go to y coordinates on screen
- Send key `Left mouse button click`
- Pray

Automating via DOM:
- Attach to process `Chrome titled "My webpage"`
- Is element `<button enabled=true id="superUnique_superDependable" class="clickMe" onClick="navigate(MyOtherWebpage)">Click me</button>` present?
- (to browser client) Send key `Left mouse button click` to element id = "superUnique_superDependable"
- Wait for process `Chrome titled "My other webpage"`

OR, automating via DOM by invoking the button's handler directly:
- Attach to process `Chrome titled "My webpage"`
- Run function `navigate(MyOtherWebpage)`
- Wait for process `Chrome titled "My other webpage"` |
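To make the contrast in the button example concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It is not how any particular RPA client works; `ElementFinder`, `find_by_id`, and the two sample pages are hypothetical names invented for illustration. It only shows why addressing an element by its id survives a layout change that would break coordinate-based clicking.

```python
from html.parser import HTMLParser

# Two hypothetical versions of the same page: the button moves and is
# restyled in the redesign, but keeps its id.
PAGE = (
    '<html><body>'
    '<button id="superUnique_superDependable" class="clickMe" '
    'onclick="navigate(MyOtherWebpage)">Click me</button>'
    '</body></html>'
)
REDESIGNED = (
    '<html><body><div class="sidebar"><p>New banner</p><div class="actions">'
    '<button id="superUnique_superDependable" class="btn btn-primary" '
    'onclick="navigate(MyOtherWebpage)">Click me</button>'
    '</div></div></body></html>'
)


class ElementFinder(HTMLParser):
    """Record the tag and attributes of the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.found is None and attr_map.get("id") == self.target_id:
            self.found = {"tag": tag, **attr_map}


def find_by_id(html, element_id):
    """Return the matching element's attributes, or None if it is absent."""
    finder = ElementFinder(element_id)
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    for name, page in (("original", PAGE), ("redesigned", REDESIGNED)):
        element = find_by_id(page, "superUnique_superDependable")
        print(name, "->", element["tag"], element["onclick"])
```

The lookup returns the same handler for both pages, whereas a script that clicks at fixed screen coordinates would have to be re-recorded after the redesign.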
However, that app's UI would keep changing due to updates from the IT department, over which the customer service department had no control. This is why I call the automation 'fragile'.
But if you are automating using a bot (a software program), why does it not have access to the same pool of data as the UI application (some database somewhere)? Would that not be a more robust means to retrieve it rather than via the UI?
If you are automating a business process via the UI (other than for simulating user interactions for testing), there invariably exists a more efficient way to achieve that end without involving the UI. I have found no exception to this rule.
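As a rough illustration of reading from the same data store the UI draws on, the sketch below uses Python's built-in `sqlite3` with an in-memory database. Everything about the data is an assumption made for the example: the `orders` table, its columns, and the helper names `open_demo_store` and `order_status` are hypothetical, and a real system would need proper credentials and access controls.

```python
import sqlite3


def open_demo_store():
    """Create a throwaway in-memory database standing in for the app's backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1001, "Acme", "shipped"), (1002, "Globex", "pending")],
    )
    conn.commit()
    return conn


def order_status(conn, order_id):
    """Fetch an order's status directly, with no screen or widget involved."""
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_demo_store()
    print(order_status(store, 1002))
```

A query like this does not care where a field sits on screen, so a UI redesign leaves it untouched; it does, however, depend on the schema staying stable.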