by arciini 1337 days ago
I'm sad that this is happening to an app that's useful to its users, but the reality is that scraping is legal, always possible, but difficult.

This particular case is a bit harder since it's not purely using public data, but may still qualify since it's likely scraping with legally-obtained credentials.

I know of businesses (scraping for ride-sharing, scraping for business intelligence for retailers, scraping from LinkedIn - see HiQ Labs v. LinkedIn) that have continuously succeeded via scraping in ways that large businesses oppose.

The key is: you must make enough profit to justify dedicating engineering and legal techniques to defend your scraping.

- Scraping public data is legal, at least under the CFAA, per the Supreme Court's narrow reading of "exceeds authorized access" in Van Buren v. United States [1] and the Ninth Circuit's ruling in HiQ Labs v. LinkedIn [2]. Defending yourself or suing the data owner in court are both expensive, though.

- Defeating anti-scraping via technical means is pretty much always possible, but can be costly depending on the scraped site's technical expertise and how much it values keeping its data private. The benefit to you must exceed the cost to you, and ideally should also exceed the cost to the data owner.

- Mobilizing PR and internal resistance may also be effective, but it's usually hard to get an outcry from a large enough group to change an organization's policies. In this case, the union can push for it, but AA may try to withhold improvements until the next round of union negotiations.

1. https://en.wikipedia.org/wiki/Van_Buren_v._United_States

2. https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn


> the reality is that scraping is legal, always possible, but difficult.

> This particular case is a bit harder since it's not purely using public data, but may still qualify since it's likely scraping with legally-obtained credentials.

No, it's easy: they're employees, they can be told they're not allowed to do that. Doesn't matter if the app's legally allowed to exist or not.

They can always tell their employees they're not allowed to do something, and punish the ones that do. I think it's an important distinction though that whatever they tell their employees, the app isn't doing anything wrong in a legal sense. So still legal, possible, and difficult.
And I don't think there's anything illegal about the employer making it more difficult; that may well be the cheapest/easiest way of stopping employees using it.
There's nothing illegal about it, but that doesn't mean it's moral to make life harder for employees who are trying to make sure the hours they're working are legal.

EDIT: I welcome anyone who wants to justify this ethically.

Agreed. I haven't commented on morals or ethics, or even offered any judgement on what ought to be legal or not.
I'm not sure I interpret the article or the parent comment as saying American Airlines isn't allowed to do this. They're just making life harder for their employees and don't seem to be addressing the problem the employees are working around. It's just a little whistleblowing that they're a shitty employer.

I think what the parent comment is trying to say is that their description of their approach here as "sophisticated bot detection" is a little bit like someone calling me a hacker because I have my terminal open during the flight. There is an intentional use of words here trying to make the app developer sound like the bad guy.

This is the company that is also currently suing ThePointsGuy over an app that helps you manage your AAdvantage (loyalty) points.

Suffice it to say, American Airlines IT are apparently a bunch of dicks.

Helps you maximize your AAdvantage points, which isn't entirely in AA's interest.
While I see the logic in that a lot more, logical != ethical.
You know, employees are not actually property, and there are actually limits on what an employer can tell an employee.

Everything an employer might possibly try to say about using any other software or tools to collect, handle, and redisplay "their" data applies exactly the same to a blind employee's screen reader.

Hell, it applies to glasses.

Thank deity for blind people and other disabilities making it actually illegal to be as huge dicks as some companies would be if they could be.

I do not understand the desire to even try to defend AA's position here, but am glad it's a failed attempt at least.

I'm not sure why it reads as 'defending [the employer]'s position', but that's not my position, I don't care at all. (I'm not American, I may very well never have anything to do with the airline even as a customer.)

If you are its employee, jolly good luck to you with your 'well, if I were blind, what I've been provided with while not blind would not be adequate, and I might need a tool other than this one that works similarly' argument.

"I'm not sure why it reads as 'defending [the employer]'s position'"

Saying that the employer has the right to dictate those terms is literally and explicitly doing nothing else but defending their position that they have the right to dictate those terms.

You’re making the classic mistake of confusing explanation for defense. It happens all the time with the Ukraine crisis as well.
I grant that's possible.
They can 'dictate those terms' - doesn't mean I think it's good! (Doesn't mean I don't either, I haven't commented on it!)
But they also deserve to have access to their work schedules, and I bet a good lawyer could argue that "access" should be interpreted broadly here.
Presumably the non-public data is being scraped using the employees' credentials (i.e. username and password).

It is perfectly reasonable for an employer to have a policy which states, "do not give your work username and password to a third party." I can't imagine a court ordering otherwise.

Providing an API for this data is a non-trivial amount of work, involving significant technical and compliance challenges. Employee schedules would be useful as a signal for trading in AA stock. How do you enforce that the third party is properly protecting that information, e.g. during trading blackout periods around earnings?

The union might be able to negotiate for AA to hire lawyers and IT staff to work on such an API, but I really can't see the employees being automatically entitled to it.

If the scraping is happening "on-device", though, then they're not providing their details to a third party. They're simply accessing their schedules. Otherwise, pulling up their schedules in any web browser would be considered giving their credentials to a third party since that's basically what's happening here. It would be like logging in to the aa.com employee site and then installing a Chrome extension that reads the page that was downloaded. Nothing is given to the Chrome extension in terms of credentials, only page content.
It might be possible to build this app in a way that none of the information ever leaves the device. I would be very surprised if that was the case here.

Most large IT departments have a list of approved browsers and browser extensions. The scenario you described would fall under the same policy. If Chrome uploaded the content of intranet web pages to Google, I expect it would be banned as well.

>It might be possible to build this app

Not only is it possible to build it this way but I think it's far more likely that it already is built this way. Since the app is pulling up schedules for individual users, there's no benefit to scraping the info on a server or caching any of it as it would be unique for each user. There's no reason for that info to leave the device. The content is pulled, formatted, and then displayed in a style that matches the rest of the app. This can easily be done on-device and would be less efficient to do off-device.
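To make the "pulled, formatted, and displayed on-device" flow concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It assumes, hypothetically, that the schedule page is a table whose data rows hold date, trip, and report-time cells; the real page structure, `SAMPLE_PAGE`, `FIELDS`, and `parse_schedule` are all illustrative, not anything AA or the app actually uses. Nothing here makes a network call: the HTML goes in, structured rows come out, in the same process.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real crew-scheduling page's layout is unknown.
SAMPLE_PAGE = """
<html><body>
<table id="schedule">
  <tr><th>Date</th><th>Trip</th><th>Report</th></tr>
  <tr><td>2022-12-01</td><td>1234</td><td>05:45</td></tr>
  <tr><td>2022-12-02</td><td>5678</td><td>13:10</td></tr>
</table>
</body></html>
"""

FIELDS = ("date", "trip", "report")  # hypothetical column order


class ScheduleParser(HTMLParser):
    """Collects the text of <td> cells from every table row on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows use <th>, so they stay empty and are dropped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_schedule(html: str) -> list[dict]:
    """Turn a saved/downloaded page into schedule dicts, entirely in-process."""
    parser = ScheduleParser()
    parser.feed(html)
    parser.close()
    # Skip rows that don't match the assumed three-column layout.
    return [dict(zip(FIELDS, row)) for row in parser.rows if len(row) == len(FIELDS)]


if __name__ == "__main__":
    for shift in parse_schedule(SAMPLE_PAGE):
        print(shift)
```

The same function would work whether the HTML came from an in-app WebKit view or from a page the user saved by hand, which is the point being made: the credentials stay with the browser, and the parser only ever sees page content.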

>Most large IT departments have a list of approved browsers and browser extensions.

This is completely irrelevant, considering this is being done on mobile devices. On iOS, at least, it's all WebKit and done within the app itself. I was just using Chrome as an example of how this process works without sending the credentials to a third party. Unless the company wants to ban people from checking their own schedules, there's no way it can stop someone from logging in with a web browser and having the content scraped. As an example, let's say they only allowed Microsoft Edge as the "approved" browser and didn't allow any Edge extensions to be installed. The user can still pull up the page in Edge, save the content once it's loaded, and feed the folder/HTML file to the app to scrape the content. There's literally no way for them to prevent this other than severely obfuscating the content (e.g., randomly inserting invisible characters into strings to prevent string searches, or adding bogus HTML elements to prevent searches for element patterns) or cutting off access completely.
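As a rough illustration of why the invisible-character trick only raises the cost rather than stopping scraping, the sketch below assumes, hypothetically, that a site sprinkles zero-width characters into its text. A naive substring search then fails, but dropping Unicode "format" characters (general category `Cf`, which covers zero-width spaces and joiners and the soft hyphen) restores it. `OBFUSCATED` and `strip_invisible` are illustrative names, not taken from any real site or app.

```python
import unicodedata

# Hypothetical obfuscated cell text: zero-width spaces (U+200B) and a
# zero-width joiner (U+200D) inserted between letters. Invisible when rendered.
OBFUSCATED = "Tr\u200bip 12\u200d34 rep\u200bort 05:45"


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category 'Cf'), e.g. U+200B, U+200D, U+00AD."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    print("Trip" in OBFUSCATED)                   # False: naive search misses it
    print("Trip" in strip_invisible(OBFUSCATED))  # True after cleanup
```

This only counters one of the two tactics mentioned; bogus HTML elements would need a different approach, such as matching on visible text rather than on element patterns. Stripping `Cf` characters is also blunt: it would break emoji sequences that rely on the zero-width joiner, which is usually irrelevant for schedule data.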

>It might be possible to build this app in a way that none of the information ever leaves the device. I would be very surprised if that was the case here.

I'm doing something similar to automate an app for personal use. I might at some point turn it into a paid app, and if I do, I would actually have to redesign the application to send personal information off the device, which I suppose I would not do.

AA Flight attendants are in a union working under a collective bargaining agreement. The employer can’t just change this unilaterally.
The mere threat of a one-day strike would make AA buy out the developer for millions of dollars.
Airline workers are covered under the Railway Labor Act in the US. They can’t just strike because they’re unhappy. There’s a long drawn out process before a strike can happen. See recent threats of railway workers striking in the US.
>No, it's easy: they're employees, they can be told they're not allowed to do that. Doesn't matter if the app's legally allowed to exist or not.

They're unionized employees. Someone running a company looking to make their life harder for no reason needs to think five times before they start making arbitrary and baseless demands for changes in policy. It could end up costing you tens of millions of dollars because you forgot that employees are still people and your demands will be met with demands in return.

> The key is: you must make enough profit to justify dedicating engineering and legal techniques to defend your scraping.

It also works if you have philanthropic, nonprofit, or unconventional backing to pay for these defensive resources. If this app is providing substantial benefits to AA crew around scheduling and QoL, their union might consider providing some backstop/support.

https://www.apfa.org/

If memory serves, FA unions often use seniority-oriented contracts. The more senior members will tend to be more active and better-represented among union leadership. Reserve members are often more junior.

Putting on my cynical prick hat for a moment, I would guess the union as an institution is far more willing to throw the app-oriented concerns of the junior members under the bus than the health care and pension concerns of the senior ones.

> The key is: you must make enough profit to justify dedicating engineering and legal techniques to defend your scraping.

That's why web scraping is a huge SaaS market these days (I'm part of one too @ scrapfly.io).

Loads of our customers are tiny businesses and entrepreneurs that could in no way afford the engineering effort required to scrape any of these websites, and honestly, empowering small folk against these giant, untouchable corporations is the best part of my job :)

It's noteworthy that American Airlines has taken the hardest line on blocking AwardWallet, too [1].

1. https://yourmileagemayvary.net/2021/12/21/is-this-the-reason...

Playing devil’s advocate here: this is not the same as scraping LinkedIn data. LinkedIn data is public. This app requires a login info from a flight attendants to scrape their schedules. When you try to log in, you can choose to log in as a member of the public or as an AA flight attendant. It sucks, but I also understand why a company may be unhappy that a third party handles credentials and accesses internal data. What they can do:

- build a 3rd-party integration API, which opens up a whole can of worms. Not many tech-first companies get it right; for an airline it’s a very challenging step.

- build their own, but they've already failed there if their employees are turning to a 3rd party

- ignore it and let it run. This basically means letting unauthorized access go on and hoping that the guy named Jeff won’t screw up.

- deny and prevent access. This is probably the easiest technically and the safest from a legal standpoint.

> This app requires a login info from a flight attendants to scrape their schedules.

So? If the flight attendants have provided their credentials to the scraping software, they have essentially authorized the software to scrape the data on their accounts. It's just a custom user agent running locally and the airline company has no business blocking anything.

In other words: "you can write this down by hand, copy-paste it, or use browser plugins, but you cannot automate this". I wonder if this would stand up in any other context; I can't think of a similar scenario off the top of my head where automation would be forbidden. I could totally hire a part-time student from a developing country to do data entry for me and that would be alright? Strange world - somehow these corporations have people brainwashed.
The issue is not that some app has access to a timetable of work shifts. It is that it has access to credentials and can potentially do something else. In your analogy, the part-time student from a developing country doing data entry is like scraping public LinkedIn data. What happens here is an employee giving the student their office badge so the student can go get a folder from the employee’s desk, open it, and make a presentation based on its content. To make it worse, many employees give their badges to the exact same student.
A third party having unrestricted access to the internal system: no sane business owner would be OK with that. This is literally the reason protocols like OAuth 2.0 exist.
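To illustrate the difference between handing over a password and delegated access, here is a toy, in-memory sketch of the idea behind OAuth 2.0-style access tokens: the employee grants a third party a token limited to specific scopes, with an expiry, that can be revoked without changing the password. This is not an OAuth implementation (no client registration, consent screen, redirects, or refresh tokens), and `ToyAuthServer`, the subject `"fa-12345"`, and the scope names `schedule:read` / `payroll:write` are all hypothetical.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    value: str
    subject: str          # the employee who granted access
    scopes: frozenset     # what the third party is allowed to do
    expires_at: float     # UNIX timestamp


class ToyAuthServer:
    """In-memory stand-in for an authorization server. Illustrative only."""

    def __init__(self):
        self._tokens = {}

    def issue(self, subject, scopes, ttl_seconds=3600):
        token = AccessToken(
            value=secrets.token_urlsafe(32),
            subject=subject,
            scopes=frozenset(scopes),
            expires_at=time.time() + ttl_seconds,
        )
        self._tokens[token.value] = token
        return token.value

    def revoke(self, value):
        # Revoking one token leaves the employee's password untouched.
        self._tokens.pop(value, None)

    def authorize(self, value, required_scope):
        token = self._tokens.get(value)
        if token is None or time.time() >= token.expires_at:
            return False
        return required_scope in token.scopes


if __name__ == "__main__":
    server = ToyAuthServer()
    tok = server.issue("fa-12345", {"schedule:read"})
    print(server.authorize(tok, "schedule:read"))  # True
    print(server.authorize(tok, "payroll:write"))  # False: never granted
    server.revoke(tok)
    print(server.authorize(tok, "schedule:read"))  # False: revoked
```

In the badge analogy, this is closer to handing the student a visitor pass that opens one filing cabinet for an hour than handing over your own badge.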