Hacker News
by marvin-hansen 19 days ago
I ran into almost the same situation while building an offline voice dictation app for macOS and iOS; on macOS I hit exactly this problem.

However, I would like to point out that Apple isn't totally wrong here, because the accessibility API is unfortunately far too broadly scoped: it gives you access to practically everything on the computer. You can take screenshots, listen, and move the cursor... This is completely ridiculous, and the proper engineering solution would be to phase out the accessibility API and replace it with something narrowly scoped, so that specific permissions can be granted individually.

However, Apple, being Apple, is obviously not doing anything, and instead refuses accessibility permission for anything that isn't demonstrably accessibility-related. Now, there are obviously some exceptions, because Apple is not particularly well known for applying its rules consistently, and it grants big exceptions to itself. Still, they do have a valid point on privacy and data protection. And I say that as somebody who ended up distributing my macOS app outside the App Store because I only got approval for iOS.

That said, I would definitely appreciate it if Apple would gradually improve its developer program experience, because compared to its hardware lineup, the developer program is nothing short of abysmal.

5 comments

> However, I would like to point out that Apple isn't totally wrong here, because the accessibility API is unfortunately far too broadly scoped: it gives you access to practically everything on the computer. You can take screenshots, listen, and move the cursor...

I want apps to be able to do that!

Yes, but it's miffing to open Privacy & Security and see dozens of apps pretending to need “accessibility” features. Apple has a dozen-plus categories there, but many of the power-user apps I want specifically need accessibility.

Is there an opinionated reason not to break out capabilities?

> Is there an opinionated reason not to break out capabilities?

If you have a disability and need tools to use your computer, the last thing you want is for those tools to be not only off by default but also complicated and involved to turn on.

Is there a reason a capability has to be covered by only a single permission? Why not have one accessibility permission that covers all that and then a bunch of individual permissions for non-accessibility apps?
Apple doesn’t provide another API for this, so apps have to use the one that’s available.
I think the point is that you can still keep all of these under the accessibility API, but why not break it down further:

accessibility.screenshot accessibility.paste

and whatever else there is. that completely removes the issues for apps like this.
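
To make the idea above concrete, here is a minimal sketch of per-scope grants with an umbrella grant kept for genuine assistive tools. The scope names echo the ones suggested in the comment; they and the `PermissionStore` class are hypothetical illustrations, not real macOS entitlements or APIs.

```python
# Hypothetical scope names taken from the comment above; not real macOS entitlements.
SCREENSHOT = "accessibility.screenshot"
PASTE = "accessibility.paste"
ALL_SCOPES = frozenset({SCREENSHOT, PASTE})


class PermissionStore:
    """Toy model of per-app, per-scope grants (an assumption, not Apple's design)."""

    def __init__(self):
        self._grants = {}  # app id -> set of granted scope names

    def grant(self, app_id, *scopes):
        # Grant only the listed scopes to this app.
        self._grants.setdefault(app_id, set()).update(scopes)

    def grant_full_accessibility(self, app_id):
        # Umbrella grant: one switch that covers every scope.
        self.grant(app_id, *ALL_SCOPES)

    def is_allowed(self, app_id, scope):
        # Unknown apps and ungranted scopes are denied by default.
        return scope in self._grants.get(app_id, set())


store = PermissionStore()
store.grant("dictation-app", PASTE)              # narrow grant
store.grant_full_accessibility("screen-reader")  # single umbrella grant
```

In this sketch a dictation app can paste text without ever being able to take screenshots, while an assistive tool still gets everything with one switch.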

As a programming practice in service of the principle of least privilege, that would make complete sense.

The issue is with Apple's UX. Apple insists on asking permission for every little capability an app wants. So I would have to say "yes, allow this app to take screenshots" and "yes, allow this app to read the clipboard".

I wouldn't be surprised if, in the near future, Apple forced people to click "yes, allow this app to read the clipboard from app X" and then separately "yes, allow this app to read the clipboard for app Y" and so on for every single other app on my machine.

Apple does not allow you to say, "yes, I trust this #$@-ing app, please allow it to do whatever it needs."

Annoying, true, but there's no reason they couldn't group read/write into the same prompt.
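
A rough sketch of what that grouping could look like, assuming dotted scope names such as `clipboard.read`. The scope names, the prompt wording, and the `group_prompts` helper are all made up for illustration and do not reflect how macOS actually builds its prompts.

```python
from collections import defaultdict


def group_prompts(requested_scopes):
    """Collapse dotted scopes like 'clipboard.read' into one prompt per top-level group."""
    groups = defaultdict(list)
    for scope in requested_scopes:
        # Split 'clipboard.read' into group 'clipboard' and action 'read'.
        group, _, action = scope.partition(".")
        groups[group].append(action)
    # One prompt per group, listing its actions in sorted order.
    return [
        f"Allow this app to use {group}: {', '.join(sorted(actions))}?"
        for group, actions in sorted(groups.items())
    ]


# Hypothetical request: three scopes become two prompts instead of three.
prompts = group_prompts(["clipboard.read", "clipboard.write", "screen.capture"])
```

The permissions stay fine-grained underneath; only the number of dialogs the user has to click through shrinks.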
Then they should use an appropriately scoped API, as OP suggested.
Controlling my computer is appropriate scope for an accessibility tool
Isn’t that just deliberate on their part? As in, they genuinely don’t want developers to use these APIs and just allow them for accessibility use cases.
If that were the case, and Apple suddenly decided that no apps are allowed to use the accessibility APIs, so many utilities would just cease to exist that it'd ruin the OS tbh.

You'd lose all window managers, things like Alfred and TextExpander, screenshot tools, computer use agents, etc.

Gradually improve? How many more decades is reasonable to wait? They are what they are and hoping for change makes no sense to me.
Thanks for sharing this. The "phase out the broadly-scoped Accessibility API and replace with narrower permissions" point is exactly the right structural fix. Right now developers have to declare a permission far broader than they actually need, and from the outside the criteria for what counts as legitimate use aren't clearly defined. Interesting that your iOS app got through but the macOS one didn't. WhisperPad is Mac-only and I haven't gone through the iOS path, so your experience there is useful data. The "demonstrable accessibility" criterion seems to be where everything bottlenecks.
> However, I would like to point out that Apple isn't totally wrong here, because the accessibility API is unfortunately far too broadly scoped: it gives you access to practically everything on the computer. You can take screenshots, listen, and move the cursor... This is completely ridiculous, and the proper engineering solution would be to phase out the accessibility API and replace it with something narrowly scoped, so that specific permissions can be granted individually

If you don't have the use of your hands, you want that. The whole point of accessibility APIs is allowing arbitrary control of your computer via novel means. One of the big selling points of Dragon NaturallySpeaking is the ability to tell your computer to do things based on descriptions, without a mouse: "open outlook", "click compose", "select subject", "type foo", etc.

And no, the solution here is not computer vision with an LLM. Text and buttons rendered on my computer exist in memory somewhere as text and buttons. We should not need to convert them to pixels and then lossily back again to recover text and buttons. We should just expose things to the accessibility API and not guess.

> And no the solution here is not computer vision with an LLM.

Also, even if you hypothetically wanted to use computer vision with an LLM… what API is that LLM going to use to take screenshots and click on stuff?

> Chrome and anything electron based don't provide any accessibility information to the OS

Are we sure about this? At least on Windows, NVDA works fine with Chrome and any Electron apps.

Looks like they fixed this one since I last checked in 2016