

by lovelearning 2999 days ago

[Update: Oh boy, I realized this is a gigantically long post after posting it. Sorry about that, but I hope my step by step explanation convinces you.]

Not conflating - the benefits apply equally to both learning and inference, but since there are orders of magnitude more potential consumers of inference than of learning, I emphasized it.

It's true that one doesn't need any of this, but my point is that not having them in the browser puts up barriers - of complexity, of costs, of privacy, of effort - for developers and end users.

I'll use face recognition as a walkthrough example, but this applies to absolutely any ML use case if you think about all the steps involved in taking it from idea to development to deployment to end use.

Take a problem I've worked on a bit - intelligently searching through personal photos and videos. Most people have at least a few hundred GB of photos and videos - family photos, pets, travels - in aggregate across all their devices. Some may feel the need for search software that can answer questions like "find me that photo with Alice (user's daughter) playing with Scooby (user's dog) from 10 years ago".

In a world without browser ML, how would a developer design, develop and deploy this with maximum convenience for both development and end use? Maybe like this...

- Dev starts off by deciding they don't want to mess with any of that ML stuff. Their skills lie in front-end design and usability. They decide to go with Amazon's or Google's face recognition service (IDK if AMZ/GOO actually have such a service, but if they did, it's reasonable that a dev would look at them as the first option).

- But they soon find out it's just shifting the complexity elsewhere. Now, they have to provide a way for users to upload their hundreds of GBs of media to S3 or GCS. Which means more APIs to learn and integrate. More costs for storage. Usability barriers and privacy suspicions for users. Security aspects have to be looked into. Looks like it'll have to become a paid service now.

- The service by itself is not enough. Dev still has to provide the front-end (which they are skilled at) for users to select photos, crop faces, apply labels, and send it all to the service's transfer learning API.

- After all that, some users complain that accuracy is not good enough because it couldn't find many photos. Dev has no way to tweak the models because those are behind another company's opaque service. It's increasingly looking like a custom backend is necessary.

- So version 2. Dev learns some ML. Then downloads a pre-trained model that can do face detection and recognition - say FaceNet or OpenFace.

- They have to deploy it server-side for training and indexing. They learn a bit of Nginx and WSGI, and deploy it. They don't know how many users will use it or how much data will be uploaded, so they have to plan automated scaling for that. EC2 or GCE? More stuff and more APIs to learn, and more costs.

- Dev still has to provide the front-end for users to select photos, crop faces, apply labels, and upload to their learning service. Dev has to implement per-user transfer learning and store per-user transfer data and models.

- Dev has to implement all the required provisioning for inference and transfer learning - be it raw GPU servers or Docker or K8s or whatever. More costs.

- For an end user, the need to upload hundreds of GBs of personal media to a 3rd party is also a barrier - takes time, loses privacy and likely incurs bandwidth costs.

- So version 3. Dev says forget the server-side. User already has GBs of photos on their hard disks. Instead of bringing their photos to us and managing them, let's take the software to them. Let's just package up everything and allow the user to download and use the entire thing on their local machine. Maybe as platform-specific installables. Or as a platform-neutral Docker image. Reduces costs and complexity for the developer. Can even be free since there are no costs incurred by the developer. Android's still a problem since it can't do Docker, and dev doesn't know Android app development.

- The end user too benefits with far better privacy and usability. However, they still have to install a package - sounds easy, but in a world of "user does not have administrative privileges" and "sudo", there are still potential barriers to cross. And Android is still a no-go because the dev doesn't know it.

Now in a world with browser ML, you can see how those remaining problems can be solved too. JavaScript ML is write once, run on any browser - even Android's. User does not have to install anything. Dev does not have to write anything specifically for a different platform. All the transfer learning and inference can happen in the user's browser.
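As a rough illustration of what "inference in the user's browser" could look like for the face search example, here is a minimal TypeScript sketch. It assumes a FaceNet-style model has already turned each cropped face into an embedding vector, and it simply picks the closest labeled face by cosine similarity. That is a much simpler stand-in for the per-user transfer learning described above, not an implementation of it. The names `cosineSimilarity`, `bestMatch` and `LabeledFace`, and the 0.8 threshold, are hypothetical choices made for the example.

```typescript
// A face embedding: a fixed-length vector produced by some face model (assumed).
type Embedding = number[];

// A face the user has already labeled (hypothetical shape).
interface LabeledFace {
  label: string;
  embedding: Embedding;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) throw new Error("embedding lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Return the label of the most similar known face, or null if nothing
// reaches the threshold. The 0.8 default is an arbitrary illustrative value.
function bestMatch(
  query: Embedding,
  known: LabeledFace[],
  threshold = 0.8
): string | null {
  let bestLabel: string | null = null;
  let bestScore = threshold;
  for (const face of known) {
    const score = cosineSimilarity(query, face.embedding);
    if (score >= bestScore) {
      bestScore = score;
      bestLabel = face.label;
    }
  }
  return bestLabel;
}
```

Everything here runs on plain arrays, so the labeled embeddings never need to leave the user's machine.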

The browser environment still presents some barriers - such as not being able to access local photos directly without the user selecting them, and limited local storage for models. But both can be worked around with some creative batching and solutions like Emscripten's in-memory virtual file system (I'm not sure if TF.js uses the latter, but other frameworks like OpenCV.js do). User pays some cost of reduced usability, which they may be ok with since they may see the alternative options as being worse. And the privacy is matchless.
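One way the "creative batching" could be sketched in TypeScript: group the files the user selected into batches that stay under a byte budget, so only one batch needs to be held in memory at a time. The helper name `batchBySize` and the `SizedItem` interface are hypothetical. In a browser the items would typically be `File` objects from a file input, which expose `name` and `size`.

```typescript
// Minimal shape shared with the browser's File object (File has name and size).
interface SizedItem {
  name: string;
  size: number; // bytes
}

// Hypothetical helper: greedily group items so each batch stays within maxBytes.
// An item larger than maxBytes gets a batch of its own rather than being dropped.
function batchBySize<T extends SizedItem>(items: T[], maxBytes: number): T[][] {
  const batches: T[][] = [];
  let current: T[] = [];
  let currentBytes = 0;
  for (const item of items) {
    // Close the current batch if adding this item would exceed the budget.
    if (current.length > 0 && currentBytes + item.size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(item);
    currentBytes += item.size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch can then be decoded, run through the model, and released before the next one is loaded, at the cost of the user waiting longer for the whole collection to be indexed.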

All this is applicable to any ML use case. Anything involving a user's private data, such as speech recognition or document scanning/OCR, gets the exact same benefits for both developers and users.

1 comment

bo1024 2995 days ago

Sorry for the late response, but I want to thank you for the in-depth post! I agree with you that version 3 is way better, but I'm very cautious about advocating for browsers as de-facto operating systems. If we want better cross-platform systems for sandboxing and running programs, I'd prefer to develop those directly instead of giving more power to browsers and browser vendors.
