I built Moments to finally get this game idea out of my head. My original goal was to run on-device models in mobile browsers, but it is still too early to run local vision models directly in phone browsers, so I focused on desktop.
How it works:
- You upload a photo.
- A local vision model running entirely in your browser captions it and picks a prominent object from the image.
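
As a rough illustration of the "picks a prominent object" step, here is a minimal sketch. It is not the actual Moments code: the `Detection` shape, the `pickProminentObject` name, the `minScore` threshold, and the heuristic itself (confidence times the fraction of the image the box covers) are all assumptions made for the example.

```typescript
// Hypothetical detection shape; the real model output may look different.
interface Detection {
  label: string;
  score: number; // model confidence in [0, 1]
  box: { x: number; y: number; width: number; height: number }; // in pixels
}

// Assumed heuristic: prominence = confidence * fraction of the image covered.
// Detections below minScore are ignored so a large but doubtful box cannot win.
function pickProminentObject(
  detections: Detection[],
  imageWidth: number,
  imageHeight: number,
  minScore: number = 0.5,
): Detection | undefined {
  const imageArea = imageWidth * imageHeight;
  let best: Detection | undefined;
  let bestProminence = -Infinity;

  for (const d of detections) {
    if (d.score < minScore) continue;
    const coverage = (d.box.width * d.box.height) / imageArea;
    const prominence = d.score * coverage;
    if (prominence > bestProminence) {
      best = d;
      bestProminence = prominence;
    }
  }
  return best; // undefined when nothing passes the threshold
}

// Usage with made-up detections for an 800x600 photo.
const sample: Detection[] = [
  { label: "dog", score: 0.9, box: { x: 100, y: 150, width: 400, height: 300 } },
  { label: "ball", score: 0.95, box: { x: 600, y: 500, width: 50, height: 50 } },
  { label: "tree", score: 0.3, box: { x: 0, y: 0, width: 800, height: 600 } },
];
const prominent = pickProminentObject(sample, 800, 600);
console.log(prominent?.label); // "dog"
```

Weighting by area keeps a small, high-confidence object (the ball) from beating the main subject, while the threshold drops the low-confidence full-frame box.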
This is probably a really clever coding exercise, but an enjoyable game it is not. Maybe share more about the code?