We’ve put out a ton of demos that use much smaller models (10-60 MB), including:

- (44MB) In-browser background removal: https://huggingface.co/spaces/Xenova/remove-background-web. (We also put out a WebGPU version: https://huggingface.co/spaces/Xenova/remove-background-webgp...).
- (51MB) Whisper Web for automatic speech recognition: https://huggingface.co/spaces/Xenova/whisper-web (just select the quantized version in settings).
- (28MB) Depth Anything Web for monocular depth estimation: https://huggingface.co/spaces/Xenova/depth-anything-web
- (14MB) Segment Anything Web for image segmentation: https://huggingface.co/spaces/Xenova/segment-anything-web
- (20MB) Doodle Dash, an ML-powered sketch detection game: https://huggingface.co/spaces/Xenova/doodle-dash

… and many, many more! Check out the Transformers.js demos collection for some others: https://huggingface.co/collections/Xenova/transformersjs-dem....

Models are cached on a per-domain basis (using the Web Cache API), meaning you don’t need to re-download the model on every page load. If you would like to persist the model across domains, you can create browser extensions with the library! :)

As for your last point, there are efforts underway, but nothing I can speak about yet!
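For anyone curious what the per-domain caching mentioned above looks like in principle, here is a minimal cache-first sketch. It is not the Transformers.js implementation: `ByteCache`, `Downloader`, and `loadModelBytes` are hypothetical names made up for illustration, and the store is abstracted so the logic does not depend on browser globals.

```typescript
// Hypothetical sketch of cache-first model loading (not the library's code).
// `ByteCache` stands in for any per-origin store, e.g. one backed by the
// browser Cache API.
interface ByteCache {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, value: Uint8Array): Promise<void>;
}

// Hypothetical download function; in a browser this would wrap fetch().
type Downloader = (url: string) => Promise<Uint8Array>;

async function loadModelBytes(
  url: string,
  cache: ByteCache,
  download: Downloader,
): Promise<Uint8Array> {
  // 1. Return the stored copy if this origin has already downloaded the file.
  const cached = await cache.get(url);
  if (cached !== undefined) {
    return cached;
  }
  // 2. Otherwise download once and store the bytes for later page loads.
  const bytes = await download(url);
  await cache.set(url, bytes);
  return bytes;
}
```

In a browser, one way to back `ByteCache` would be `caches.open(...)` together with `cache.match(url)` and `cache.put(url, response)`. Because that storage is scoped to the page's origin, a model downloaded on one site is not visible to another, which is why an extension is suggested for sharing across domains.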
I'm keen to do more stuff with WebGPU, so very interested to learn about challenges and limitations here.