by HanClinto 42 days ago

Oh cool!

I've been doing similar experiments lately (using ViTs) for card recognition, and so far it's been working really well for me. If you want to compare notes, I've open-sourced my code and weights [0] and written some blog posts about how mine works [1]. I'd love to see if we can collaborate!

> Push the inference to the client-side (WebGPU / Web Workers).

I have an example of this working in WebGPU / WASM here [2], along with a playground environment (demonstrated here [3]). I'm currently training a new version that uses a different ViT backbone better suited to WASM inference. It's still converging, and I hope to have it finish training (or at least reach parity with the previous model) in about a week. For reference, my last model took ~200 epochs to reach its current level, and an epoch takes about an hour in my current setup.
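For a rough sense of scale, here's the back-of-the-envelope math on those numbers. This is only a sketch: `estimateTrainingDays` is a made-up helper, and it assumes training runs around the clock and that the new backbone needs about as many epochs as the last one, which may not hold.

```typescript
// Hypothetical helper: wall-clock days for a run, assuming 24h/day training.
function estimateTrainingDays(epochs: number, hoursPerEpoch: number): number {
  return (epochs * hoursPerEpoch) / 24;
}

// ~200 epochs at ~1 hour per epoch (figures from the previous model).
const days = estimateTrainingDays(200, 1);
console.log(days.toFixed(1)); // 8.3
```

So a full rerun at the old epoch count is a bit over a week; reaching parity sooner would need fewer epochs or faster epochs.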

You mentioned WebGPU: I've run into issues with the MobileViT-XXS backbone producing bad results in WebGPU on Android, so YMMV on whether WebGPU is stable enough for this. I don't know if it's a problem on my end or a true bug in the platform, but I've fallen back to WASM and things have been working much better since.
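One way to make that fallback automatic is to run a known probe input through the WebGPU model at startup and only keep it if the output matches a reference. The sketch below is illustrative, not what my code does: `Backend`, `Model`, `Loader`, `maxAbsDiff`, and `pickBackend` are all hypothetical names, the loader is something you'd wire up to whatever inference runtime you use, and the `1e-2` tolerance is an arbitrary placeholder.

```typescript
// Hypothetical types: nothing here is tied to a specific inference library.
type Backend = "webgpu" | "wasm";

interface Model {
  backend: Backend;
  run(input: Float32Array): Promise<Float32Array>;
}

// Caller-supplied function that builds a model on the requested backend.
type Loader = (backend: Backend) => Promise<Model>;

// Largest element-wise absolute difference.
// Returns Infinity if lengths differ or any difference is NaN/Infinity.
function maxAbsDiff(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return Infinity;
  let worst = 0;
  for (let i = 0; i < a.length; i++) {
    const d = Math.abs(a[i] - b[i]);
    if (!Number.isFinite(d)) return Infinity;
    worst = Math.max(worst, d);
  }
  return worst;
}

// Try WebGPU first; keep it only if the probe output is close to the
// reference output. Otherwise (or on any error) fall back to WASM.
async function pickBackend(
  load: Loader,
  probe: Float32Array,
  expected: Float32Array,
  tolerance = 1e-2, // assumed threshold, tune for your model
): Promise<Model> {
  try {
    const gpu = await load("webgpu");
    const out = await gpu.run(probe);
    if (maxAbsDiff(out, expected) <= tolerance) return gpu;
  } catch (_err) {
    // WebGPU unavailable or failed to initialize: fall through to WASM.
  }
  return load("wasm");
}
```

The reference output would come from a run you trust (for example the same model on desktop or in WASM), so a device that silently produces garbage in WebGPU gets caught instead of just devices that throw.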

[0] - https://github.com/HanClinto/CollectorVision

[1] - https://blog.hanclin.to/posts/gh-19/

[2] - https://hanclinto.github.io/CollectorVision/

[3] - https://youtu.be/MHieOcmC7Dw