| HN Mirror

as mentioned in another comment, "scale" is not just horizontal, it's vertical as well. with millions of products (UPCs) spanning different visual tolerances, it's hard to generalize. your annotation method is indeed more efficient than a multistep "go take a bunch of pictures and upload them to our servers for annotators" workflow, but it's still costly in terms of stakeholder buy-in, R&D, hardware, and labor. if you can scope your verticals so that you only have, say, 1000 products, the problem becomes feasible. once you try to scale to an actual grocery store or bodega, where the visual data requirements keep shifting, it doesn't scale well. add in the fact that every store moves merchandise at a different rate or carries localized products, and the problem gets even more complex.

the simulated data also becomes a cost issue. we have to produce a digital twin that is realistic (at least as far as the model is concerned) and doesn't diverge too much from the real data, and quantifying that gap matters a lot when the distinction you care about is Lay's vs. Lay's Low Sodium.

i'm not saying it's unsolvable. it's just a difficult problem.