by axg11 1742 days ago
Some potential markets:

- home security

- searching through long home videos

- production companies with large video archives (this would require more tooling)

I am unsure whether to focus on one of these groups or to go for a more generic tool. I'll add a video demo to the landing page. So far, in all the tests I've performed, the ML model has generalized well enough to cover this range of uses.

Licensing: I need to research this further. I'm not sure how the licensing is affected by the fact that I've also fine-tuned the model on my own data.

1 comment

Thanks for the info on markets. What made you consider fine-tuning further on your own data? Was CLIP not good enough to test the market?

FWIW I recall having seen something similar with Google Cloud's Video Intelligence API (https://towardsdatascience.com/building-an-ai-powered-search...). Something generic would be hard to get right, especially if your users want both high precision and high recall from their search results.

Re: licensing, the startup world is something of a Wild West these days, with folks offering pre-trained models as a service without really thinking about the licensing implications (on both the dataset and model fronts). Hugging Face is a classic example: they seem to suggest that it's perfectly OK to fine-tune and use commercially (https://github.com/huggingface/transformers/issues/3357#issu...), but I'm not certain that their lawyers would put it the same way.

Pre-trained CLIP gets you 95% of the way there, so you're correct: fine-tuning isn't necessary to test the market. The one downside of pre-trained CLIP is that it hasn't been trained on still images from videos. These have different noise characteristics and contain considerably more motion blur than the average image used for training.