by avatar042 1733 days ago
Interesting idea. As others have suggested, it would really help to have an accompanying video to support the claims.

A few thoughts/questions here:

1. What markets and use cases were you thinking of when building this MVP? The applications could be broad, but it seems like you expect CLIP to handle bespoke queries and hope that it returns relevant results. It would also be interesting to test a search for something that doesn't exist in the video: can you handle that well enough (assuming you're just picking a simple threshold to identify relevant search results)?

2. Licensing is something that has always piqued my curiosity when it comes to ML-based apps. Do you have a sense of the commercial-use terms for models such as CLIP, especially when the datasets they were probably trained on were not licensed for commercial use? This also applies to the raw video data uploaded by the user.
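
To make the threshold question in point 1 concrete, here is a minimal sketch of what "a simple threshold" could look like. It assumes each video frame and the text query have already been turned into embedding vectors (in practice by CLIP's image and text encoders; the toy vectors below stand in for them). The function names and the 0.5 cutoff are hypothetical, not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_frames(query_embedding, frame_embeddings, threshold=0.5):
    """Return (frame_index, score) pairs scoring at or above threshold, best first.

    An empty list means "nothing in the video matches", which is the case
    the threshold is supposed to catch.
    """
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(frame_embeddings)
    ]
    hits = [(index, score) for index, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Toy stand-ins for real embeddings (assumption: real ones come from CLIP).
frames = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
present_query = [1.0, 0.1, 0.0]   # close to frames 0 and 1
absent_query = [0.0, -1.0, 0.0]   # close to no frame

print(search_frames(present_query, frames))
print(search_frames(absent_query, frames))
```

The hard part is choosing the cutoff: a single fixed value may not separate "present" from "absent" equally well across different queries, which is why testing with queries for things not in the video matters.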


Some potential markets:

- home security

- searching through long home videos

- production companies with large video archives (this would require more tooling)

I am unsure whether to focus on one of these groups or to go for a more generic tool. I'll add a video demo to the landing page. So far, in all the tests I've performed, the ML model has generalized well enough to cover this range of uses.

Licensing: I need to research this further. I'm not sure how the licensing changes given that I've also fine-tuned the model on my own data.

Thanks for the info on markets. What made you consider fine-tuning further on your own data? Was CLIP not good enough on its own to test the market?

FWIW I recall having seen something similar with Google Cloud's Video Intelligence API (https://towardsdatascience.com/building-an-ai-powered-search...). A generic tool would be hard to get right, especially if your users want high precision and recall from their search results.

Re: licensing, the world of startups is something of a wild west these days, with folks offering pre-trained models as a service without really thinking about the licensing implications (on both the dataset and model fronts). Hugging Face is a classic example, and they seem to suggest that it's perfectly OK to fine-tune and use commercially (https://github.com/huggingface/transformers/issues/3357#issu...), but I'm not certain that their lawyers would put it the same way.

Pre-trained CLIP gets you 95% of the way there, so you're correct, fine-tuning isn't necessary to test the market. The one downside of pre-trained CLIP is that it hasn't been trained on still frames extracted from videos. These have different noise characteristics and contain considerably more motion blur than the average image used for training.
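
For anyone wanting to see what that blur does to a frame, one rough approach is to blur sharp images synthetically. The sketch below is only an illustration, not the method used in this project: it assumes a grayscale frame stored as nested lists and approximates motion blur with a horizontal box kernel. `horizontal_motion_blur` and `kernel_size` are hypothetical names.

```python
def horizontal_motion_blur(image, kernel_size=5):
    """Average each pixel with its horizontal neighbours, clamping at the row edges.

    image: list of rows, each row a list of numeric pixel intensities.
    kernel_size: positive odd width of the box kernel (a larger value blurs more).
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    blurred = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            total = 0.0
            for dx in range(-half, half + 1):
                # Clamp the neighbour index so edge pixels reuse the border value.
                neighbour = min(max(x + dx, 0), width - 1)
                total += row[neighbour]
            new_row.append(total / kernel_size)
        blurred.append(new_row)
    return blurred


# A single bright pixel gets smeared sideways, as a moving object would be.
sharp_frame = [[0, 0, 255, 0, 0]]
print(horizontal_motion_blur(sharp_frame, kernel_size=3))
```

Real motion blur varies in direction and length across the frame, so this is only a first approximation of what video stills look like.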