
It is quite possible that the B variant is not enough for some scenarios. An earlier version also included video search, and the frames used for indexing were sometimes blurry (lacking fine detail); such frames would generally get a higher score for naive natural-language queries. I only tested with the B variant.

But I resolved that problem up to a point by adding a linear layer trained to discard such frames, which was less costly than running a bigger variant for my use case.
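
To give a rough idea of what I mean, here is a minimal sketch, not my actual code. It assumes the linear layer sits on top of the frame embeddings and is trained as a binary classifier on frames labeled blurry or sharp. The function names (`train_blur_filter`, `keep_sharp`), the 16-dimensional embeddings, and the synthetic data are all made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_blur_filter(embeddings, is_blurry, lr=0.1, epochs=200):
    """Fit one linear layer (weights + bias) with a sigmoid output.

    embeddings: (n, d) array of frame embeddings (assumed input).
    is_blurry:  (n,) array of 0/1 labels, 1 meaning "discard this frame".
    """
    n, d = embeddings.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = sigmoid(embeddings @ w + b)
        error = probs - is_blurry  # gradient of binary cross-entropy w.r.t. logits
        w -= lr * (embeddings.T @ error) / n
        b -= lr * error.mean()
    return w, b


def keep_sharp(embeddings, w, b, threshold=0.5):
    """Return a boolean mask: True for frames predicted sharp enough to index."""
    return sigmoid(embeddings @ w + b) < threshold


# Synthetic stand-in data: real embeddings would come from the image encoder.
rng = np.random.default_rng(0)
sharp = rng.normal(loc=1.0, size=(200, 16))
blurry = rng.normal(loc=-1.0, size=(200, 16))
X = np.vstack([sharp, blurry])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = train_blur_filter(X, y)
sharp_mask = keep_sharp(sharp, w, b)
blurry_mask = keep_sharp(blurry, w, b)
```

Only frames where the mask is `True` would go into the index. The appeal is that this adds a single matrix-vector product per frame on embeddings that are already computed, rather than a second pass through a larger model.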