by Zetobal 879 days ago
ViT-H and ViT-g are fine; I wouldn't use B anymore.
2 comments

It is quite possible that the B variant is not enough for some scenarios. An earlier version also included video search, and the frames used for indexing were sometimes blurry (lacking fine detail); these frames would generally get higher scores for naive natural-language queries. I only tested with the B variant.

But I resolved that problem up to a point by adding a linear layer trained to discard such frames, which was less costly than running a bigger variant for my use case.
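
For anyone curious what such a filter could look like, here is a rough sketch, not the actual code. It assumes the linear layer is a binary classifier over per-frame embeddings (1 = discard, 0 = keep) trained with logistic loss; the comment above does not say what the inputs or the training setup were. The names `train_linear_filter`, `should_discard` and `make_synthetic` are made up for illustration, and the data is synthetic noise standing in for real image embeddings.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_filter(embeddings, labels, epochs=200, lr=0.5):
    """Fit a single linear layer (weights + bias) with logistic loss.

    labels: 1 = discard (e.g. blurry frame), 0 = keep.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient descent step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def should_discard(embedding, w, b, threshold=0.5):
    # True when the layer scores the frame as "discard".
    score = sigmoid(sum(wi * xi for wi, xi in zip(w, embedding)) + b)
    return score >= threshold


def make_synthetic(n, dim, rng):
    # ASSUMPTION: synthetic stand-in for frame embeddings. "Blurry" frames
    # are shifted along the first axis so the two classes are separable.
    data, labels = [], []
    for _ in range(n):
        blurry = rng.random() < 0.5
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 3.0 if blurry else -3.0
        data.append(x)
        labels.append(1 if blurry else 0)
    return data, labels


rng = random.Random(0)
train_x, train_y = make_synthetic(200, 8, rng)
w, b = train_linear_filter(train_x, train_y)

test_x, test_y = make_synthetic(100, 8, rng)
accuracy = sum(
    should_discard(x, w, b) == bool(y) for x, y in zip(test_x, test_y)
) / len(test_x)
```

In a real pipeline the filter would run once per frame at indexing time, so frames flagged by `should_discard` never reach the search index. The cost is one dot product per frame on embeddings that are already computed, which fits the point about it being cheaper than a bigger model.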

Can you give details as to why not?