

by nl 1203 days ago

This looks interesting for image retrieval.

I don't love the way their tables[1] report performance, though. My understanding is that the "Dataset" column in the table represents the size of the training dataset, not the size of the dataset they are evaluating on. Note that this undersells their performance, so it isn't like they are trying to hide something here!

Also, I'd love to see someone do a similar benchmark for the OpenAI GPT-3 embeddings. I'm pretty unclear on how well they compare to something like FLAN-T5, because they don't seem to be evaluated anywhere in the retrieval setting (unless I've missed it?).

[1] See "Zero-Shot Image Retrieval, English-only" in https://www.unum.cloud/blog/2023-02-20-efficient-multimodali...

1 comment

vov_or 1202 days ago

Hi! MSCOCO and Flickr are the main datasets for image retrieval. The results published in most papers (including CLIP) are based on them, so we used exactly these datasets for evaluation.
