by simonw 753 days ago
First you need to understand embeddings, and CLIP. I have a detailed guide here that should help you with that: https://simonwillison.net/2023/Oct/23/embeddings/

Then you need to understand binarization. This is a surprisingly effective trick. It starts from the observation that if you have an embedding vector of, say, 1000 numbers, then for many models those numbers will be very small floating point values that sit just above or just below zero.

It turns out you can turn those thousand floating point numbers into one thousand single bits, where each bit simply records whether the value is above or below zero... and the embedding magic mostly still works!
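
Here is a minimal sketch of that idea in plain Python, to make it concrete. The `binarize` function name and the randomly generated vector are my own stand-ins for illustration (a real vector would come from an embedding model such as CLIP), and packing the bits into a single Python int is just one convenient representation.

```python
import random

def binarize(vector):
    """Pack a list of floats into one int: bit i is 1 when vector[i] > 0."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits

# Made-up stand-in for a real embedding: 1000 small floats around zero.
rng = random.Random(0)
embedding = [rng.gauss(0, 0.05) for _ in range(1000)]

packed = binarize(embedding)
print(packed.bit_length() <= 1000)  # the result fits in 1000 bits
```

Assuming the original values are stored as 32-bit floats, that takes the vector from 4000 bytes down to 125 bytes.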

And instead of the usual cosine distance, you can compare two vectors using a much faster Hamming distance function.
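
As a rough sketch, assuming each binarized vector is held as a single Python int (my choice for illustration, not something a particular library requires), Hamming distance is just an XOR followed by counting the 1 bits. The `hamming_distance` name is hypothetical.

```python
def hamming_distance(a, b):
    """Count the bit positions where two bit-packed vectors differ."""
    # XOR leaves a 1 wherever the two inputs disagree.
    return bin(a ^ b).count("1")

x = 0b10110
y = 0b10011
print(hamming_distance(x, y))  # 2
```

A smaller distance means the two vectors agree on more bits, so it plays the same role as a small cosine distance would with the original floats.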

Once you understand embedding vectors and CLIP that should hopefully make sense.