Hacker News
by tlack 1365 days ago
You can feed in any combination of images or text and get an equivalent array of 768 floats. By comparing two of these arrays to each other you can determine the semantic similarity between two images, between text and an image, or between two pieces of text. Quite useful, but somewhat limited in representational capacity.
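To make "comparing two of these arrays" concrete, here's a rough sketch using cosine similarity. That metric is my assumption, since nothing above says which comparison to use, though it's a common choice for embeddings. The vectors are made-up 4-element stand-ins (real ones would have 768 entries), and `cosine_similarity` plus the variable names are just illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, NOT real model output.
photo_of_dog = [0.9, 0.1, 0.0, 0.2]
caption_dog = [0.8, 0.2, 0.1, 0.1]
caption_car = [0.0, 0.1, 0.9, 0.3]

sim_dog = cosine_similarity(photo_of_dog, caption_dog)
sim_car = cosine_similarity(photo_of_dog, caption_car)

# Higher score means the two inputs are treated as closer in meaning.
print(f"photo vs dog caption: {sim_dog:.3f}")
print(f"photo vs car caption: {sim_car:.3f}")
```

The same function works for image/image, text/image, or text/text pairs, because by the time you compare them they're all just arrays of floats.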