by munro
857 days ago
> 500ms is an acceptable latency for a search that runs offline without network calls.

Definitely disagree; having fast search makes a huge difference. Apple Photos' search is fast. That matters especially because CLIP embeddings aren't perfect, so you may need to try a few different keywords.

> For 100,000 embeddings, writing took 6.2 seconds, reading 12.6 seconds and the disk space consumed dropped to 440 MB.

Something sounds very wrong here. I read and write 100-500 GiB of data from Python, and it's much faster than this. In fact, I'm seeing ~1.3 s to write and ~700 ms to read with trivial Python:

```python
import pickle
import sqlite3

import numpy as np

IMAGES = 100_000
EMBEDDING_DIMENSIONS = 1024

### write
db = sqlite3.connect("images.db")
# note: re-running this script appends another IMAGES rows to the same file
db.execute("CREATE TABLE IF NOT EXISTS image_embeddings (id INTEGER PRIMARY KEY, embedding BLOB)")
# generate synthetic embeddings (float64 by default)
embeddings = np.random.normal(size=(IMAGES, EMBEDDING_DIMENSIONS))  # 1.61 s
# serialize each embedding for storage
serialized_embeddings = [(pickle.dumps(embedding),) for embedding in embeddings]  # 626 ms
# insert all rows, committed once below
db.executemany("INSERT INTO image_embeddings (embedding) VALUES (?)", serialized_embeddings)  # 679 ms
db.commit()  # 95.3 ms
db.close()

### read
db = sqlite3.connect("images.db")
serialized_embeddings = db.execute("SELECT embedding FROM image_embeddings").fetchall()  # 370 ms
embeddings = [pickle.loads(x[0]) for x in serialized_embeddings]  # 322 ms
db.close()
```
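A possible variation on the same idea, sketched here rather than benchmarked: instead of pickling each row, store the raw `float32` bytes with `ndarray.tobytes()` and rebuild one matrix on read with `np.frombuffer`. The smaller `IMAGES` value and the in-memory database are assumptions made only so the sketch runs quickly; no timings are claimed.

```python
import sqlite3

import numpy as np

IMAGES = 1_000  # kept small for a quick check; the comment above used 100_000
EMBEDDING_DIMENSIONS = 1024

# write: store each embedding as raw float32 bytes instead of a pickle
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE image_embeddings (id INTEGER PRIMARY KEY, embedding BLOB)")
embeddings = np.random.normal(size=(IMAGES, EMBEDDING_DIMENSIONS)).astype(np.float32)
db.executemany(
    "INSERT INTO image_embeddings (embedding) VALUES (?)",
    ((row.tobytes(),) for row in embeddings),
)
db.commit()

# read: join all blobs and reinterpret them as one (IMAGES, DIMENSIONS) matrix
rows = db.execute("SELECT embedding FROM image_embeddings ORDER BY id").fetchall()
loaded = np.frombuffer(b"".join(r[0] for r in rows), dtype=np.float32).reshape(
    -1, EMBEDDING_DIMENSIONS
)
db.close()
```

Using `float32` halves the per-vector payload compared with the `float64` arrays `np.random.normal` returns (4 KiB vs 8 KiB at 1024 dimensions), and reading raw bytes avoids calling `pickle.loads`, which should never be run on untrusted data.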
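On the latency point, here is a rough sketch of exact (brute-force) search, assuming CLIP-style embeddings compared by cosine similarity. Random vectors stand in for real image and query embeddings, and `normalize_rows` and `top_k_cosine` are hypothetical helper names. Normalizing the stored matrix once turns every query into a single matrix-vector product followed by a partial sort.

```python
import numpy as np

rng = np.random.default_rng(0)
IMAGES = 10_000  # hypothetical size for the sketch
EMBEDDING_DIMENSIONS = 1024

# hypothetical stand-in for stored image embeddings
image_embeddings = rng.standard_normal((IMAGES, EMBEDDING_DIMENSIONS), dtype=np.float32)


def normalize_rows(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def top_k_cosine(normalized: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k rows most cosine-similar to query, best first."""
    query_n = query / np.linalg.norm(query)
    scores = normalized @ query_n  # one matrix-vector product over all images
    k = min(k, len(scores))
    # argpartition places the k highest scores first without a full sort
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]  # sort only those k, highest first


index = normalize_rows(image_embeddings)  # done once, at indexing time
results = top_k_cosine(index, image_embeddings[42] * 3.0)
```

Since the query here is a scaled copy of row 42, that row comes back first. An approximate nearest-neighbour index would be the next step only if exact search turns out too slow.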