If we forget about making the service for a moment, how would you store and process high-dimensional vectors locally? What ready-made library or software would you use? What data structure?
Some storage options are:
- Store many vectors in a single HDF5 or LMDB file.
- Store single vectors in many small binary files (e.g. using NumPy's save() function).
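
As a rough illustration of the NumPy route, here is a minimal sketch that writes vectors both ways: one small file per vector, and one stacked matrix in a single file (a plain .npy stand-in for the HDF5/LMDB option, not a replacement for either). The array sizes, file names, and temporary directory are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Made-up data for illustration: 1000 vectors with 128 dimensions each.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128)).astype(np.float32)

workdir = tempfile.mkdtemp()  # hypothetical scratch location

# Option A: one small binary file per vector (only the first 10 here).
for i in range(10):
    np.save(os.path.join(workdir, f"vec_{i:06d}.npy"), vectors[i])
single = np.load(os.path.join(workdir, "vec_000003.npy"))

# Option B: all vectors stacked in one file, memory-mapped on read
# so that only the rows actually accessed are pulled from disk.
matrix_path = os.path.join(workdir, "vectors.npy")
np.save(matrix_path, vectors)
mapped = np.load(matrix_path, mmap_mode="r")
row = np.asarray(mapped[3])  # copy a single row into memory
```

Many small files are simple to update one vector at a time, while a single stacked file is usually easier to scan in bulk; which fits better depends on how the vectors are written and read.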
To look up neighbors you might:
- Compute neighbors exhaustively (e.g. using SciPy distance functions).
- Use an approximate nearest neighbors (ANN) library, such as one of those benchmarked at https://github.com/erikbern/ann-benchmarks. These can be much faster than exhaustive search, at the cost of some accuracy and of having to rebuild the index periodically.
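
The exhaustive option can be sketched in a few lines with SciPy's `cdist`. This is only an illustration under assumed sizes: the helper name `exhaustive_neighbors`, the random data, and the choice of Euclidean distance are all made up for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist


def exhaustive_neighbors(queries, database, k=5, metric="euclidean"):
    """Return (indices, distances) of the k closest database rows per query.

    Hypothetical helper: computes every query-to-database distance,
    so the cost grows with n_queries * n_database.
    """
    distances = cdist(queries, database, metric=metric)  # (n_queries, n_database)
    order = np.argsort(distances, axis=1)[:, :k]         # k nearest, closest first
    return order, np.take_along_axis(distances, order, axis=1)


# Made-up data: 2000 database vectors, 64 dimensions.
rng = np.random.default_rng(0)
database = rng.standard_normal((2000, 64))
queries = database[[0, 5]]  # query with two vectors that are in the database

idx, dist = exhaustive_neighbors(queries, database, k=5)
```

Because the queries here are copies of database rows 0 and 5, each query's first neighbor is itself at distance zero. An ANN library would replace the full distance matrix with an index lookup.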