Much more mature and feature-rich than many of the competitors listed in the article.
To some degree it's more a platform you can use to efficiently and flexibly build your own, more complicated search system, which is both a benefit and a drawback.
Some good parts:
- very flexible text search (BM25), more so than Elasticsearch (or at least easier to use and better documented when it comes to advanced features)
- fast, flexible-enough vector search with good filtering capabilities
- built-in support for defining more complicated search pipelines, including multi-phase search (also known as reranking)
- quite a nice approach to fine-grained control over which kinds of indices are built for which fields
- schema changes come with safety checks to make sure you don't accidentally break anything, which you can override if you are sure you want that
- a ton of control in a cluster over where search system resources get allocated (e.g. which schemas get stored on which storage clusters, which cluster nodes should act as storage nodes, which should only do pre- or post-processing steps in a search pipeline, and which should be used for calculating embeddings using some LLM or similar). Not something you need for demos, but definitely something you need once your customers have enough data.
- child documents and document references
- multiple vectors per document
- quite an interesting set of data types for fields, and related ways you can use them in a search pipeline
- a flexible, reasonably easy-to-use system for plugins/extensions (Java only)
- support for building search pipelines that have sub-searches in external, potentially non-Vespa systems
- really well documented
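To make the "multi-phase search" point concrete, here is a minimal, self-contained Python sketch of the general idea. It is not Vespa's API or implementation; all function names and parameters (`two_phase_search`, `rerank_count`, `hits`) are hypothetical. A cheap first phase (Okapi BM25 over tokenized documents) scores everything, and only the top `rerank_count` candidates get the more expensive second-phase score (here assumed to be cosine similarity against a per-document vector).

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized doc for the given query terms."""
    if not docs:
        return []
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0  # guard against all-empty docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            f = tf[t]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_phase_search(query_terms, query_vec, docs, vecs, rerank_count=2, hits=1):
    """Hypothetical two-phase ranking: BM25 for all docs, cosine rerank for the top few."""
    # Phase 1: cheap lexical scoring over every document.
    first = bm25_scores(query_terms, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:rerank_count]
    # Phase 2: expensive scoring, applied only to the phase-1 candidates.
    reranked = sorted(candidates, key=lambda i: cosine(query_vec, vecs[i]), reverse=True)
    return reranked[:hits]


if __name__ == "__main__":
    docs = [["vespa", "search", "engine"], ["vector", "database"], ["search", "engine", "vector", "search"]]
    vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    print(two_phase_search(["search", "engine"], [0.0, 1.0], docs, vecs))
```

Note that document 1 has the best vector match but never reaches the second phase because it has no lexical match. That is the usual trade-off of reranking: the expensive scorer only sees what the cheap one lets through.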
Though the main benefit *and drawback* is that it's not just a vector database, but a full-fledged search system platform.
Generally, if you have multiple embeddings for the same document, you have two choices:
- create one document for each embedding and make sure the non-embedding-specific attributes are the same across all of these document clones; Vespa makes this more convenient by having child documents
- have a field with multiple vectors, i.e. there are multiple vectors in the HNSW index that point to the same document; Vespa supports this, too. It's what I meant.
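A small brute-force Python sketch of the difference between the two choices at query time. This is only an illustration, not how Vespa or HNSW work internally; the function names and the data layout (`doc_vectors` as a dict of doc id to vectors, `clones` as `(parent_id, vector)` pairs) are hypothetical. With a multi-vector field each document is scored once by its best-matching vector. With cloned documents the raw nearest-neighbor hits can contain the same parent several times, so you have to collapse by parent (and in practice over-fetch) to get `k` distinct documents.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def search_multi_vector(query_vec, doc_vectors, k):
    """Choice 2: one document holds many vectors; score it by its closest vector."""
    best = {doc_id: max(cosine(query_vec, v) for v in vectors)
            for doc_id, vectors in doc_vectors.items()}
    return sorted(best, key=best.get, reverse=True)[:k]


def search_cloned_docs(query_vec, clones, k):
    """Choice 1: one clone per embedding; collapse hits that share a parent id."""
    ranked = sorted(clones, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    seen, result = set(), []
    for parent, _ in ranked:
        if parent not in seen:
            seen.add(parent)
            result.append(parent)
            if len(result) == k:
                break
    return result


if __name__ == "__main__":
    doc_vectors = {"a": [[1.0, 0.0], [0.9, 0.1]], "b": [[0.0, 1.0]], "c": [[0.7, 0.7]]}
    clones = [(doc, v) for doc, vs in doc_vectors.items() for v in vs]
    print(search_multi_vector([1.0, 0.0], doc_vectors, 2))
    print(search_cloned_docs([1.0, 0.0], clones, 2))
```

Both return the same documents here. A naive top-2 over the clones without collapsing would, however, return document `a` twice.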
Vespa is currently the only vector-search-enabled search system that supports both in a convenient way, but then there are so many "vector databases" popping up every month that I might have missed some.