by jjackson5324 875 days ago
It's a series A. Thinking about the moat in a hot market like vector DBs is a great way to miss out on unicorns.

the OP argued that it is not a hot market: companies like OpenAI are eventually going to use their own, while small players are just going to use OpenAI's Assistants API, so they don't have to operate their own "vector database".

it is also worth mentioning that even if there is going to be a market called "vector databases", which is highly unlikely, you can't just write off all existing regular databases and pretend that they are not going to walk in and take over.

all in all, there is no reason to believe it is a hot market. it is much better to ask whether there is going to be a market at all.

For a comparison of existing assistants API vs. vector search, you can check out my blog at https://nostrebored.com

At a high level there are a few differences:

- Control over embeddings. What gets embedded? What are the output vectors? What models do you use? How do you handle multimodal input?

- Performance. When you make a call to Assistants, you have to wait for the Assistant to understand that it needs to do RAG. This performance hit is actually quite large (look at the two videos on the blog for reference)

- Cost. OpenAI has an incentive to load the context window to consume more tokens. A few dozen calls to Assistants were costing me around $10.
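
To make the "control over embeddings" point a bit more concrete, here is a minimal sketch of a retrieval step you run yourself. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in rather than a real embedding model, and `search` is a brute-force scan rather than what any particular vector database does. The point is only that when you own this step, you decide what gets embedded and how it is scored.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, k=2):
    """Brute-force search: score every document, return the top k (score, doc) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "vector databases store embeddings",
    "carpet cleaning is a low margin business",
    "postgres can store embeddings with an extension",
]

for score, doc in search("store embeddings", docs):
    print(round(score, 3), doc)
```

Swapping `embed` for a real model, or the linear scan for an approximate index, changes quality and latency, which is exactly the knob a hosted assistant does not expose.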

> For a comparison of existing assistants API vs. vector search

Sorry, but I am not going to read it, as it is not an apples-to-apples comparison atm. OpenAI released its Assistants API literally just weeks ago, while so-called vector databases have been burning money for ages. You can write a thesis on how those vendors are doing slightly better for now, but that won't be the big picture showing the reality on the ground. All those minor issues & unreasonable restrictions can be solved & removed; I don't see any real challenge for OpenAI in doing that. Give OpenAI a few months and they will convince most vector database vendors & gamblers to pack up and leave the field.

for now.

let me repeat what I have already explained: compared to today's leading AI tech, a "vector database" is just ancient tech. major players are either going to build their own in-house solutions or conclude that it is some kind of labor-intensive, low-profit-margin baggage and outsource it.

you can build a business around it, just as all major tech companies have cleaning guys working for them one way or another, but people have to realize that this doesn't make carpet cleaning a high-tech or strategically important business.

Performant vector search is an active area of research. If it were such a commodity, there wouldn’t be such a wide variance in performance among existing solutions.

There are a ton of open questions. If you think about Elasticsearch as a similar domain, you have complexity at the ingest, storage, and horizontal scalability layers. If you think companies are going to invest in their own distributed system that handles these components, I think you’d be as wrong as saying that people will invest in their own managed Lucene implementations.