by zswaff 844 days ago
Great question. We're taking an unorthodox approach here.

I don’t know of many tools that offer unlimited free calls to OpenAI, which means that most cool LLM-enabled features are price-gated, premium, or otherwise limited. It's a bummer to restrict that value. Our bet is that LLM pricing will follow a Moore’s Law-style pattern, at least for a while, which will let us offer better and cheaper LLM-enabled features over time. So in short, we're subsidizing some of the costs now on a longer-term bet.

That said, we can be smart about how we do things technically. We embed, compress, and omit stuff as much as possible to minimize tokens.

Also, we actually just completely fail to handle some things (something like reprioritizing a backlog of 10k tasks just wouldn't work for us right now) so we do hard cap some actions.


Regarding embeddings: I am assuming you are using ada-002, or have you moved on to 3-small already? Do you have a particular strategy for migrating embedding models other than re-embedding the whole dataset? And lastly, what is/are your vector store(s) of choice? I am not quite sure of your scale, but I have found that north of 50 million vectors a lot of the current options get a bit weak in the knees, especially if you index and query concurrently at high rates.
- Still on ada-002, planning on migrating later this week actually

- Current plan is to re-embed everything but I'm very open to better ideas there haha. Is there a better way?

- I've heard some similar stuff but we haven't run into it yet. What are you working with?

One strategy is to store the model version used for each vector, and then use the matching model to embed your queries. Of course that means that for a certain time frame you may need to blend two result sets, one per model version, so that may not be ideal for your use case. You could also choose to re-embed only the last 90 days or so, or in your case go project by project. All of this raises your search complexity, because you need to track your model versions.
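
To make the version-tracking idea concrete, here is a rough sketch, not anything we actually run. Everything in it is made up for illustration: the `VersionedIndex` class, the toy embedders standing in for real models, and the in-memory storage. Since similarity scores from two different models are not directly comparable, the sketch blends the two result sets by rank using reciprocal rank fusion, which is my own choice of blending method and just one option.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VersionedIndex:
    """Hypothetical in-memory index that records which model embedded each vector."""

    def __init__(self, embedders: Dict[str, Embedder]):
        self.embedders = embedders
        # model version -> list of (doc_id, vector) embedded with that model
        self.by_model: Dict[str, List[Tuple[str, Vector]]] = defaultdict(list)

    def add(self, doc_id: str, text: str, model: str) -> None:
        self.by_model[model].append((doc_id, self.embedders[model](text)))

    def search(self, query: str, k: int = 5, rrf_k: int = 60) -> List[str]:
        fused: Dict[str, float] = defaultdict(float)
        for model, rows in self.by_model.items():
            # Embed the query with the same model that produced this partition.
            q = self.embedders[model](query)
            ranked = sorted(rows, key=lambda r: cosine(q, r[1]), reverse=True)[:k]
            # Blend by rank (reciprocal rank fusion), not by raw score.
            for rank, (doc_id, _) in enumerate(ranked, start=1):
                fused[doc_id] += 1.0 / (rrf_k + rank)
        return sorted(fused, key=fused.get, reverse=True)[:k]


# Toy stand-ins for an old and a new embedding model (assumptions, not real APIs).
def toy_embed_v1(text: str) -> Vector:
    return [float(text.count(c)) for c in "abc"]


def toy_embed_v2(text: str) -> Vector:
    return [float(text.count(c)) for c in "abcd"]


index = VersionedIndex({"v1": toy_embed_v1, "v2": toy_embed_v2})
index.add("old-1", "aaa", model="v1")  # embedded before the migration
index.add("old-2", "bbb", model="v1")
index.add("new-1", "aab", model="v2")  # embedded after the migration
index.add("new-2", "ddd", model="v2")
top = index.search("aa", k=2)
```

In a real vector store the per-model partition would more likely be a metadata filter or a separate collection, and the old partition goes away once the backfill finishes.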

On the vector store side, we have worked with Weaviate, Qdrant, Pinecone, Milvus, and Elasticsearch. All of them had their pros and cons, and none of them were as stable as we would have liked once you really went to scale. Cloud deployments were rather pricey as well at that volume. We ended up with a mix of Qdrant, Weaviate, and Elastic for different workloads.