
by alexcg1 1429 days ago

In terms of matching embeddings and performing similarity search on text/images - folks are already using the framework (Jina) for that and getting decent results.
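For anyone unsure what "matching embeddings" boils down to, here's a rough sketch in plain Python. This is not Jina's API, just the underlying idea: rank stored vectors by cosine similarity to a query vector. The `index` contents and the `cosine`/`top_k` names are made up for illustration; in practice the vectors would come from an encoder model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values, real ones have hundreds of dims).
index = {
    "cat.png": [0.9, 0.1, 0.0],
    "dog.png": [0.8, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}

matches = top_k([1.0, 0.0, 0.0], index, k=2)
```

A brute-force scan like this is fine for small collections; a vector database backend is what you reach for once the index gets large.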

As for processing the PDFs and extracting that data: idk. That depends on a lot of factors, e.g. do you need to OCR the PDFs, or can you just extract the text directly? Either way, it should be possible to write a module and then easily scale it up (Jina supports shards/replicas). Anyway, lemme know. I'm in talks with folks about this kind of shitshow...uh...use case now.
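To make the OCR-or-not question concrete, here's a minimal sketch of the usual pattern: try direct text extraction first and only fall back to OCR when almost nothing comes back. Everything here is an assumption for illustration: `extract_text` and `run_ocr` are hypothetical callables you'd back with whatever PDF and OCR libraries you pick, and the `min_chars` threshold is arbitrary.

```python
def extract_pdf_text(pdf_bytes, extract_text, run_ocr, min_chars=20):
    """Return (text, method), where method is "direct" or "ocr".

    extract_text and run_ocr are hypothetical callables that take the raw
    PDF bytes and return a string (or None); they are not real library calls.
    """
    text = extract_text(pdf_bytes) or ""
    # A scanned PDF usually yields little or no embedded text.
    if len(text.strip()) >= min_chars:
        return text, "direct"
    return run_ocr(pdf_bytes) or "", "ocr"


# Usage with stand-in extractors:
text, method = extract_pdf_text(
    b"%PDF-fake",
    extract_text=lambda data: "",           # pretend the PDF has no text layer
    run_ocr=lambda data: "Scanned page 1",  # pretend OCR recovered this
)
```

Since OCR tends to be the slow part, that's the step you'd most likely want to put behind extra replicas.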

Jina supports multiple vector database backends, like Weaviate, Qdrant and others. For ones not on that list (like Milvus), I'd suggest asking on the Slack [0] - responses tend to be fast.

[0] https://slack.jina.ai