| HN Mirror



	by jnstrdm05 134 days ago
	This looks sick! Did you build this for yourself?


kingcauchy 134 days ago

I built this for myself because I hated running a large Elasticsearch instance at work and wanted something that would autoscale and allow reindexing data. I also had a lot of experience running a large custom graph database built on Bigtable and Elasticsearch, which I thought could be unified into a single database to cut costs. I started adding an embedding index for fun, based on some Google papers, and now here we are!


perfmode 134 days ago

What Google papers?


kingcauchy 134 days ago

Not strictly Google, but Microsoft/Bing too. Here are the top ones from my notes:

https://arxiv.org/abs/2410.14452 (SPFresh)
https://arxiv.org/abs/2111.08566 (SPANN)
https://arxiv.org/abs/2405.12497 (RaBitQ)
https://arxiv.org/abs/2509.06046 (DiskANN)

I also used a variety of blog posts and reference implementations!

If you're curious: we use a Rabit[Q]uantized Hierarchical Balanced Clustering algorithm for the vector index, and a chunked segment index for the sparse index. Happy to discuss more!
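
For readers unfamiliar with bit-level vector quantization, here is a minimal Go sketch of the general idea: keep one sign bit per dimension and compare vectors by Hamming distance. This is not the index described above and it is not RaBitQ itself, which (per the paper) also applies a random rotation and uses an unbiased distance estimator. The function names `signQuantize` and `hammingDistance` are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// signQuantize packs the sign of each component into a bitset,
// one bit per dimension (1 if the component is positive, else 0).
// Assumption: this is plain sign quantization, a simplified stand-in
// for the RaBitQ codes mentioned in the thread.
func signQuantize(v []float32) []uint64 {
	code := make([]uint64, (len(v)+63)/64)
	for i, x := range v {
		if x > 0 {
			code[i/64] |= 1 << (uint(i) % 64)
		}
	}
	return code
}

// hammingDistance counts the bits that differ between two codes
// of equal length. Fewer differing bits suggests a smaller angle
// between the original vectors.
func hammingDistance(a, b []uint64) int {
	d := 0
	for i := range a {
		d += bits.OnesCount64(a[i] ^ b[i])
	}
	return d
}

func main() {
	query := []float32{0.9, -0.1, 0.4, -0.7}
	near := []float32{0.8, -0.2, 0.5, -0.6}
	far := []float32{-0.9, 0.3, -0.4, 0.7}

	q := signQuantize(query)
	fmt.Println(hammingDistance(q, signQuantize(near))) // prints 0
	fmt.Println(hammingDistance(q, signQuantize(far)))  // prints 4
}
```

The XOR-and-popcount loop is the kind of inner loop that maps well onto SIMD instructions, which is one reason compact binary codes are attractive for candidate scoring.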


perfmode 134 days ago

Curious if you’re using any SIMD optimizations for numerical calculations.


kingcauchy 134 days ago

Yes, we use SIMD heavily! https://github.com/ajroetker/go-highway I also added SME support on Darwin for most algorithms. We use it in the full-text index, all over the vector indexes, and especially heavily for the ML inference we do in Go.
