Hi HN,

I’ve been working in development, systems, and ops (SRE), and kept running into the same problem: adding good search to an application turns into building and operating a whole distributed system. You end up stitching together ingestion pipelines, embedding services, databases, and custom ranking logic, and then maintaining all of it.

I built Amgix to handle most of the challenging parts.

For developers:

* one API for ingestion, embedding, hybrid retrieval, and ranking
* async ingestion, deduplication, retries, and embedding pipelines are built in

For ops:

* runs as a single container, but scales into independently deployable components
* automatic model loading and rebalancing
* supports PostgreSQL, MariaDB, or Qdrant behind the same API

One area I focused on specifically is messy, identifier-heavy data (SKUs, part numbers, etc.). Amgix includes a custom tokenizer (WMTR) that handles those cases better than typical tokenizers, while still working well for normal text. There's a longer writeup on why standard approaches fall short for this kind of data in the docs: https://docs.amgix.io/why/

End-to-end, it handles ingestion, embedding, and fused ranking while still delivering typeahead-level latency on multi-million document datasets (benchmarks in the docs).

Would really appreciate any feedback.
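
For anyone unfamiliar with the "fused ranking" part of hybrid retrieval: the idea is to merge a keyword result list and a vector result list into one ordering. This post does not say which fusion method Amgix uses, so the sketch below is only an illustration using reciprocal rank fusion (RRF), a common general technique. The function name, the document ids, and the constant `k=60` are all assumptions made for the example, not Amgix's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Illustrative only: RRF is a generic technique, not necessarily
    what Amgix does internally. k=60 is a conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["sku-ab-1029", "sku-ab-1030", "doc-7"]
vector_hits = ["doc-7", "sku-ab-1029", "doc-12"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists end up first, which is why an exact identifier match from the keyword side can still outrank a loosely related semantic match.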