by porker 1186 days ago
> - Text search is a library function

Which text search library will give me the same features as Elasticsearch: index-time analysis, stemming, and synonyms; search-time expansions, prefix matching, and filtering; and (as a separate feature) type-ahead autocomplete?

I would love to never touch another Elasticsearch cluster so this is a genuine question.


What about any of this prohibits it from being a library?

https://lucene.apache.org/core/

This is the Java library that ES is based on. Without even having to look at it I can make the following judgement:

It should be easy to port to any language.

It's open source, and it's Java. Java has no special features that make it impossible or particularly difficult to replicate this functionality in any other compiled language, like C, Rust, Go, or any other language that is not 100x wasteful of system resources.
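
As a rough illustration of the "library function" point, here is a minimal sketch in Python (chosen only for brevity) of index-time analysis, naive stemming, synonym mapping, and prefix matching done in-process. Everything in it (`TinyIndex`, `analyze`, `stem`, the synonym table, the suffix list) is hypothetical and made up for illustration; it is not Lucene's API and comes nowhere near its feature set.

```python
from collections import defaultdict

# Hypothetical synonym table and suffix list, assumed for illustration only.
SYNONYMS = {"automobile": "car", "autos": "car"}
SUFFIXES = ("ing", "es", "s")


def stem(token):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split on non-alphanumerics, map synonyms, then stem."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return [stem(SYNONYMS.get(tok, tok)) for tok in cleaned.split()]


class TinyIndex:
    """In-memory inverted index: analyzed term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """AND query: ids of documents containing every analyzed query term."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def prefix(self, prefix):
        """Indexed terms starting with the prefix (linear scan, sketch only)."""
        p = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(p))


index = TinyIndex()
index.add(1, "Searching for cars")
index.add(2, "An automobile index")
index.add(3, "Cluster upgrades")

print(sorted(index.search("automobile")))  # [1, 2] via the synonym mapping
print(sorted(index.search("searches")))    # [1] via stemming
print(index.prefix("c"))                   # ['car', 'cluster']
```

Running the same `analyze` step at index time and at query time is what lets "searches" match "Searching". Filtering, relevance scoring, persistence, and proper autocomplete are all left out of this sketch.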

> This is the Java library that ES is based on.

Based on, but Elasticsearch is not just a server wrapped around the library. Features ES has are not in Lucene, otherwise anyone could release a competitor by wrapping the library.

> It should be easy to port to any language.

You win the "Most Hacker News comment of March 2023" award. This thread is talking about less effort, and you bring up porting Lucene to another programming language.

I thought it was already ported to other languages, e.g. https://clucene.sourceforge.net/

Not sure about feature parity though.

> Based on, but Elasticsearch is not just a server wrapped around the library. Features ES has are not in Lucene, otherwise anyone could release a competitor by wrapping the library.

Those competitors exist.

Go is not less wasteful than Java. Both are garbage collected, and their memory pressure depends heavily on the given workload and on the runtime of the program. But Java allows more GC tuning and even different GCs for different use cases (i.e. Shenandoah and ZGC favor very low latency workloads, while the default G1GC favors throughput; it's not that simple, but you get the point).

Regardless, Java/Go tier of performance is good enough for this kind of thing.

I was referring to Ruby/Python when I said 100x wasteful languages.

The problem is that it doesn't support HA. You're stuck on that single-server model, so upgrades always mean downtime, which is painful. You're also missing things like self-healing, and your Lucene index can get corrupted.

Real-world experience says it's better to move away from it, e.g. lots of self-hosted Atlassian instances over the years. Lucene was a major pain point.

Manticoresearch provides most of the listed features.

Thanks for the reminder. Manticoresearch is an alternative I haven't tried yet. I tried the hip alternatives (Meilisearch, Typesense) in autumn 2022 and both were severely lacking for CRM workloads compared with ES.