by whalesalad 1777 days ago
PostgreSQL should be everyone’s first choice for a data store. It can do so much, including serving as your full text search system.

Elastic's main selling point is not so much the full text search.

The search is what it does, but most of its value lies in the management, scaling, and monitoring of full text search across many machines.

I love Postgres, but its "clustering" story is definitely not as user-friendly.

For logs (where you can shard on a modulo of the timestamp), you might have luck with CitusDB (PostgreSQL sharding).
Does it really scale, though? For example, does adding nodes improve indexing latency or CPU metrics?
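To make the "modulo of the timestamp" idea concrete, here is a minimal Python sketch. It is a hypothetical illustration, not Citus's API: Citus distributes rows by hashing a distribution column you choose, whereas this just buckets each log row's timestamp (one bucket per UTC day, an assumed size) and takes the bucket number modulo an assumed shard count.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400  # assumed bucket size: one bucket per UTC day


def shard_for(ts: datetime, num_shards: int, bucket_seconds: int = SECONDS_PER_DAY) -> int:
    """Hypothetical router: bucket the timestamp, then take the bucket modulo the shard count."""
    if ts.tzinfo is None:
        # Naive datetimes would be interpreted in local time, so require an explicit zone.
        raise ValueError("timestamp must be timezone-aware")
    bucket = int(ts.timestamp()) // bucket_seconds
    return bucket % num_shards


# Rows from the same day always land on the same shard.
example = datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc)
print(shard_for(example, num_shards=8))
```

Keeping a day's logs on one shard means a time-range query touches only a few shards, but it also means all of today's writes land on a single shard, which is the usual hot-spot tradeoff of time-based sharding.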
Yes. You can scale ES to fairly massive data volumes. Postgres is a very different system with different design constraints.

There are plenty of petabyte-scale ES clusters in the wild.

Have a look at YugabyteDB then.
Why would I do that when Elasticsearch is a proven search engine?
I don’t know; maybe you’re confusing yourself. Otherwise, why would you say this:

> I love Postgres, but its "clustering" story is definitely not as user-friendly.

Then have a look at YugabyteDB.

I love Postgres, and the full-text search feature works great in some use cases, but it is not really comparable to Elasticsearch in many scenarios (huge document stores, complex text processing or querying, etc.).
For sure, but I'd posit that most startups and smaller, earlier-stage companies can get by with it. It really comes down to indexing data properly and designing for your search patterns. If your search patterns are vast or change constantly, ES might be better, but if you just need basic text search over X attributes, Postgres will be sufficient.
Do you know for sure that Postgres doesn't perform as well as Elasticsearch if you don't use Postgres's relational capabilities?

Instinctively I believe what you're saying, just wondering if you know for sure.

Yes, I know for sure. Postgres search is essentially an easier-to-use regex engine. If you have a recall-only use case and/or a small dataset, then that works great. As soon as you need multiple languages, advanced autocomplete, misspelling detection, large documents, large datasets, custom scoring, etc., you need Solr or ES.
I don't doubt that you know your use case and have weighed or tried the options, but:

> Postgres search is essentially an easier to use regex engine.

I'm not sure exactly what you meant to convey here, but if you're searching with LIKE or `~`, you're not using Postgres's proper full text search. You should be working with tsvectors[0].
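As a rough mental model of what tsvectors buy you over LIKE or `~`, here is a toy Python sketch: text is split into tokens, stop words are dropped, each remaining word is normalized to a lexeme, and word positions are kept. The tiny stop-word list and the suffix-stripping "stemmer" are assumptions for illustration; Postgres uses its own parser plus dictionaries (such as Snowball stemmers) chosen by the text search configuration.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list; real configurations ship much larger ones.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def normalize(word: str) -> str:
    """Crude stand-in for a stemmer: lowercase and strip a few English suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(text: str) -> dict[str, list[int]]:
    """Map each lexeme to the 1-based positions where it occurs, skipping stop words."""
    vector = defaultdict(list)
    for pos, token in enumerate(WORD_RE.findall(text), start=1):
        if token.lower() in STOP_WORDS:
            continue  # stop words are dropped but still consume a position
        vector[normalize(token)].append(pos)
    return dict(vector)


def matches_all(vector: dict[str, list[int]], query: str) -> bool:
    """AND-match: every normalized, non-stop query term must appear as a lexeme."""
    terms = [normalize(t) for t in WORD_RE.findall(query) if t.lower() not in STOP_WORDS]
    return all(term in vector for term in terms)
```

In Postgres the equivalent would be along the lines of `to_tsvector('english', body) @@ plainto_tsquery('english', 'cat search')`, ideally backed by a GIN index, so matching happens against normalized lexemes rather than raw character patterns.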

> As soon as you need multiple languages

Postgres FTS supports multiple languages, and you can create your own configurations[1].

> advanced autocomplete

I'm not sure what "advanced" autocomplete is, but you can get pretty fast trigram searches going[2] (back to LIKE/ILIKE here, but obviously this is an isolated use case). In the end, I'd expect autocomplete results to not actually hit your DB most of the time (maybe I'm naive, but that feels like a caching > cache invalidation > cache pushdown problem to me).
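For a feel of how trigram matching works, here is a Python sketch that approximates pg_trgm's documented behavior: non-alphanumeric characters separate words, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. Lowercasing and the ASCII-only word pattern are simplifying assumptions; this is not the extension's actual code.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm extraction: pad each word ("  " + word + " ")
    and collect every 3-character window. ASCII-only for simplicity."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams over the union of trigrams (a Jaccard index between 0 and 1)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("cat", "cats"))
```

Because a prefix like "cat" already shares its leading trigrams with "cats", partial input scores well against full terms, which is what makes this usable for autocomplete; pg_trgm's `%` operator compares similarity against a threshold that defaults to 0.3.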

> misspelling detection

The pg_similarity extension[3] might be of some help here, but it may require some wrangling.
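Misspelling detection usually boils down to a string-distance measure. As an illustration (not pg_similarity's code), here is Levenshtein edit distance in Python plus a hypothetical `suggest` helper that returns vocabulary terms within a small number of edits. Postgres also ships a `levenshtein()` function in the fuzzystrmatch contrib module.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Hypothetical helper: vocabulary terms within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_distance]


print(suggest("postgre", ["postgres", "elastic", "postgis"]))
```

Comparing against every vocabulary term is linear in vocabulary size, which is fine for a small dictionary; Lucene's fuzzy queries avoid that brute-force scan by using Levenshtein automata.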

> large documents, large datasets,

PG has TOAST[4], and it can obviously scale (if not necessarily gracefully); see pg_partman, Timescale, Citus, etc.

> custom scoring

Postgres only has basic ranking features[5], but you can of course write your own functions to extend it.
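As one example of a scoring function you might write yourself, here is a Python sketch of Okapi BM25, the default similarity in recent Lucene and Elasticsearch versions, using the common defaults k1 = 1.2 and b = 0.75. It assumes documents are already tokenized; to do this inside Postgres you would have to maintain the corpus statistics (document frequencies, average length) yourself, since the built-in `ts_rank` does not use them.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturate term frequency and normalize by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["postgres", "search", "postgres"], ["elastic", "search"], ["logs"]]
print(bm25_scores(["search"], docs))
```

Here the shorter second document outscores the first for "search" because of length normalization, which is the kind of knob a custom scorer lets you tune.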

Solr/ES are definitely the right tools for the job (tm) when the job is search, but you can get surprisingly far with Postgres. I'd argue that many use cases don't actually want or need a perfect full text search solution; it's often minor features that snowball into overkill, with ops people learning how to properly manage and scale an ES cluster and falling into pitfalls along the way.

[0]: https://www.postgresql.org/docs/current/textsearch-intro.htm...

[1]: https://www.postgresql.org/docs/current/textsearch-intro.htm...

[2]: https://about.gitlab.com/blog/2016/03/18/fast-search-using-p...

[3]: https://github.com/eulerto/pg_similarity

[4]: https://www.postgresql.org/docs/current/storage-toast.html

[5]: https://www.postgresql.org/docs/9.5/textsearch-controls.html...

Scoring results in Postgres requires scanning all matches, which is slow if you have a lot of results.

Elasticsearch and other search engines don’t have this problem.
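The point about scanning all matches can be shown with a small Python sketch: to know which k documents score highest, a straightforward ranker must compute a score for every matching document, even though it keeps only k. This mirrors `ORDER BY ts_rank(...) LIMIT k`, where the rank is computed per matching row. The `top_k` helper and its counter are hypothetical, for illustration only.

```python
import heapq
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def top_k(matches: Iterable[T], score: Callable[[T], float], k: int = 10) -> tuple[list[T], int]:
    """Return the k best matches plus how many documents had to be scored to find them."""
    scored = 0

    def counted_score(doc: T) -> float:
        nonlocal scored
        scored += 1
        return score(doc)

    # nlargest keeps only k candidates in memory, but still scores every match.
    best = heapq.nlargest(k, matches, key=counted_score)
    return best, scored


best, n_scored = top_k(range(1000), score=lambda d: d, k=2)
print(best, n_scored)
```

`heapq.nlargest` keeps memory at O(k), but the scoring work still grows with the number of matches. Recent Lucene versions reduce this with top-k optimizations such as block-max WAND, which can skip documents that cannot make it into the top k.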

Searching a structured database just isn't the same as having a full on indexed search engine. Those are different tools for different usage.
Even though we are currently replacing ES (hosted on elastic.co) with Postgres for a ~100M-document, low-QPS use case, it's no real competition for Elasticsearch. There are better™ alternatives for niches (like Algolia), but nothing just works like Elasticsearch at scale, once not everything fits on a single machine.
a) Elasticsearch should not be used as a primary data store.

b) PostgreSQL does not compare to Elasticsearch when it comes to full text searching capabilities.

c) PostgreSQL has no vendor-supported, built-in solution for horizontal scalability, which is a big reason why you would choose Elasticsearch over a more lightweight search system.

Its story for tokenizing Asian languages isn't good, and even the way it tokenizes Roman-alphabet languages isn't that great.

However, it does get you to roughly the 80% mark for text search, and that other 20% is why Elasticsearch, Algolia, etc. exist.
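To see why tokenizing Asian languages is hard, compare whitespace splitting with character bigrams in this Python sketch. Chinese and Japanese text has no spaces between words, so whitespace-style tokenization tends to treat a whole sentence as one token. A common workaround, used by Lucene's `cjk_bigram` filter, is to index overlapping two-character tokens. The Unicode ranges below (Hiragana, Katakana, common Han, Hangul syllables) are a rough assumption, not a complete definition of CJK text.

```python
import re

# Rough, assumed CJK ranges: Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables.
CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
TOKEN_RE = re.compile("([" + CJK + "]+)|([^\\s" + CJK + "]+)")


def whitespace_tokens(text: str) -> list[str]:
    """Naive tokenizer: split on whitespace only."""
    return text.split()


def cjk_bigrams(text: str) -> list[str]:
    """Emit overlapping two-character tokens for CJK runs; lowercase other words as-is."""
    tokens = []
    for cjk_run, other in TOKEN_RE.findall(text):
        if cjk_run:
            if len(cjk_run) == 1:
                tokens.append(cjk_run)  # a lone character becomes a unigram
            else:
                tokens.extend(cjk_run[i : i + 2] for i in range(len(cjk_run) - 1))
        else:
            tokens.append(other.lower())
    return tokens


print(whitespace_tokens("東京都に住む"))
print(cjk_bigrams("東京都に住む"))
```

Bigrams make a query for 東京 (Tokyo) findable, but the same sentence also yields 京都 (Kyoto), a spurious match; that tradeoff is why dictionary-based morphological analyzers such as Kuromoji exist for Japanese.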

I don't think so; Elastic's https://www.elastic.co/guide/en/elasticsearch/reference/curr... is well ahead of that.

Example here: https://www.judyrecords.com/info

How much of Elastic's usage is ELK logging vs. application search?
Are you aware that Lucene, the technology that powers Elasticsearch, runs on top of SQL?
No it doesn't.