Hacker News
by oersted 618 days ago
Check out Quickwit; it is briefly mentioned but, I think, mistakenly dismissed. They have been working on a similar concept for a few years and the results are excellent. It’s in no way mainly for logs as they claim; it is a general-purpose, cloud-native search engine like the one they suggest, and very well engineered.

It is based on Tantivy, a Lucene alternative in Rust. I have extensive hands-on experience with both and I highly recommend Tantivy, it’s just superior in every way now. It is such a pleasure to use, and an ideal example of what Rust was designed for.

8 comments

> It’s in no way mainly for logs as they claim

Where can I find more information on using it for user-facing search? The repository [0] starts with "Cloud-native search engine for observability (logs, traces, and soon metrics!)" and keeps talking about those.

[0]: https://github.com/quickwit-oss/quickwit

That just seems to be the market where search engines have the most obvious business case; Elasticsearch positioned itself in the same way. But both are general-purpose full-text search engines perfectly capable of any serious search use case.

Their original breakout demo was on Common Crawl: https://common-crawl.quickwit.io/

But thanks for pointing it out, I hadn't looked at it in a few months, it looks like they significantly changed their pitch in the last year. I assume they got VC money and they need to deliver now.

But the demo does not work.

I tried "England is" and a few similar queries. It spends three seconds then shows that nothing is found.

I tried it once and it instantly showed no results, but then I tried it again and it returned results in <1s. Just try it with a bunch of queries, I think there's caching too so it's hard to gauge performance properly.

The blog post about the demo is from 2021 and they haven't promoted it much since. I'm surprised that they even kept it online, according to the sidebar it was ~$810/month in AWS at the time.

Yes. We should shut down this demo. We reduced the hardware to cut down our costs. Right now it runs on a ludicrously small amount of hardware.

Has anyone tried openobserve (https://github.com/openobserve/openobserve)? How does it compare to Quickwit as an "Elasticsearch for logs" replacement?

I have been using Tantivy for Garfield comic search for a few years now, and it has been really nice to use in all that time.

I'm intrigued and, at the same time, thinking this is a funny joke. If this isn't a joke, I would love an example.

Luckily it is not a joke!

It's something I have had running in some capacity for some years now, through a couple of rewrites. At some point Discord added "auto-complete" for commands, which meant that I could do a live lookup and give users a list of comics that contain some piece of text.

My index is a bit out of date, but comics before September last year can be searched up.

The search index lives fully in memory, as it is not that big: it holds only 17363 comics. This does mean that it is rebuilt on every startup, but that does not take long compared to the month-long uptime it usually has.

Example of a search for "funny joke": https://imgur.com/a/J4sRhPJ

Hosted bot: https://discord.com/application-directory/404364579645292564

Source code: https://git.sr.ht/~erk/lasagna
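
For readers curious what "an index that lives fully in memory and is rebuilt on every startup" amounts to, here is a minimal sketch in plain Python. It is not the bot's code (that is at the source link above) and it does not use Tantivy; the comic ids, transcripts, and function names are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(comics):
    """Build an inverted index mapping each token to the set of comic ids containing it."""
    index = defaultdict(set)
    for comic_id, transcript in comics.items():
        for token in tokenize(transcript):
            index[token].add(comic_id)
    return index


def search(index, query):
    """Return the ids of comics that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data standing in for the real transcripts.
COMICS = {
    "comic-1": "This is my cat.",
    "comic-2": "That was a funny joke.",
    "comic-3": "Mondays are not funny.",
}

# Rebuilt from scratch at startup; nothing is persisted to disk.
INDEX = build_index(COMICS)
```

With a few thousand short transcripts, a rebuild like this is a single pass over the data, which fits the comment's point that startup cost is negligible next to a month of uptime. A real engine adds ranking, stemming, and compact posting lists on top of the same idea.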

Meilisearch is great when it works, but when it breaks it's a total nightmare. I've hit multiple bugs that destroyed my search index, I've hit multiple undocumented limits, ... that all required rebuilding my index from scratch and doing a lot of work to find what was actually going on to report it. It doesn't help that some of the errors it gives are incredibly non-specific and make it quite difficult to find what's actually breaking it.

All of that said, I still use it because it has sucked less than the other search engines to run.

Can you give any sense as to the conditions under which it just breaks like that? This sounds quite concerning!

Does Meili support object store backends?

No, and no clustering / HA yet I believe. It is awesome though with the right use case. We've had it in prod for a few years now.

The big issue with Tantivy I've found is that it only deals with immutable data. So it can't be used for anything you want to do CRUD on. This rules out a LOT of use cases. It's a real shame imo.

I’m pretty sure that Lucene is exactly the same: the segments it creates are immutable, and Elastic is what handles a “mutable” view of the data. Which makes sense because Tantivy is like Lucene, not ES.

https://lucene.apache.org/core/7_7_0/core/org/apache/lucene/...

It is indeed mostly designed for bulk indexing and static search. But it is not a strict limitation, frequent small inserts and updates are performant too. Deleting can be a bit awkward, you can only delete every document with a given term in a field, but if you use it on a unique id field it's just like a normal delete.

Tantivy, like Lucene, is a low-level library for building your own search engine (such as Quickwit); it's not a search engine in itself. It's kind of like how DBs are built on top of key-value stores. But you can definitely build a CRUD abstraction on top of it.
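
To make the "CRUD on top of immutable segments" idea concrete, here is a toy model in plain Python. It is only a sketch of the concept described above (append-only segments, delete-by-term, update as delete plus add); the class and method names are hypothetical and are not Tantivy's or Lucene's API.

```python
class ToyIndex:
    """Append-only segments plus tombstones. A toy model, not a real search library."""

    def __init__(self):
        self.segments = []    # committed segments; never modified after commit
        self.deleted = set()  # (segment_no, doc_no) pairs marked as deleted
        self.pending = []     # documents added since the last commit

    def add(self, doc):
        self.pending.append(dict(doc))

    def delete_term(self, field, value):
        """Mark every document whose `field` equals `value` as deleted."""
        for seg_no, segment in enumerate(self.segments):
            for doc_no, doc in enumerate(segment):
                if doc.get(field) == value:
                    self.deleted.add((seg_no, doc_no))
        # Uncommitted documents that match are simply dropped.
        self.pending = [d for d in self.pending if d.get(field) != value]

    def commit(self):
        """Freeze pending documents into a new segment."""
        if self.pending:
            self.segments.append(tuple(self.pending))
            self.pending = []

    def update(self, id_field, doc):
        """Update = delete by unique id, then add the new version."""
        self.delete_term(id_field, doc[id_field])
        self.add(doc)
        self.commit()

    def live_docs(self):
        """Documents that are committed and not tombstoned."""
        return [
            doc
            for seg_no, segment in enumerate(self.segments)
            for doc_no, doc in enumerate(segment)
            if (seg_no, doc_no) not in self.deleted
        ]


idx = ToyIndex()
idx.add({"id": "1", "title": "a"})
idx.add({"id": "2", "title": "b"})
idx.commit()
idx.update("id", {"id": "1", "title": "a2"})
```

After the update, the first segment still physically holds the old document; it is only hidden by a tombstone, and the new version lives in a second segment. Real libraries reclaim that space later by merging segments, which this sketch leaves out.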

Certainly one could build that on top of Quickwit (which also doesn’t allow CRUD), but it’s not trivial. You need to insert a new record with the changes, then delete the record you want to update. The docs say that the latter action is expensive and might take hours(!). Then one would need a separate process to ensure this all went through correctly (say the server crashes after the insert of the updated record but before the delete succeeds). Meanwhile you’ve got almost identical records that can both be searched. Just not very nice for anything involving CRUD.

Please do advise if I’ve missed something here. I was really excited about using Quickwit for a project but have gone with Meilisearch precisely for these reasons. Otherwise it would be Quickwit all the way.
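
One way to picture the "separate process" mentioned above is a reconcile pass over documents that carry a version number. This is only a sketch under assumptions: the `version` field and both helper names are hypothetical, and nothing here is Quickwit's API or a documented Quickwit recommendation.

```python
def latest_only(hits):
    """Collapse hits so that only the highest version of each id survives."""
    newest = {}
    for hit in hits:
        current = newest.get(hit["id"])
        if current is None or hit["version"] > current["version"]:
            newest[hit["id"]] = hit
    return list(newest.values())


def stale_versions(docs):
    """Return the (id, version) pairs that a cleanup job still has to delete."""
    keep = {(d["id"], d["version"]) for d in latest_only(docs)}
    return sorted(
        (d["id"], d["version"])
        for d in docs
        if (d["id"], d["version"]) not in keep
    )


# State after a crash between "insert new version" and "delete old version":
# both versions of document "a" are still searchable.
docs = [
    {"id": "a", "version": 1, "title": "old"},
    {"id": "a", "version": 2, "title": "new"},
    {"id": "b", "version": 1, "title": "other"},
]
```

Here `latest_only(docs)` hides the duplicate at query time, while `stale_versions(docs)` lists what a background job would still need to remove. It does not make the slow delete any cheaper; it only bounds the window in which near-identical records show up in results.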

> Tantivy, it’s just superior in every way now

It lacks tons of features ES and Solr have, most notably geo search, but what it does, it does a lot faster.

Quickwit indicates it is for immutable data and not to be used for mutable data. Is that the case in your experience?

Thanks for this info.