Hacker News
by tmikaeld 1229 days ago
My team tried to use Meilisearch for large datasets, but unfortunately it's impossible to plan its RAM usage. With very few searches it consumes very little, but with a lot of search traffic it may consume more than we could provision beforehand. This made it too unpredictable and too expensive, so we went with Manticore instead. I don't know if this has been addressed in 1.0; hopefully it has.

Do you have any numeric definitions for few and lots?
I think they might have fixed it. I noted this as a problem with earlier Meilisearch releases as well, but from the documentation it looks like they no longer require the entire index to be in memory; it can be a memory-mapped file instead.

https://docs.meilisearch.com/learn/advanced/storage.html#lmd...

>For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk, so all the data structures can fit in memory.

> [...]

>It is important to note that there is no reliable way to predict the final size of a database. This is true for just about any search engine on the market—we're just the only ones saying it out loud.

From their docs, it looks like a 10MB document takes up ~200MB. I don't think that scales linearly, though. Since it's a reverse (inverted) index, its size depends mostly on the number of unique words it finds, with each document adding a bit on top of that. You'd expect a fairly big baseline index just to cover common English words.
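A toy sketch of why an inverted index doesn't grow in a simple linear way with the raw data: the term dictionary grows with the number of unique words, while the postings lists grow with every word occurrence. Whitespace tokenization and the example documents are assumptions for illustration only; a real search engine stores considerably more per term than this.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def index_stats(index):
    """Return (number of unique terms, total number of postings)."""
    postings = sum(len(ids) for ids in index.values())
    return len(index), postings

# Hypothetical documents that share much of their vocabulary.
docs = [
    "the quick brown fox",
    "the lazy brown dog",
    "the quick dog jumps",
    "a lazy fox sleeps",
]

# Index the first n documents and report how each part of the index grows.
for n in range(1, len(docs) + 1):
    terms, postings = index_stats(build_inverted_index(docs[:n]))
    print(n, terms, postings)
```

Here the postings grow by four with every document (4, 8, 12, 16), while the vocabulary only goes from 4 to 9 terms because most words repeat.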

This definitely seems like an area where they could make some improvements, though. Transparent compression could probably help, and zstd's dictionary feature would let it be fine-tuned to the data they're actually seeing.
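To keep this sketch dependency-free, it swaps in zlib's preset-dictionary feature (the `zdict` argument) for zstd's trained dictionaries. The principle is the same: compressor and decompressor share a dictionary of byte sequences that recur across records, so even a small record can refer back to them instead of storing them literally. The dictionary below is hand-written and hypothetical; zstd would train one from sample data.

```python
import zlib
from typing import Optional

# Hypothetical shared dictionary: byte sequences expected to recur across
# records (here, JSON field names). zstd would *train* this from samples;
# here it is simply written by hand.
SHARED_DICT = b'{"title": "", "description": "", "category": "", "price": '

def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate-compress data, optionally priming the compressor with a dictionary."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress(); the same dictionary must be supplied."""
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# A hypothetical small record, similar to a typical indexed document.
record = b'{"title": "Red shoes", "description": "Running shoes", "category": "sport", "price": 49}'
plain = compress(record)
with_dict = compress(record, SHARED_DICT)
print("raw:", len(record), "no dict:", len(plain), "with dict:", len(with_dict))
```

The catch is that the dictionary has to be stored and versioned alongside the data, since anything compressed with it can't be decoded without it.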

I don't think it's about to replace Xapian in Kiwix (the offline Wikipedia reader) any time soon.

Our index was meant to handle 20,000 documents totaling 35MB of CSV. This would balloon to 0.7GB to 1GB of RAM, and we expected at least 1,000 of these indexes, which would have required dedicated servers with 1TB of RAM. This was when Meili was at version 0.27.
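A quick back-of-the-envelope check of these figures. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and the worst case where every index needs its peak RAM at the same time; `ram_plan` is a hypothetical helper, not part of any tool.

```python
def ram_plan(csv_mb, ram_mb_low, ram_mb_high, num_indexes):
    """Back-of-the-envelope RAM estimate for many identical indexes.

    Returns (low amplification, high amplification, low total GB, high total GB).
    Decimal units are assumed: 1 GB = 1000 MB, 1 TB = 1000 GB.
    """
    low_ratio = ram_mb_low / csv_mb    # MB of RAM per MB of source CSV
    high_ratio = ram_mb_high / csv_mb
    low_total_gb = ram_mb_low * num_indexes / 1000
    high_total_gb = ram_mb_high * num_indexes / 1000
    return low_ratio, high_ratio, low_total_gb, high_total_gb

# Figures from the comment: 35MB of CSV -> 0.7GB to 1GB of RAM, 1,000 indexes.
low, high, low_gb, high_gb = ram_plan(csv_mb=35, ram_mb_low=700, ram_mb_high=1000, num_indexes=1000)
print(f"amplification: {low:.0f}x to {high:.1f}x")   # amplification: 20x to 28.6x
print(f"total: {low_gb:.0f} GB to {high_gb:.0f} GB")  # total: 700 GB to 1000 GB
```

That works out to roughly 20x to 29x RAM per byte of CSV, in the same ballpark as the ~200MB-per-10MB figure quoted from the docs, and 0.7TB to 1TB in total if all 1,000 indexes have to be resident at once.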

With Manticore, we tried to reproduce these issues in benchmarks, but the only problem we hit was temporarily high I/O load when indexes needed to be re-indexed with new or changed documents. Overall, it uses 50-70% of the RAM that Meili did.

We'd be happy to revisit, but judging by the docs, it seems to be about the same as it was back then (a year ago).

You should definitely try Meilisearch again. We have heavily optimized memory consumption and indexing performance. Even with all these improvements, we think it's essential to keep focusing on this during 2023.

And indeed, Meilisearch uses memory mapping, which means that everything is on disk and it will try to use as much memory as possible. For your information, we successfully ran a 115M-document dataset on a machine with 1GB of RAM.
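This doesn't show Meilisearch's internals; it's a generic sketch of memory mapping using Python's standard `mmap` module. The data stays in a file on disk, and the OS only pages in the parts that are actually read, keeping them in its page cache while memory is available. That would be consistent with RAM usage looking small under light traffic and growing under heavy traffic. The file name and sizes are hypothetical.

```python
import mmap
import os
import tempfile

def read_slice_mmap(path, offset, length):
    """Read bytes [offset, offset + length) through a read-only memory map.

    Only the pages actually touched are loaded from disk; they live in the
    OS page cache, which the kernel can grow or evict under memory pressure.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]  # copies just this slice into a bytes object

# Demo: a hypothetical 8 MiB "index" file with a marker in the middle.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * (8 * 1024 * 1024))
        f.seek(4 * 1024 * 1024)
        f.write(b"needle")
    found = read_slice_mmap(path, 4 * 1024 * 1024, 6)
    print(found)  # b'needle'
```

Reading 6 bytes here only needs the page containing them, not the whole 8 MiB file, which is also why startup with a memory-mapped file can be near-instant on a fast disk.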

By the way, if you are using Manticore's default row-wise storage, you could try its columnar storage [1]. It can decrease RAM consumption further.

[1] https://manual.manticoresearch.com/Introduction#Storage-opti...
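A minimal illustration of the general row-wise vs columnar distinction. This is not Manticore's actual storage format, and the table and attribute names are hypothetical. In the columnar layout, a query over one attribute only needs that attribute's values, which is how columnar storage can cut the amount of data that must be resident in memory.

```python
from array import array

# Hypothetical table: (id, price, stock) for a few products.
rows = [(1, 1999, 5), (2, 499, 0), (3, 2999, 12), (4, 999, 3)]

def total_price_rows(rows):
    # Row-wise: every record is visited, even though only "price" is needed.
    return sum(price for _id, price, _stock in rows)

# Columnar: each attribute is its own densely packed array of 64-bit ints.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "price": array("q", (r[1] for r in rows)),
    "stock": array("q", (r[2] for r in rows)),
}

def total_price_columns(columns):
    # Only the "price" column is read; "id" and "stock" are never touched.
    return sum(columns["price"])

print(total_price_rows(rows), total_price_columns(columns))  # 6496 6496
```

The trade-off is that one record's values are spread across several arrays, so updating rows means touching every column involved.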

Thank you for that, I'll give it a go today!
It seems too slow for our volume of updates, since updates would need to rewrite the whole column.
How is the startup time?

It would be nice if you could check a query and then start the instance with an appropriate memory configuration.

Pretty much instant. It loads data from a memory-mapped file, so having a fast SSD for that is a must.
Yes, indeed, it's crucial to have an SSD. With it, loading will be instant (a few ms).