by matan_a 5208 days ago
There are quite a few other performance-related points to consider for Solr query and indexing speed.

Here are some useful ones that come to mind right now:

- Be smart about your commit strategy if you're indexing a lot of documents (commitWithin is great). Use batches too.

- Many times, I've seen Solr index documents faster than the database could create them (considering joins, denormalizing, etc). Cache the generated documents somewhere so you don't have to recreate the ones that haven't changed.

- Set up and use the Solr caches properly. Think about what you want to warm and when. Take advantage of filter queries (fq) and their cache! They will improve performance quite a bit.

- Don't store what you don't need for search. I personally only use Solr to return IDs of the data. I can usually pull that up easily in batch from the DB / KV store. Beats having to reindex data that was just for show anyway...

- Solr (Lucene really) is memory greedy and picky about the GC type. Make sure that you're sorted out in that respect and you'll enjoy good stability and consistent speed.

- Shards are useful for large datasets, but test first. Some query features aren't available in a sharded environment (YMMV).

- Solr is improving quickly and v4 should include some nice cloud functionality (ZooKeeper ftw).
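
To make the batching and commitWithin point concrete, here is a rough sketch (not anything from a real setup) that splits documents into batches and builds the requests for Solr's JSON update handler. The core name, batch size, and commitWithin value are assumptions for illustration, and the helper name build_update_requests is made up. It only builds the URL and body, so you'd send them with whatever HTTP client you already use.

```python
import json
from urllib.parse import urlencode


def build_update_requests(docs, base_url, batch_size=500, commit_within_ms=10000):
    """Return (url, json_body) pairs, one per batch, for Solr's /update handler.

    Hypothetical helper: no explicit commit is sent. The commitWithin
    parameter asks Solr to commit the added documents within the given
    number of milliseconds, so Solr can fold many batches into one commit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    query = urlencode({"commitWithin": commit_within_ms})
    url = base_url.rstrip("/") + "/update?" + query
    requests = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # The JSON update format accepts a plain array of documents.
        requests.append((url, json.dumps(batch)))
    return requests


if __name__ == "__main__":
    # Assumed core name and fields, purely for illustration.
    docs = [{"id": str(i), "title": f"doc {i}"} for i in range(1200)]
    for url, body in build_update_requests(docs, "http://localhost:8983/solr/mycore"):
        print(url, len(json.loads(body)), "docs")
```

Each body would be POSTed with Content-Type: application/json. The right batch size and commitWithin window depend on your document size and how fresh the index needs to be, so treat the defaults here as placeholders.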
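
For the filter query and "only return IDs" points, a minimal sketch of the query string might look like the following. The field names (type, lang) and the helper name build_select_query are hypothetical; q, fq, fl, and rows are standard Solr request parameters. Each fq clause is cached separately in the filter cache, so keeping reusable restrictions in their own fq instead of folding them into q lets later queries hit the cache.

```python
from urllib.parse import urlencode


def build_select_query(text_query, filters, rows=10):
    """Build a query string for Solr's /select handler (hypothetical helper).

    text_query goes into q (scored); every entry in filters becomes its own
    fq parameter (unscored, cached independently); fl=id returns only the
    document IDs so the full records can be fetched from the DB / KV store.
    """
    params = [("q", text_query)]
    params.extend(("fq", f) for f in filters)
    params.append(("fl", "id"))
    params.append(("rows", rows))
    return urlencode(params)


if __name__ == "__main__":
    # Assumed field names, purely for illustration.
    print(build_select_query("title:solr", ["type:article", "lang:en"], rows=5))
```

Append the result to something like http://localhost:8983/solr/mycore/select? (assumed host and core) to run it.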

1 comments

These are good points. Solr/Lucene tuning is an art, so much so that some search consulting companies charge tens (or hundreds) of thousands of dollars for these services. That's the value proposition of Searchify's hosted search - if you just want search, you shouldn't have to worry about shards, commit strategies, batching, GC, etc. You just want to add your documents, search them, and get great, fast results, without having to become a Lucene expert in the process.

If this sounds interesting, check us out at http://www.searchify.com - We offer true real-time, fast hosted search, without requiring you to learn the innards of Solr or Lucene.

Good stuff. I see you're on Heroku as well, which is always a win.

Now if someone could put SenseiDB on the cloud, I'd pay for it...