by m0th87 4515 days ago

It was two weeks ago, and our startup was on the precipice of a major launch. We had completely rewritten our online publication site, which drives the bulk of our traffic. The product had to be shipped on time - we had press releases, eager investors and a launch party dependent on it.

A few days before launch, things were not looking good. As admins manipulated articles in preparation for the launch, the servers kept crashing.

In a time-constrained major launch like this, a lot of nasty little hacks build up in the codebase. Our search system for admins was a complete mess. It was a custom solution that worked fine when admins managed a handful of database records, but now that they were managing thousands of articles, it was not scaling at all.

At the 11th hour, we dropped elasticsearch into our infrastructure. It worked like a charm. The servers stopped crapping out, and we launched on time.

Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr), or building anything on top to interface between the index and the queries themselves (Lucene). Thanks, Elasticsearch, you saved us!
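
For readers who haven't seen it, here is a rough sketch of what "no schema up front" looks like against the Elasticsearch REST API: store a JSON document, then run a match query. The host, port, index name `articles`, and the sample article are all assumptions made for illustration, not details from our setup, and the `_doc` path is the form used by recent versions (older releases put a type name in that position). The requests are only built, not sent, so this runs without a server.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port; not taken from the post.
ES_URL = "http://localhost:9200"


def index_request(index, doc_id, doc):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text match query against one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical article; no mapping is declared before indexing it.
article = {"title": "Launch day", "body": "We shipped on time.", "tags": ["news"]}
put = index_request("articles", 1, article)
query = search_request("articles", "title", "launch")
# With a running node, either request could be sent via urllib.request.urlopen(...).
```

The point is only that the document goes in as plain JSON and field types are inferred; a production setup would likely still want explicit mappings for some fields.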

2 comments

dc2447 4515 days ago

> Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr)

If you were using Solr, there are a few operational modes to run in: config-file based or SolrCloud[0]. The latter is more akin to ES in terms of cluster management.

I agree, though, that from a simplicity-of-deployment perspective at scale, ES has a much gentler learning curve.

[0] https://cwiki.apache.org/confluence/display/solr/SolrCloud

acdha 4515 days ago

SolrCloud is nothing like ES in terms of management. You end up running a separate ZooKeeper service, with even more files that all have to be configured correctly just to get it running. You also have to micromanage shard allocation so that you can add nodes in the future, but without it intentionally deadlocking when a server fails and you no longer have enough nodes for a quorum. All of this happens with the usual contempt for sysadmins: things you need to know (“refusing to process requests”) won't be logged but a bunch of startup boilerplate will be, and simply configuring logging correctly requires (IIRC) editing two XML files and a properties file.

`java -jar elasticsearch.jar` does a better job and that's basically all it takes. I'm planning to switch as soon as https://github.com/elasticsearch/elasticsearch/issues/256 lands.
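
To put numbers on the quorum point above, here is a minimal sketch of majority-quorum arithmetic, which is how a ZooKeeper ensemble decides by default whether it can keep serving. The function names are made up for illustration and nothing here talks to a real cluster.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with this many nodes."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, live_nodes):
    """True if the surviving nodes still form a majority."""
    return live_nodes >= quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures for a few sizes.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A two-node ensemble tolerates no failures at all, same as a single node, which is why the node count has to be planned rather than grown one server at a time.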

darkarmani 4514 days ago

I lost count of the +1s. That issue must have around +180. :)

troels 4515 days ago

Did you try/consider Sphinx? It's simple and it's quite fast. I'm using that and I'm pretty happy with it, but I might investigate ES at some point to see if I can squeeze a bit more speed out of it.

rch 4515 days ago

You might also take a look at the search functionality in Riak. I've run both Solr and ES, the latter at significant scale, and I'm leaning more towards Riak going forward. The difference is mainly convenience, so not a reason to switch off something that's working already.

troels 4515 days ago

Hadn't considered Riak, but I can see that it has some full-text search capabilities. Any idea about its features and how it compares in performance, as a raw search index?

biscarch 4515 days ago

Riak 2.x uses Solr to index values from K/V with AAE. If you're interested in how using it looks, I wrote a post using geospatial data here[1].

[1]: http://www.christopherbiscardi.com/2014/02/07/geospatial-ind...

rch 4515 days ago

If it's just Solr underneath, then why is the pseudo-Solr API implementation not a complete implementation? Something to do with each node being an isolated Solr instance maybe?

biscarch 4515 days ago

There is backward compatibility with the old Riak Search (which wasn't Solr based) intended to not break old applications, but you can query with any currently implemented Solr client afaik.

rch 4515 days ago

I see now - interesting. Big shift from 2010 to now. Thanks!

http://www.slideshare.net/rklophaus/riak-search-erlang-facto...

http://basho.com/ricon-west-videos-riak-search-2-0/

rch 4515 days ago

I don't know of any publicly available relative raw performance benchmarks, and haven't done any myself. My guess is that the compelling features would be more in the realm of node operations and recovery from node failures.

Edit: Apparently my Riak knowledge is dated now anyway. It looks like I have some research to do myself, but it's pretty exciting stuff.

m0th87 4515 days ago

As far as I can tell, Sphinx has a more involved setup process. Also, our search runs against JSON documents, which seems to suit Elasticsearch better than Sphinx. I might be wrong on both counts, though; we really didn't look into Sphinx enough to give it a fair appraisal.

nasalgoat 4515 days ago

Sphinx is a bit too 1:1 - it only works as a single server, not a cluster.

troels 4515 days ago

Well, you could simply have multiple instances running on different nodes. It's manual work, but by no means impossible. In my setup, I have a Sphinx server running on the same node as my web server (which is the consumer of the search), so they scale with each other. For more advanced uses, it's probably not adequate, but it's not a big concern of mine.
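
As a rough sketch of what the "manual work" could look like if the index were split across nodes (one possible reading; it's not how my own setup works), the application queries every instance and merges the hits itself. Everything here is hypothetical: the fake nodes stand in for real search servers, and scoring is just a term count.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, document id)
Backend = Callable[[str, int], List[Hit]]


def fan_out_search(backends: Iterable[Backend], query: str, limit: int) -> List[Hit]:
    """Ask every node for its top hits, then keep the best `limit` overall."""
    all_hits: List[Hit] = []
    for search in backends:
        all_hits.extend(search(query, limit))
    return heapq.nlargest(limit, all_hits)


def make_fake_node(docs: Dict[str, str]) -> Backend:
    """Stand-in for one search instance; scores by counting the query term."""
    def search(query: str, limit: int) -> List[Hit]:
        scored = [(float(text.lower().count(query.lower())), doc_id)
                  for doc_id, text in docs.items()]
        return heapq.nlargest(limit, [hit for hit in scored if hit[0] > 0])
    return search


node_a = make_fake_node({"a1": "search search", "a2": "nothing here"})
node_b = make_fake_node({"b1": "search", "b2": "search search search"})
top = fan_out_search([node_a, node_b], "search", 2)
```

One caveat with this approach: scores from independently built indexes aren't necessarily comparable, so the merged ranking is only approximate.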
