by omurphy27 3302 days ago
Algolia is an amazing service and an absolute joy to use. They deserve all the praise they're getting and then some.

However, it's easy to exceed their record limits, especially since you need to duplicate your index every time you want to 'sort by' something.

For instance, if I wanted an option to sort my results by date, and to then search these, I'd need to create 2 new slave indexes for date ascending and descending respectively. Sorting by anything else, like price, means creating yet more indexes and suddenly it's easy to turn 30K records into 150K. This happened to me and I ended up having to roll something custom instead (Vuejs frontend and Sphinx Search backend), since my client balked at the extra cost.
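The arithmetic behind the jump from 30K to 150K records is simple multiplication. The sketch below assumes one primary index plus four duplicated indexes (date ascending, date descending, price ascending, price descending). The comment only names date and price, so that exact set of four is an assumption, and `total_records` is a hypothetical helper.

```python
def total_records(base_records: int, sort_orders: int) -> int:
    """Records stored when every extra sort order needs its own full copy of the data."""
    # One primary index plus one duplicated index per additional sort order.
    return base_records * (1 + sort_orders)


# Assumed: date asc/desc plus price asc/desc = 4 extra indexes.
print(total_records(30_000, 4))  # 150000
```

Every new sortable field adds two more full copies (ascending and descending), which is why record counts grow so quickly.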

But, if you have a small dataset, or are fine with the costs, then Algolia is spectacular.


Yes. This is something our startup is working on fixing: how you sort and how you prioritize the fields to be searched are both configurable at query time.
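For contrast with one-index-per-sort-order, here is a minimal sketch of sorting at query time in general. It is not Searchera's engine, which the comment does not describe; the `Product` fields and the `search` function are hypothetical. One copy of the data serves every sort order, and the cost moves to query time: each query sorts its matching set.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    date_added: str  # ISO date, so string order matches chronological order


def search(records, query, sort_field=None, descending=False):
    """Filter by substring match, then sort by a field chosen at query time."""
    q = query.lower()
    hits = [r for r in records if q in r.name.lower()]
    if sort_field is not None:
        # The sort key is a query parameter, not an index setting.
        hits.sort(key=lambda r: getattr(r, sort_field), reverse=descending)
    return hits


catalog = [
    Product("red shoe", 59.0, "2017-03-01"),
    Product("blue shoe", 39.0, "2017-05-12"),
    Product("shoe rack", 25.0, "2016-11-20"),
    Product("hat", 15.0, "2017-01-08"),
]

print([p.name for p in search(catalog, "shoe", "price")])
# ['shoe rack', 'blue shoe', 'red shoe']
print([p.name for p in search(catalog, "shoe", "date_added", descending=True)])
# ['blue shoe', 'red shoe', 'shoe rack']
```

The trade-off is an O(m log m) sort over the m matches on every query, which is exactly the work that pre-sorted replicas avoid.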

If you pay for a million records, you should be able to store a million records.

https://searchera.io

We are currently in beta and our website is not fully up, but the demos give an idea of what is possible.

Nice.

This brings up a related point.

How strong are the defensibility and, separately, the network effects in Algolia's business?

I guess once they have a customer, it might be annoying for that customer to switch. Is that accurate?

But is there any reason for the 101st customer to use Algolia other than the brand? My hunch says no, and that there aren't any network effects.

Thanks! Actually, 40% of our current beta users are existing Algolia customers. Maintaining separate indexes for every sorting/ranking option, and intentionally restricting the application to stay under the record limit, is a drawback that many complain about.

A few potential customers have also been interested in our on-premises option.

So won't you have the same problem?

Limited customer lock-in?

Of course. We are doing our best to build a rock-solid product that is fast, flexible, and cost-effective, with excellent support.

Ultimately, it is the customer's call. They are going to choose what works best for them :)

The response times are six times as long as with Algolia. I assume that's because of network latency.

Will you offer distributed datacenters as well?

Yes, we are currently hosted only in NY. Could you please ping beta.searchera.net and check the latency from your location?

Once we are out of beta, we will be offering distributed datacenters, starting with the US West Coast and Europe. Installing on your own servers or cloud provider is another option.

If you would like to try it out, I can quickly bring up a host near your location on DigitalOcean or AWS. Please send me an email at hello@searchera.io

90ms from Europe

I signed up for your beta and will follow your progress.

Is your solution based on solr/lucene/elasticsearch?

Ours is a custom index written mostly in C with a bit of x86 assembly. It is very lightweight and extremely fast, even without replica indexes for every sort order.

Thanks for signing up. We will get you started as soon as our additional servers are up.

Yes, especially if you have records that change often.

If you have 300,000 items in an index that you want to sort in 4 ways, and you want to update e.g. the price daily, you have already consumed 36 million of the 50 million operations included in the biggest non-enterprise plan.
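The 36 million figure checks out if every record update counts as one operation in each of the four indexes, over a 30-day month; both of those are assumptions implied by the comment's numbers. `monthly_operations` is a hypothetical helper.

```python
def monthly_operations(items, indexes, updates_per_day=1, days=30):
    # Assumed billing model: one operation per record, per index, per update.
    return items * indexes * updates_per_day * days


ops = monthly_operations(300_000, 4)
print(ops)               # 36000000
print(ops / 50_000_000)  # 0.72, i.e. 72% of the plan's operations
```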

Just by testing and tweaking the index every other day, we already use up to 500,000 operations.

But then again, setting up search infrastructure in different countries and syncing it in real time also comes at a hefty price. So we will stick with Algolia for now. The speed is breathtaking, and we will never be able to achieve 20ms responses with e.g. an Elasticsearch cluster.

(N.B.: I am an engineer at Algolia.) Algolia's engine computes relevance at indexing time by design, which lets us deliver optimal search performance at query time. As a result, supporting several sort orders (by price, by name, by date added) requires several indices containing identical data.
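A toy illustration of why ordering results at indexing time implies one index per sort order: each index below keeps its records in exactly one order, fixed when records are inserted, so a query only filters and stops early. This is a generic technique, not Algolia's actual engine; `PresortedIndex` and its methods are hypothetical.

```python
from bisect import insort


class PresortedIndex:
    """Toy index whose records are kept in one fixed order, decided at insert time."""

    def __init__(self, key):
        self.key = key   # the single sort key this index is ordered by
        self._rows = []  # (sort_key, record_id, record) tuples, always sorted

    def add(self, record_id, record):
        # Ordering work happens here, at indexing time.
        # record_id breaks ties, so record dicts are never compared.
        insort(self._rows, (self.key(record), record_id, record))

    def query(self, predicate, limit):
        # Rows are already in final order: filter and stop early, no sort needed.
        if limit <= 0:
            return []
        out = []
        for _, _, record in self._rows:
            if predicate(record):
                out.append(record)
                if len(out) == limit:
                    break
        return out


records = [
    {"name": "red shoe", "price": 59},
    {"name": "blue shoe", "price": 39},
    {"name": "hat", "price": 15},
]

# One index per sort order: the same data is stored twice.
by_price_asc = PresortedIndex(key=lambda r: r["price"])
by_price_desc = PresortedIndex(key=lambda r: -r["price"])
for record_id, record in enumerate(records):
    by_price_asc.add(record_id, record)
    by_price_desc.add(record_id, record)

is_shoe = lambda r: "shoe" in r["name"]
print([r["name"] for r in by_price_asc.query(is_shoe, limit=10)])   # ['blue shoe', 'red shoe']
print([r["name"] for r in by_price_desc.query(is_shoe, limit=10)])  # ['red shoe', 'blue shoe']
```

Query-time work stays proportional to how far the scan goes, but every new order means another full copy of the data.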

To make this easy to implement, we provide a way to create index replicas, read-only indices that can have different settings from the master index.

When using replicas, every record added to a master index is also added to its replica indices. The same goes for deletion and update operations.

Indexing operations performed on replicas are not billed.
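A toy model of the replica behavior described above: writes to the master are mirrored to every replica, and only the master write is counted as a billable operation. This models only what the comment states; it is not Algolia's API client, and the class names and index names (`products_price_asc`, etc.) are hypothetical.

```python
class Index:
    def __init__(self, name):
        self.name = name
        self.records = {}


class MasterWithReplicas:
    """Writes go to the master and are mirrored to each replica.
    Only master writes are counted as billable operations."""

    def __init__(self, name, replica_names):
        self.master = Index(name)
        self.replicas = [Index(n) for n in replica_names]
        self.billed_ops = 0

    def save(self, object_id, record):
        # Covers both add and update; billed once, on the master only.
        self.billed_ops += 1
        for index in [self.master, *self.replicas]:
            index.records[object_id] = dict(record)

    def delete(self, object_id):
        self.billed_ops += 1
        for index in [self.master, *self.replicas]:
            index.records.pop(object_id, None)


products = MasterWithReplicas("products", ["products_price_asc", "products_price_desc"])
products.save("sku-1", {"name": "red shoe", "price": 59})
products.save("sku-1", {"name": "red shoe", "price": 49})  # update
products.delete("sku-1")
print(products.billed_ops)  # 3, not 9
```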

By using replicas, you can adjust your calculation by removing the factor of four you included for the four indexes, which gives 300K × 30 days = 9 million operations/month. This assumes you update the entire index daily; you could instead update only the prices that changed, which would reduce the number of operations further.
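The revised estimate, as a sketch: with replica writes free, only the master index counts. The `changed_fraction` parameter is a hypothetical share of records whose price changes each day; the 10% used below is an illustrative value, not a figure from the thread.

```python
def monthly_master_ops(items, days=30, changed_fraction=1.0):
    # Only master-index writes are billed; replicas mirror them for free.
    # changed_fraction is a hypothetical daily share of records that change.
    return round(items * changed_fraction * days)


print(monthly_master_ops(300_000))                        # 9000000 (full daily reindex)
print(monthly_master_ops(300_000, changed_fraction=0.1))  # 900000 (only changed prices)
```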

> we will never be able to achieve 20ms responses with eg an Elasticsearch cluster

Why not? Assuming good hardware, why isn't that possible?

IME, I haven't been able to replicate Algolia's search responsiveness with ElasticSearch, even with good hardware. I don't think ES/Lucene was ever designed for that use case. IIRC, Algolia was designed to perform well even on mobile phones. I wouldn't dream of getting Lucene to run performantly on a mobile phone.

I'd love to see if someone has done any of the "realtime" Algolia demos backed by ElasticSearch.

In any case, ES excels at very different use cases - I've only seen Algolia provide "basic" search.

Algolia has done some terrific work on search latency (and written about it which is awesome https://stories.algolia.com/how-algolia-reduces-latency-for-...).

I think ES can get there, but depends a lot on what hardware you deploy (SSDs!), how you build your index, and whether you can geographically distribute your search engine close to your users.

We have one ES cluster handling hundreds of queries per second with a median response time of 9ms and a 99th percentile around 160ms. Another cluster, with 100x more data, gets 20-25ms median response times and a 99th percentile of 360ms.
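Median and 99th-percentile figures like these come from sorting per-request timings. A minimal sketch using the nearest-rank method, which is one of several percentile definitions (many monitoring tools interpolate instead). The latency samples are made up for illustration and are not the commenter's data.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


latencies_ms = [7, 8, 9, 9, 10, 11, 12, 15, 40, 160]  # hypothetical samples
print(statistics.median(latencies_ms))            # 10.5
print(nearest_rank_percentile(latencies_ms, 99))  # 160
```

A single slow request can dominate the tail, which is why the p99 can be an order of magnitude above the median.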

Now, both of these are just the ES response times. There is additional overhead in responding to an API request, and then you also get into where your data centers are located relative to the end users.

More (slightly out of date) background on our config: https://data.blog/2016/05/03/state-of-wordpress-com-elastics...

Totally agree with you. Algolia's speed is pretty amazing, but its price is pretty hefty too. We ended up switching over to Elasticsearch, which is much cheaper and more flexible in certain ways.