Hacker News
by mlthoughts2018 2433 days ago
I’m saying in my experience there is no such thing as one singular “decent solution” for search. It varies enormously from use case to use case, customer cohort to customer cohort, etc.

To even know if you’re buying a decent solution from Algolia or not, you’d already have to hire pretty much all the same staff you’d have to hire to more cost-effectively build it in-house.

I think the fundamental myth, just like with Rekognition, is that if you ship off your data and the third party trains some model (most likely fine-tuning a base model), then you’re done, problem solved.

Even for businesses where search is not a core part of their direct value proposition to customers, this is flagrantly untrue.


I am not trying to be flippant, but maybe you are misunderstanding what Algolia provides?

Let's say I have an eCommerce platform. The search provided by the framework is slow, and I want to put together an instant search feature. I don't have a search specialist to speed it up, so I add the Algolia plugin to the platform, sync my products, and the work is done. This is a literal, real-world example: install the plugin and suddenly my archaic eCom platform has instant search. Not only that, but I can manage the search result weights and preferences from within Algolia with no special experience. My existing search couldn't do that.

I am not sure where you would need an ML specialist in any of this, certainly not a whole team. For most people Algolia out of the box is plenty.

> Not only that, but I can manage the search result weights and preferences from within Algolia with no special experience.

This was a major selling point for us when we switched to Algolia. The business people want to be able to manage things like search without having to go through the programmers, just like they manage GA/GTM and such.

>To even know if you’re buying a decent solution from Algolia or not, you’d already have to hire pretty much all the same staff you’d have to hire to more cost-effectively build it in-house.

You just have to pay Algolia, wire up the APIs, and then see if whatever stakeholder that was complaining about search stops complaining. If they do then it's good enough.

This is correct. Let's say you have an ecom site with product search. How do you know if the search is good? You search for stuff and compare the results to what you expect.

Even if it sucks, as long as there is revenue lift, you will keep it until the next solution comes along.

And Algolia has great analytics, so you can actually measure business value from real search queries: you can tie a query to a purchase and then run further analysis on that. It’s powerful and doesn’t require any exceptional engineering talent to use.

Totally false. This is like massively overfitting a high-order polynomial regression to your data. The fit looks good enough, then the next data point comes in and breaks it in a way the existing model cannot be hacked to account for.

The search results you believed were implicitly tuned to some feedback mechanism slowly drift as the customer cohort and the data distribution change, until, before you know it, managing the search solution is a ceaseless game of whack-a-mole that siphons off engineering resources at a rapidly increasing rate. It’s the same false promise as just having some engineers stand up Elasticsearch.
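
To make the overfitting analogy concrete, here is a small Python sketch. Everything in it is an assumption chosen for illustration, not something from the thread: the underlying relationship is taken to be y = 2x + 1, the eight observations carry a hand-picked noise pattern, and the "next data point" arrives at x = 9. A degree-7 polynomial (Lagrange interpolation) passes through every observation exactly, while an ordinary least-squares line does not; the question is which one survives the new point.

```python
"""Toy illustration of the overfitting analogy (all data hypothetical)."""


def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def true_fn(x):
    # Assumed "real" relationship behind the observations.
    return 2 * x + 1


xs = list(range(8))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.2]  # hand-picked, hypothetical
ys = [true_fn(x) + e for x, e in zip(xs, noise)]

# In-sample: the degree-7 polynomial reproduces every observation exactly.
poly_train_err = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))
slope, intercept = linear_fit(xs, ys)
line_train_err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

# Out-of-sample: the next data point, just outside the observed range.
x_new = 9
poly_new_err = abs(lagrange_predict(xs, ys, x_new) - true_fn(x_new))
line_new_err = abs(slope * x_new + intercept - true_fn(x_new))

print(f"train error  poly={poly_train_err:.3g}  line={line_train_err:.3g}")
print(f"x=9 error    poly={poly_new_err:.1f}  line={line_new_err:.2f}")
```

With these made-up numbers, the polynomial's in-sample error is zero while its error at x = 9 is in the hundreds, whereas the straight line misses by well under 1. A perfect fit to the queries you have already seen says little about the next one.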

Not all decisions are good decisions; it depends on who makes the call. In most cases, someone asks you to use a solution because it looks or feels better. In this case, Algolia showed how fast and how well it could be implemented. Once the person making the decision is convinced, it gets implemented. It's mostly marketing. Probably less than 1% of all e-commerce websites measure the impact of a decision.

I did say as much in my original post: Algolia and Rekognition are marketed at CTOs and directors of engineering who want to be sold on a magical line item that removes a whole concept area from their concern, especially one associated with the difficulty of hiring and affording good machine learning staff who can work on the problem both pragmatically and theoretically. They want to be sold a story.

I will say, though, that your 1% claim is way off in my experience (which includes 3 medium and large ecommerce companies). These companies employ armies of product managers and analytics staff who measure the shit out of everything from the color of a button to the font size in a banner display for a discount promo code. These things usually aren’t measured because anyone finds value in the measurements, but rather to give the appearance of data-driven decision making and to justify job perpetuity.

> you’d already have to hire pretty much all the same staff

This is terribly wrong. It's like saying you need as many people to set up Solr/Elasticsearch as you'd need to build a custom search engine.

It requires considerably fewer people to set up, manage, and optimize Algolia than it does to set up, manage, and optimize ES.

Case in point, Twilio.

You’re not even addressing the engineering costs, though. The portion of a search solution's cost attributable to setting up Elasticsearch is basically zero. The real cost is understanding whether the search surfaces relevant items for the specific use case, including, in many use cases, the asymmetric costs of surfacing bad items. Not to mention that plug-and-play third-party solutions like Solr / ES are highly inapplicable to a lot of use cases.

I set up Algolia for a client while doing some work for them. We needed to add search to a website we were building, which searched over data in the client's CMS.

I could have set up Elasticsearch. But Algolia was cheap and easy to configure, and we could just click around the Algolia UI to tweak things like "number of allowed spelling mistakes". We didn't need to proxy anything or set up routes; we just pulled in the Algolia JS library to run queries from the application. It was easier for the client to maintain on an ongoing basis than their own Elasticsearch instance on EC2 or something like that.

I'm sure there are plenty of times you'd want to run your own Elasticsearch instance, and I think that would also have been a reasonable choice for us. But I still feel pretty happy with the choice to use Algolia.

Arguments to set up your own Elasticsearch instance remind me of the criticism of Dropbox: "just run your own server with rsync! It's so easy!". Paying someone a small amount of money to do that for me is often a great deal.

> You’re not even addressing the engineering costs though

This is a pretty silly thing to say. How could you possibly know what I'm addressing?

I actually have experience with all of these scenarios, and, bar none, Algolia has given me the fastest, smoothest, most bug-free and most feature-rich-for-the-dollar search rollouts I have ever come across.

> including asymmetric costs for surfacing bad items in many use cases

Sure, if you're Google, Netflix or Amazon. For the 99.9% of the rest of the world where search isn't core to the business, there's unlikely to be any discernible impact going one way or the other, except saving money and launching faster.

Why do you buy clothes when you could just make them yourself?

A better analogy would be to consider a person with very special dietary needs, and then say “why buy dinner when you can make it yourself and actually be sure it meets your dietary needs?”

Sounds like a good case for Algolia Professional Services! (If that exists?)

I remember pushing Google's search appliance for a large media company some 10 years ago (no benefit to myself, which was pretty noob). It made sense at the time, and it solved a problem for them better than they probably could have solved it themselves. The most complicated part was setting up rules about what was public / private / internal, etc.

Except this would be prohibitively expensive. The labor cost of those specialized employees as consultants would be huge.

So you are saying it’s too expensive to hire those specialists, but you need those specialists to set up a working search solution? Is your stance that only companies that can afford a bespoke solution should implement search?

You do know that paying for these kinds of services from consultants is much more expensive than hiring in-house, right? Consulting is purchased either because you only need the specialization for a short time and can pay the markup for the flexibility, or because you need some external virtue signalling of prestige or authority to overcome in-house political blockers. Consulting is absolutely not the cost-effective option for a specialization you’ll need frequently.

> you’d already have to hire pretty much all the same staff

I don't know what most Algolia customers are like, but we have been using them for a long time and we are a team of... 2 (only 1 being a technical person).

So hiring a whole team to work on search is certainly not an option for everyone. By paying $29 / month and spending a couple of hours on integration, you get a working search that most small shops will be happy with. That sounds like a valuable proposition to me.

It seems like companies don't have to evaluate it your way? They could informally try it out on a few searches and say, "seems to work better than the other one." This is how Google became popular.

Or maybe do an A/B test. It's a little more formal but doesn't require any search-specific knowledge either.
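
For a sense of what "a little more formal" could look like, here is a minimal sketch of a two-proportion z-test comparing purchase conversion between two search variants. The conversion counts, and the choice of purchases per search session as the metric, are hypothetical; the test uses the normal approximation, which assumes reasonably large samples. As the next comment argues, the outcome is only as meaningful as the metric you choose to test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B.

    Uses the pooled-proportion standard error and the normal approximation,
    so it assumes both samples are reasonably large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical numbers: purchases out of search sessions for the old search (A)
# and the candidate search (B).
z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these made-up numbers (4.8% vs 5.6% conversion on 10,000 sessions each), z comes out at roughly 2.5 and the two-sided p-value at roughly 0.01. None of this needs search-specific knowledge, which is the point being made; whether purchases per session is the right thing to measure is a separate question.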

This approach typically fails quite badly in practice. The subjective impression of success depends on the people around at the time of the decision and the set of queries they chose to inspect. As corner cases pop up with significant cost in production (e.g. surfacing nudity in an image search for a query where it’s highly inappropriate), you become less and less capable of understanding why, or of hacking business logic into an already brittle system.

The same problem appears when A/B testing different result orderings: it all hinges on the metrics you chose. If you only looked at, e.g., top-5 click-through rate, and a later page redesign brings the top 10 results above the fold while results 6-10 are garbage, you’re suddenly screwed. You have no systematic way to adjust the underlying parameters of the search model that control these things, or even to get analytics data about it, because you felt you could just outsource to a place like Algolia without in-house expertise, or that some engineers without statistics backgrounds could just hack it.
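
A small sketch of how a metric choice can hide this: precision at k, computed over hypothetical relevance judgments (1 = relevant, 0 = garbage) for two rankings of the same query. Precision@k is a swapped-in offline stand-in for the click-through rate in the comment, and both rankings are made up.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results judged relevant.

    `relevance` is a list of 0/1 judgments in ranked order. Missing positions
    (fewer than k results) count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


# Hypothetical judgments for one query under two rankings.
ranking_a = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
ranking_b = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # same top 5, garbage in 6-10

for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    print(name, "P@5 =", precision_at_k(ranking, 5), "P@10 =", precision_at_k(ranking, 10))
```

Both rankings score 0.8 at k = 5, so a test that only watched the top five would call them equivalent. At k = 10, once results 6-10 are above the fold, B drops to 0.4 while A stays at 0.8.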

What do you do when you don’t have the budget for a team of in-house experts, you need something that is good enough, and you need it now?

Well, what you don’t do is rush to buy the wrong thing because you’re desperate and it has good marketing.

Disclaimer: I'm a software engineer working on the core search engine at Algolia, but the opinions in this post are my own and not an official statement.

Search is a hard job, and it's hard in many ways. The most obvious difficulties are related to relevance, and yes, this part is specific to each business case. But those aren't the only issues one has to solve when implementing search.

Even setting aside the software itself, there is running it: high availability, search results fast enough to provide search as you type, reliable indexing, low latency in several regions... This was the first service we provided, and it is what SaaS is about. Being inside a SaaS company shows you how much work we save our customers.

Then there is the software solution itself. Providing search is not just about running a generic piece of code; it's a whole ecosystem, continuously evolving. Working with a SaaS solution is like hiring a team of more than a hundred engineers dedicated to search. From the core software to frontend UI modules, the amount of engineering needed is way above what most companies can dedicate to search.

Back to relevance, some aspects are specific to business logic but some are also specific to search. We provide the search knowledge, so that you can focus on your own issues.

And for the software behind our services, we're not trying to build a one-size-fits-all search tool, but a tool dedicated to the kind of search needed in today's web applications. I'm obviously biased, but I strongly believe that the kind of search we focus on, fast and accurate top results rather than exhaustive search, fits our users' needs extremely well.

I do both search and ML solutions in the area that Rekognition targets.

In both cases they are great 80/20 solutions (actually, Algolia is more like a 95/5 solution in most cases).

I also do computer vision and search in the areas targeted by Algolia and Rekognition and have not found this to be true at all. For face detection, for example, Rekognition was completely unusable for my company.

I haven't used face detection by Rekognition, so I can't comment.

However, I'm surprised it's that bad.

I've done two projects in the last 6 months that required face detection, and in both cases combinations of DLib and OpenCV performed perfectly well. Since these are entirely off-the-shelf models, I don't see why Rekognition should, in principle, be any worse.

Yeah, pre-built dlib and OpenCV models are similarly unrealistic for real-world applications. We ended up needing to train our own version of MTCNN and, separately, to train celebrity face recognition.

Especially when detecting faces in images with many faces, these legacy off-the-shelf things built on Viola-Jones-type models or HoG feature extractors are just not acceptable compared with deep learning models.

And even at that, you need to fine-tune the model to your own specific dataset with appropriate weights to reflect the asymmetry between false positives and false negatives. Simply using an off-the-shelf model, even a deep CNN model, virtually never works in practice. Unless your real-life task is well approximated by the academic dataset used for training (and it never is), you’re going to need a computer vision engineer involved.
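
To illustrate what reflecting the asymmetry between false positives and false negatives can mean in practice, here is a minimal sketch that picks a detection threshold by minimizing total misclassification cost on a labeled validation set. This is a swapped-in, simpler technique: it tunes only the decision threshold on a model's output scores, not the training-time weighting the comment describes. The scores, labels, and cost values are all hypothetical.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum of misclassification costs when predicting positive for score >= threshold."""
    cost = 0.0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and not is_positive:
            cost += cost_fp  # false positive, e.g. a "face" that isn't one
        elif not predicted and is_positive:
            cost += cost_fn  # false negative, a missed face
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost (ties go to the lowest)."""
    # Each observed score is a candidate; +inf means "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Hypothetical detector scores and ground-truth labels (1 = real face).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

print("false positives 10x as costly:", best_threshold(scores, labels, cost_fp=10, cost_fn=1))
print("false negatives 10x as costly:", best_threshold(scores, labels, cost_fp=1, cost_fn=10))
```

With these made-up numbers, making false positives ten times as costly pushes the chosen threshold up to 0.9, while making misses ten times as costly pulls it down to 0.3. The same model ends up at very different operating points depending on the cost structure of the task.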

You mentioned face detection earlier and now you are talking about face recognition. There's a huge difference.

For face detection, DLib and OpenCV work really well in real-world applications. As I mentioned, I've deployed two real-world solutions using them in the past 6 months.