by mlthoughts2018 2433 days ago
I’ve always had a hard time understanding the value proposition in the same way I don’t understand the value proposition of e.g. AWS Rekognition.

Paying per use certainly doesn’t make sense, because it has to be qualified by the accuracy you get per use.

And there’s no serious way to understand the accuracy you get per use (on your specific unusual distribution of queries) without employing the expensive ML / stats engineers you probably thought you could avoid hiring by outsourcing to Algolia / Rekognition in the first place.

But once you need to hire them anyway, you might as well utilize them to build this type of thing in-house in ways that are much more tailored and optimized around your in-house data models and data integration tools.

To put this in perspective, I’ve worked in several companies (from small start-ups to large ecommerce sites) that have a variety of search needs spanning plug and play Lucene all the way to highly customized joint embedding neural network based nearest neighbor search, and tons in between.

The distribution of text in e.g. the support center search use case was totally different from the product search use case or the document store use case, where highly unique word distributions, special words, frequent required updates to the search index, asymmetric costs of surfacing bad or deleted content items, etc., were the norm.

Every search use case was different and needed care to develop unique annotated result sets to measure mean reciprocal rank, NDCG, etc., as well as simple stakeholder subjective opinion of quality.
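As a rough illustration of the metrics named above, here is a minimal sketch of mean reciprocal rank and NDCG computed over a hand-annotated result set. The function names, the dictionary layout of the annotations, and the choice of linear gain in DCG (rather than the 2^rel - 1 variant) are all assumptions made for the example, not anything taken from a particular search product.

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(results_by_query, relevant_by_query):
    """Average reciprocal rank over all annotated queries."""
    ranks = [
        reciprocal_rank(results_by_query[q], relevant_by_query[q])
        for q in results_by_query
    ]
    return sum(ranks) / len(ranks)

def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_ids, graded_relevance, k):
    """DCG of the top k results divided by the DCG of the ideal ordering."""
    gains = [graded_relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(graded_relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical annotations: two queries, with relevant document ids per query.
results = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
relevant = {"q1": {"b"}, "q2": {"x"}}
print(mean_reciprocal_rank(results, relevant))
print(ndcg_at_k(["a", "b"], {"a": 3, "b": 1}, k=2))
```

The metrics themselves are cheap to compute; the expensive part, as the comment says, is building annotated result sets that actually reflect each use case.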

Short of basically hiring Algolia to be a gigantic consultant on all these things, I don’t see how it could actually be valuable.

I suspect it’s just an easy sell to CTO types that don’t really understand. They want “search” to be one problem with one little component to drop in to solve it, but it’s just not real.


You are absolutely factually correct in your analysis, but you are completely missing how business works.

Fundamentally, there is value to most businesses in being able to just buy a decent solution to a non core competency.

That’s where Algolia and AWS and basically all service companies come in... a medium scale clothing manufacturer with a booming e-commerce site may well know they have no clue how to do search, and no clue how to assess and hire individuals who could implement it, and no clue how to find and hire a CIO who could put together a team from scratch who could do this on a reasonable timeline.

In my consulting I switched from rolling out Elasticsearch on Azure, AWS or on-prem to Algolia and couldn't be happier. I want to scope and build products and not be a sysadmin - clients don't want to do any of it, let alone hire full-time sysadmins.

I have one client in particular that is a stark indicator of this trend: a 50+ year old company whose second floor, where they used to have 30+ developers and sysadmins and a server room downstairs, has now been remodelled into a break room and new offices for their new team of 5 (all awesome, replacing a ton of mediocre people who didn't get much done for a decade).

They're doing better, their products are more popular, they don't have to worry about recruiting developers + sysadmins, their current IT staff get paid better and they're saving money.

I find Algolia interesting in that they've managed to capture something that Elastic didn't - and it could be because of a prevailing wisdom similar to that of the grandparent comment.

What happens if Algolia goes down the tube?
Why would they?

What happens if AWS goes down the tube?

In both cases you complain, hire some people and replace that bit with something else.

> Fundamentally, there is value to most businesses in being able to just buy a decent solution to a non core competency.

Bingo. Exactly that. My core competency isn’t (nor do I want it to be) implementing and maintaining a search system. Same reason I use Twilio, SendGrid, and Heroku.

I’m saying in my experience there is no such thing as one singular “decent solution” for search. It varies enormously from use case to use case, customer cohort to customer cohort, etc.

To even know if you’re buying a decent solution from Algolia or not, you’d already have to hire pretty much all the same staff you’d have to hire to more cost-effectively build it in-house.

I think the fundamental myth, just like with Rekognition, is that if you ship off your data and the third party trains some model (most likely fine-tuning a base model), then you’re done, problem solved.

Even for businesses where search is not a core part of their direct value proposition to customers this is flagrantly untrue.

I am not trying to be flippant, but maybe you are misunderstanding what Algolia provides?

Let's say I have an eCommerce platform. The search provided by the framework is too slow, I want to put together an instant search feature, and I don't have a search specialist to speed it up. So I add the Algolia plugin to the platform, sync my products and the work is done. Literally, real world example, install the plugin and suddenly my archaic eCom platform has instant search. Not only that, but I can manage the search result weights and preferences from within Algolia with no special experience. My existing search couldn't do that.

I am not sure where you would need an ML specialist in any of this, certainly not a whole team. For most people Algolia out of the box is plenty.

> Not only that, but I can manage the search result weights and preferences from within Algolia with no special experience.

This was a major selling point for us when we switched to Algolia. The business people want to be able to manage things like search without having to go through the programmers, just like they manage GA/GTM and such.

>To even know if you’re buying a decent solution from Algolia or not, you’d already have to hire pretty much all the same staff you’d have to hire to more cost-effectively build it in-house.

You just have to pay Algolia, wire up the APIs, and then see if whichever stakeholder was complaining about search stops complaining. If they do, then it's good enough.

This is correct. Let's say you have an ecom site with product search. How do you know if the search is good? You search for stuff and compare to what you expect.

Even if it sucks, as long as there is revenue lift, you will keep it until the next solution comes along.

And Algolia has great analytics, so you can actually measure business value from actual search queries: you can tie a query to a purchase and then run further analysis on that. It’s powerful and doesn’t require any exceptional engineering talent to use.
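To make "tie a query to a purchase" concrete, here is a minimal sketch of a per-query conversion calculation. It assumes a hypothetical exported event log of (query, purchased) pairs; the function name and the log format are invented for illustration and are not Algolia's analytics API.

```python
from collections import defaultdict

def conversion_by_query(events):
    """Aggregate (query, purchased) pairs into {query: (searches, purchases, rate)}.

    Queries are normalised by trimming whitespace and lowercasing, which is
    itself an assumption; real logs may need more careful normalisation.
    """
    counts = defaultdict(lambda: [0, 0])  # query -> [searches, purchases]
    for query, purchased in events:
        key = query.strip().lower()
        counts[key][0] += 1
        if purchased:
            counts[key][1] += 1
    return {q: (n, p, p / n) for q, (n, p) in counts.items()}

# Hypothetical log: each search, and whether that session ended in a purchase.
log = [("Red Shoes", True), ("red shoes", False), ("hat", False)]
for query, (searches, purchases, rate) in conversion_by_query(log).items():
    print(query, searches, purchases, rate)
```

A table like this shows which queries convert, though on its own it says nothing about why a given query underperforms.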
Totally false. This is like massively overfitting a high-order polynomial regression to your data. The fit looks good enough, then the next data point comes in and breaks in a way the existing model cannot be hacked to account for.

The search results you believed were implicitly tuned to some feedback mechanism slowly drift as the customer cohort and the data distribution change, until before you know it your management of the search solution is a ceaseless game of whack-a-mole siphoning off engineering resources at a rapidly increasing rate. It’s the same false promise of just having some engineers stand up Elasticsearch.

Not all decisions are good decisions; it depends on who makes the call. In most cases, someone asks you to use a solution because it looks/feels better. In this case Algolia showed how fast and how well it could be implemented. Once the person who makes the decision is convinced, it will be implemented. It's mostly marketing. Probably less than 1% of all e-commerce websites measure the impact of a decision.
I did say as much in my original post: Algolia and Rekognition are marketed at CTOs and directors of engineering who want to be sold on a magical line item that removes a whole concept area from their concern, especially one associated with the difficulty of hiring and affording good machine learning staff who can work on the problem both pragmatically and theoretically. They want to be sold a story.

I will say though that your 1% claim is way off in my experience (which includes 3 medium and large ecommerce companies). These companies employ armies of product managers and analytics staff that measure the shit out of everything from the color of a button to the size of font in a banner display for a discount promo code. These things aren’t usually measured because they find value, but rather just to give the appearance of data driven decision making and justify job perpetuity.

> you’d already have to hire pretty much all the same staff

This is terribly wrong. It's like saying you need as many people to set up Solr/Elasticsearch as you'd need to build a custom search engine.

It requires considerably fewer people to set up, manage, and optimize Algolia than it does to set up, manage, and optimize ES.

Case in point, Twilio.

You’re not even addressing the engineering costs though. The portion of the cost of a search engine solution attributable to setting up Elasticsearch is basically zero. The cost is in understanding whether the search surfaces relevant items for the specific use case, including asymmetric costs for surfacing bad items in many use cases. Not to mention that plug and play third party solutions like Solr / ES are highly inapplicable to a lot of use cases.
I set up algolia for a client while doing some work for them. We needed to add search to a website we were building which searched over data in the client's CMS.

I could have set up elasticsearch. But algolia was cheap and easy to configure, and we could just click around the algolia UI to tweak things like "number of allowed spelling mistakes". We didn't need to proxy anything or set up routes - we just pulled in the algolia JS library to run queries from the application. It was easier for the client to maintain in an ongoing way rather than maintaining their own elasticsearch instance on EC2 or something like that.

I'm sure there are plenty of times you'd want to run your own elasticsearch instance, and I think that would also have been a reasonable choice for us. But I still feel pretty happy with the choice to use algolia.

Arguments to set up your own elasticsearch instance remind me of the criticism against dropbox - "just run your own server with rsync! It's so easy!". Paying someone a small amount of money to do that for me is often a great deal.

> You’re not even addressing the engineering costs though

This is a pretty silly thing to say. How could you possibly know what I'm addressing?

I actually have experience with all of these scenarios, and bar none Algolia has delivered the fastest, smoothest, most bug-free and feature-rich-for-the-dollar rollouts of search experiences I have ever come across.

> including asymmetric costs for surfacing bad items in many use cases

Sure, if you're Google, Netflix or Amazon. For the 99.9% of the rest of the world where search isn't core to the business, there's unlikely to be any discernible impact going one way or the other, except saving money and launching faster.

Why do you buy clothes when you could just make them yourself?
A better analogy would be to consider a person with very special dietary needs, and then say “why buy dinner when you can make it yourself and actually be sure it meets the dietary needs.”
Sounds like a good case for Algolia Professional Services! (If this exists?)

I remember pushing Google's search appliance for a large media company some 10 years ago or so (no benefit to myself though, which was pretty noob). It made sense at the time, and solved something for them better than they probably would have implemented it themselves in a good enough way. The most complicated part was setting up rules about what was public / private / internal etc.

Except this would be prohibitively expensive. The labor cost of those specialized employees as consultants would be huge.
> you’d already have to hire pretty much all the same staff

I don't know what most Algolia customers are like, but we have been using them for a long time and we are a team of... 2 (only 1 being a technical person).

So hiring a whole team to work on search is certainly not an option for everyone. By paying $29 / month and spending a couple of hours on integration, you get a working search that most small shops will be happy with. That sounds like a good value proposition to me.

It seems like companies don't have to evaluate it your way? They could informally try it out on a few searches and say, "seems to work better than the other one." This is how Google became popular.

Or maybe do an A/B test. It's a little more formal but doesn't require any search-specific knowledge either.

This approach typically fails quite badly in practice. The subjective impression of success depends on the people around at the time of the decision and the set of queries they chose to inspect. As corner cases pop up with significant cost in production (e.g. surfacing nudity in an image search for a query where it’s highly inappropriate) you become less and less capable of understanding why or hacking business logic into an already brittle system.

The same problem appears for A/B testing different result orderings. It all hinges on the metrics you chose. If you only looked at e.g. top 5 click through rate, and then later a page redesign brings the top 10 results above the fold, and 6-10 are garbage, you’re suddenly screwed with no systematic way to adjust underlying parameters of the search model that control these things, or really even to get analytics data about it because you felt you could just outsource to a place like Algolia and not have in-house expertise, or that some engineers without statistics backgrounds could just hack it.
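The cutoff sensitivity described above can be shown with a small sketch. It uses precision at k as a stand-in for the click-through metric in the comment (an assumption, since click data is not available here), with made-up document ids and relevance labels.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

# Hypothetical ranking: the first five results are good, results 6-10 are garbage.
ranking = ["d1", "d2", "d3", "d4", "d5", "j1", "j2", "j3", "j4", "j5"]
relevant = {"d1", "d2", "d3", "d4", "d5"}

print(precision_at_k(ranking, relevant, 5))   # 1.0
print(precision_at_k(ranking, relevant, 10))  # 0.5
```

A ranking that looked perfect under the top-5 metric scores half as well once ten results are visible, and nothing in the original experiment would have warned you.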

What do you do when you don’t have the budget for a team of in-house experts and you need something that is good enough and you need it now?
Well, what you don’t do is rush to buy a wrong thing because you’re desperate and it has good marketing.
Disclaimer: I'm a software engineer working on the core search engine at Algolia, but the opinions of this post are my own and not an official statement.

Search is a hard job, and it's hard in many ways. The most obvious difficulties are related to relevance, and yes this part is specific to each business case. But those are not the only issues one has to solve when implementing search.

Even setting aside the software you run, there is the matter of running it so that you have high availability, search results fast enough to provide search as you type, reliable indexing, and low latency in several regions. This is the first service we provide; this is what SaaS is about. Being on the inside of a SaaS company shows you the amount of work we save our customers.

Then there is the software solution itself. Providing search is not just about running a generic piece of code. It's a whole ecosystem, continuously evolving. Working with a SaaS solution is like hiring a team of more than a hundred engineers dedicated to search. From the core software to frontend UI modules, the amount of engineering needed is way above what most companies can dedicate to search.

Back to relevance, some aspects are specific to business logic but some are also specific to search. We provide the search knowledge, so that you can focus on your own issues.

And for the software behind our services, we're not trying to build the one size fits all search tool, but a tool dedicated to the kind of search needed in today's web applications. I'm obviously biased, but I strongly believe that the kind of search we focus on, fast accurate top results rather than exhaustive search, fits our users' needs extremely well.

I do both search and ML solutions in the area that Rekognition targets.

In both cases they are great 80/20 solutions (actually Algolia is more like a 95/5 solution in most cases).

I also do computer vision and search in the areas targeted by Algolia and Rekognition and have not found this to be true at all. For face detection for example, Rekognition was completely unusable for my company.
I haven't used face detection by Rekognition so I can't comment.

However, I'm surprised it's that bad.

I've done two projects in the last 6 months that required face detection, and in both cases combinations of DLib and OpenCV performed perfectly well. Since these are entirely off-the-shelf models I don't see why Rekognition should - in principle - be any worse.

Yeah, pre-built dlib and opencv models are similarly not realistic for real world applications. We ended up needing to train our own version of MTCNN and separately train celebrity face recognition.

Especially when detecting in images with many faces, these legacy off the shelf things built on Viola-Jones type models or HoG feature extractors are just not acceptable by comparison with deep learning models.

And even at that, you need to fine tune the model to your own specific dataset with appropriate weights to reflect asymmetry in false positives vs false negatives. Simply using any off the shelf model, even a deep CNN model, virtually never works in practice. Unless your real life task is well approximated by the academic data set used for training (and it never is), you’re going to need a computer vision engineer involved.
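One simple way to reflect asymmetric false positive and false negative costs, short of the fine-tuning described above, is to choose the decision threshold that minimises total cost on a labelled validation set. The sketch below assumes you already have model scores and ground-truth labels; the function names and cost values are hypothetical.

```python
def expected_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of classifying score >= threshold as positive."""
    cost = 0.0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and not is_positive:
            cost += cost_fp  # false positive
        elif not predicted_positive and is_positive:
            cost += cost_fn  # false negative
    return cost

def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try each observed score (plus 'never predict positive') as the threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, cost_fp, cost_fn),
    )

# Hypothetical validation data: detector scores and whether a face was present.
scores = [0.1, 0.4, 0.6, 0.9]
labels = [False, True, False, True]

print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=10.0))  # misses are expensive
print(best_threshold(scores, labels, cost_fp=10.0, cost_fn=1.0))  # false alarms are expensive
```

The same model ends up with very different operating points depending on the cost ratio, which is why someone has to know what the costs are for the specific application.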

> I’ve always had a hard time understanding the value proposition in the same way I don’t understand the value proposition of e.g. AWS Rekognition.

I signed up for AWS specifically to use Rekognition. I use it to screen alerts from my security cameras. In short, Blue Iris detects motion, a Node-RED flow grabs an image and uses Rekognition to see what's in it, if there's a person detected the Node-RED flow notifies me via PushOver. This significantly reduces the false-positives that inevitably happen on windy days - I've already done a lot of work in Blue Iris on this, but passing alerts through Rekognition makes it almost perfect. Based on my testing, this reduces false-positives to zero and hasn't yet produced a false-negative.

Based on my usage I expect my costs to be ~$5/mo once I'm no longer in the free tier. This is cheaper than the person detection service that Blue Iris natively integrates with and is significantly less effort to get up and running compared to, for example, TensorFlow. I also assume that Amazon will periodically update their detection models to make it better, which is one less thing for me to worry about.

For me, all of these benefits are worth the ~$5/mo.

> “Based on my testing, this reduces false-positives to zero and hasn't yet produced a false-negative.”

But you’re just proving my point. It wouldn’t make sense to use Rekognition unless you had someone with skills to assess the classifier accuracy in the context of your specific problem. For example, it seems like your loss function places an asymmetrically higher cost on false negatives. (Incidentally, it’s interesting you claim it hasn’t produced a false negative ... did you watch every frame of video and make sure?)

If you replace your simple one-man operation, which has a simple loss function on an amount of data you can manually evaluate, with a complex computer vision workflow (say, where face or person detection has legal consequences for a company that sells or licenses stock photography, or an image or video search tool trying to avoid surfacing porn or pirated content), then Rekognition is no longer useful. You’ll need not just one person doing cursory evaluation of false negatives, but a team of people building out a benchmark-like battery of automated evaluations, probably with IoU metrics in addition to classifier metrics, and they will need to figure out how many errors they can tolerate within some cost budget, combined with the normal cost budget of Rekognition usage.
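For readers unfamiliar with the IoU metric mentioned here, a minimal sketch follows, along with a toy version of the combined cost budget. The box format (x_min, y_min, x_max, y_max), the function names, and all the numbers are assumptions for illustration; none of the prices are real Rekognition pricing.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def total_cost(false_positives, false_negatives, calls,
               cost_fp, cost_fn, price_per_call):
    """Business cost of errors plus the usage cost of the hosted service."""
    return (false_positives * cost_fp
            + false_negatives * cost_fn
            + calls * price_per_call)

# A predicted box that overlaps a ground-truth box by one unit square.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))

# Hypothetical month: 10 false alarms, 2 misses, 1000 API calls.
print(total_cost(10, 2, 1000, cost_fp=1.0, cost_fn=50.0, price_per_call=0.001))
```

In this toy budget the two misses cost far more than the API calls themselves, which is the comment's point: the per-use price is rarely the number that matters.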

Basically, for some tiny hobbyist use case, I guess it’s fine (though really you could literally just load some Keras model pre-trained on ImageNet or some off-the-shelf version of YOLO and save yourself $5/mo) but the value proposition falls apart as soon as the cost function becomes a complicated business one.

Another big difference here is that Algolia does not use machine learning in its algorithms.

This, according to an old friend who worked there, allowed them to really drill down into why particular search results are shown, and hence the pay per use does actually make sense.

This sounds like comedy to me. Either people say machine learning is magic and solves everything or people say not to use machine learning at all and this lets them drill down and understand?

Many search tasks really do need machine learning, especially variations on collaborative filtering and matrix factorization. Mixed modality search often truly does need deep learning and wasn’t even really possible at a level of fidelity suitable for real use cases until maybe 10 years ago.

If Algolia was categorically omitting a whole class of possible solutions, that would be a big red flag, certainly not a reason to think they can drill down to understand search results better.

I worked once on a large ecommerce search engine that had been built with Solr, and the sort order involved crazy hand-tuned boosting scores applied to ngrams of different sizes. None of it was reproducible, nobody knew where the magic boost weights came from, and as the quality of results started to plummet, there was no way to fix it. Everyone was too afraid to modify the magic constants because even slight perturbations created stark visual errors. And this was just for a super simple non-normalized term frequency matrix with boosts. “Not using machine learning” is not at all a signal that your solution won’t end up as a black box with no interpretability.

> And there’s no serious way to understand the accuracy you get per use (on your specific unusual distribution of queries) without employing the expensive ML / stats engineers you probably thought you could avoid hiring by outsourcing to Algolia / Rekognition in the first place.

You may simply not be able to do this at all. You might not know how to tell good ML/Stats people from bad. You might not be able to pay them competitively.

You might simply want something that's better than nothing, with "nothing" being your realistic alternative. "Expert in-house ML team" is not an alternative many companies can get, and even for the ones that could, it'll take a while. What are you going to do in the meantime?

At our company we have an "Expert in house ML team", but I don't want them wasting their time with search since it isn't our differentiating factor.
If you are building a core product, such as Spotify's Discover Weekly, then no question: you need a dedicated team and substantial commitment. Maybe even a PhD or two.

But as an unabashed Algolia devotee, I think the value prop of InstantSearch is a no-brainer. It's worthwhile looking at the product itself, as an almost textbook example of how to package services to enterprise customers.

https://www.algolia.com/products/instantsearch/

https://www.algolia.com/enterprise/customers/