Hacker News
by abeppu 1736 days ago
I'll take a stab at guessing why, aside from the issue that the people making purchasing decisions don't see how bad it is until work has already gone into bringing in docs and pushing people to use it.

Aside from the organizational issues, I think there's a problem where basically no search system can be good out of the box for every org, given all the different kinds of internal info and the different queries coming from perhaps several distinct types of users with different goals. To get good, a system needs to improve through at least rudimentary ML. At its simplest, if Alice searches for X today and clicks doc3, then when Bob searches for X tomorrow, doc3 should rank higher. This requires collecting and aggregating click stream data, and using this count info (with cardinality #docs x #queries) at search time. But sometimes it requires a richer model relating search terms to terms in relevant (clicked) docs and optimizing for some measure of search quality (e.g. NDCG). All of this requires detailed access to docs, search/click histories, and a fair amount of computation and storage. But customers have legit reasons for wanting these docs to only be accessible by their own employees. And they don't want to dedicate their own staff to improving such a system. No one wants to hear that their model retraining ran out of memory, etc. So shipping a simple system which doesn't improve but doesn't have moving parts becomes a local optimum.
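To make the Alice/Bob example concrete, here is a rough sketch in Python of the simplest version: count clicks per (query, doc) pair and add a damped boost at search time, plus a small NDCG helper for measuring ranking quality. Everything here (the `ClickBoostRanker` class, the `boost_weight` parameter, the log damping) is a made-up illustration under those assumptions, not how Confluence or any particular product does it.

```python
import math
from collections import defaultdict


class ClickBoostRanker:
    """Hypothetical re-ranker: base text score plus a damped click-count boost."""

    def __init__(self, boost_weight=1.0):
        # (normalized query, doc_id) -> click count; this is the
        # "#docs x #queries" table mentioned in the comment above.
        self.clicks = defaultdict(int)
        self.boost_weight = boost_weight  # assumed tuning knob

    @staticmethod
    def _normalize(query):
        return query.strip().lower()

    def record_click(self, query, doc_id):
        self.clicks[(self._normalize(query), doc_id)] += 1

    def rerank(self, query, scored_docs):
        """scored_docs: list of (doc_id, base_score) from the underlying engine."""
        q = self._normalize(query)

        def boosted(item):
            doc_id, base_score = item
            count = self.clicks.get((q, doc_id), 0)
            # log1p damps the boost so a few popular docs don't dominate.
            return base_score + self.boost_weight * math.log1p(count)

        return [doc_id for doc_id, _ in sorted(scored_docs, key=boosted, reverse=True)]


def dcg(relevances):
    """Discounted cumulative gain for relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranker = ClickBoostRanker()
results = [("doc1", 1.0), ("doc2", 0.9), ("doc3", 0.8)]
ranker.record_click("X", "doc3")      # Alice searches X, clicks doc3
print(ranker.rerank("X", results))    # Bob's results: ['doc3', 'doc1', 'doc2']
```

Even this toy version needs per-customer storage for the click table and access to query logs, which is exactly the operational burden described above.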


Great take. I worked on Confluence for a few years and have a bit of insight.

Search has been an area of focus, on and off, for most of the last 15 years. It actually has gotten a lot better, and Atlassian has an entire team focused on improving the search experience across their suite of products (they started with Confluence). And from what I hear, they are focusing on all the right things.

To your point, no search system can be a good fit for every possible use-case. Confluence has a number of different use-cases, but let's just pick "documentation" and "intranet" as examples here.

Intranets are, to a large degree, about keeping up with what's new in a company. Therefore recent content is likely more relevant than older content.

When used for documentation, recency doesn't matter at all. If a document was written 2 years ago, but the content is still accurate, it's just as relevant as it was on day one.

That means no single relevance configuration will work well for all use-cases. Leveraging ML is essential. But even a single ML model across an entire Confluence instance is not going to work, since different spaces are used for different use-cases. What's really required here is to build different models for different spaces, so that relevance is tailored to each space. It's not an easy problem to solve, but I'm confident they will get there with time.
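As a rough illustration of why one configuration can't serve both use-cases, here is a Python sketch where each space gets its own recency setting. This is a hand-tuned stand-in for the learned per-space models described above, not an ML model; the `SpaceProfile` type, the profile values, and the half-life decay formula are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceProfile:
    """Hypothetical per-space relevance settings."""
    recency_weight: float   # 0.0 means document age is ignored
    half_life_days: float   # age at which the freshness bonus halves


# Assumed example values; a real system would learn these per space.
PROFILES = {
    "intranet": SpaceProfile(recency_weight=1.0, half_life_days=30.0),
    "documentation": SpaceProfile(recency_weight=0.0, half_life_days=30.0),
}


def score(text_score, age_days, profile):
    """Scale the text-match score by a freshness bonus that decays with age."""
    freshness = 0.5 ** (age_days / profile.half_life_days)
    return text_score * (1.0 + profile.recency_weight * freshness)


two_years = 730
for name, profile in PROFILES.items():
    fresh = score(1.0, 0, profile)
    old = score(1.0, two_years, profile)
    print(f"{name}: new doc {fresh:.3f}, two-year-old doc {old:.3f}")
```

With these settings the documentation space scores a two-year-old page the same as a new one, while the intranet space favors the new one. Learning those weights per space from click data, instead of setting them by hand, is the hard part.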

Seeing the challenges with Search at Atlassian, despite having a large, dedicated team of engineers working on the problem, is what motivated me to join http://sajari.com. We've been doing a lot of work on reinforcement learning and Neural Search. Our focus right now is on public content websites and e-commerce, but eventually we will get around to enabling products like Confluence to create a great search experience without the need for an entire team. Search is a hard problem, but there is so much opportunity to improve the experiences that are available today. Exciting times.

It may be depressing to be on the Confluence “search” team. But wouldn’t it be MORE depressing to be on their “editing” team? Or how about their “performance” team? :-(
The comment you responded to said nothing about being on the search team being depressing.