by toast0 3192 days ago
If someone was familiar with Vespa in 2011, but hasn't had access to it until now, what's new since then?
2 comments

At Flickr, we worked closely with the Vespa team from 2011 through 2016 on a wide range of advancements:

   * partial document refeeding (i.e. speeding up indexing of a new field across 20+ billion documents without refeeding everything, while staying online and handling 100M+ free-text queries a day)
   * visual similarity search - check out the tensor ranking features [1] [2]
   * online elasticity - add or remove replicas and shards without going offline. A must when refeeding from scratch could take weeks or more. This is non-trivial to make work smoothly at scale.
   * latency and tail latency on complex queries: p90 reduced from 3,000 ms to 30 ms.
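
To make the first bullet concrete, here is a toy sketch of what a partial update means: one field is assigned on documents that already exist, and everything else is left alone. This is not Vespa's API; the in-memory `store`, the `partial_update` helper, and the `aesthetic_score` field are all hypothetical names made up for illustration.

```python
from typing import Any, Dict

Document = Dict[str, Any]


def partial_update(store: Dict[str, Document], doc_id: str, field: str, value: Any) -> bool:
    """Assign one field on an existing document, leaving its other fields untouched.

    Returns False (and changes nothing) if the document is not in the store.
    """
    doc = store.get(doc_id)
    if doc is None:
        return False
    doc[field] = value
    return True


# Hypothetical corpus that is already indexed and serving queries.
store: Dict[str, Document] = {
    "photo:1": {"title": "sunset", "tags": ["beach"]},
    "photo:2": {"title": "harbor", "tags": ["boats"]},
}

# Hypothetical values for a newly added field; "photo:3" is not in the store.
new_field_values = {"photo:1": 0.9, "photo:2": 0.4, "photo:3": 0.1}

# Backfill the new field without resending titles or tags.
applied = [
    doc_id
    for doc_id, score in new_field_values.items()
    if partial_update(store, doc_id, "aesthetic_score", score)
]
```

The point of the sketch is only the shape of the operation: the feed carries the new field's values, not the whole document, which is why it can be so much cheaper than a full refeed.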

This is a major gift to the open-source community: a battle-tested search engine that works reliably, without babysitting, on very large datasets under simultaneously high query and feed volumes. A huge debt of gratitude to the team in Trondheim and to the Verizon/Oath/Yahoo legal and management teams for making this happen. :+1:

[1] http://docs.vespa.ai/documentation/tensor-intro.html
[2] http://docs.vespa.ai/documentation/tensor-user-guide.html

I'm not precisely sure where we were in 2011, but off the top of my head I think these are the biggest changes that came after (i.e. I'm sure to be missing something):

  - Merging content and index clusters into one, making index clusters elastic and auto-recovering on data loss.
  - Fully realtime writes.
  - Support more advanced machine-learned ranking through tensors.
  - Streaming (personal) search supporting a large write rate.
  - Document references.
  - WAND and RANK operators.
  - Rank features over multivalue text fields.
  - Predicate fields.
  - Lots and lots of performance work.
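
For readers wondering what "ranking through tensors" amounts to in the simplest case, here is a rough sketch: score each document by the dot product of a query vector and a per-document vector, then keep the top results. This is plain Python, not Vespa's ranking expression language, and the `rank_by_dot_product` helper and the example vectors are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors (rank-1 tensors)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top_k, highest first."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional document vectors.
doc_vecs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

top = rank_by_dot_product([1.0, 2.0], doc_vecs, top_k=2)
```

A real engine would avoid scoring every document and would work with higher-rank tensors and learned models; the sketch only shows the scoring step.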