by lovelearning 2431 days ago

Coming from a Solr/Lucene/Algolia background, here are my opinions on this:

What's good:

==========

- Focused search for question and answer databases (such as customer FAQs)

- ML-based semantic search without requiring any explicit configuration

- Connectors for S3, AWS-hosted MySQL/PostgreSQL, and SharePoint. Searching data already in the AWS ecosystem (S3, Aurora) is now easier, and likely faster and cheaper in some respects, such as saved incoming/outgoing bandwidth

- Document-level access control on all pricing plans

- Managed search (similar to Algolia)

What's similar to existing search systems (Solr / ES / Algolia):

==========

- Indexing: All data has to be processed into "field:value" structure prior to indexing

- Indexing file formats: Plain text, HTML, PDF, MS DOCX, MS PPT

- Searching: The usual boolean filters and faceting, but only at field level

- Searching: Field and value boosts for relevance, but only at index time

- Results: Highlighting support
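
To illustrate the "field:value" point above, here's a minimal sketch of the kind of preprocessing that's typically needed before indexing. It's plain Python, not Kendra's API: the `flatten` helper, the dotted field names, and the sample record are all my own assumptions for illustration.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level {field: value} dict.

    Nested keys are joined with dots (a naming convention assumed here,
    not something the service prescribes).
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the field prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical source record, e.g. one row exported from an internal system.
raw = json.loads(
    '{"id": "faq-1", "title": "Reset password",'
    ' "meta": {"team": "support", "year": 2019}}'
)
doc = flatten(raw)
print(doc)
```

The same shape works for CSV rows read with `csv.DictReader`, since each row is already a flat dict.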

What's missing:

===========

- No multilingual support. Only English. Given that it's AWS, I'm actually very surprised by this (or I've missed something in their docs)

- Can't configure text analysis for English. I suspect it'll return relevant results for formal-style content, but probably not for informal-style content like emails.

- No connectors for common internal systems: Outlook, JIRA, Confluence

- No built-in support for CSV, XLS, JSON (that one's odd!). They'll all require preprocessing, which means additional infra costs.

- Doesn't seem to support range or query facets. I feel the lack of range facets is a big problem, especially for numerical data.

- No query-time relevance tuning

- No field-level access control

- Scores are not returned in results

- Common post-searching functionality is missing: rescoring, grouping, clustering
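
One possible workaround for the missing range facets, sketched under assumptions: since faceting works at field level, you could precompute a bucket label for each numeric value during preprocessing and index it as an ordinary string field. I haven't verified how Kendra treats such a field; `bucket_label`, `PRICE_BOUNDS`, and the `price_range` field are hypothetical names of mine.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries; must be sorted in ascending order.
PRICE_BOUNDS = [10, 50, 100]


def bucket_label(value, boundaries):
    """Return a label for the half-open bucket [lo, hi) containing value."""
    labels = (
        [f"< {boundaries[0]}"]
        + [f"{lo} to < {hi}" for lo, hi in zip(boundaries, boundaries[1:])]
        + [f">= {boundaries[-1]}"]
    )
    # bisect_right gives the number of boundaries <= value,
    # which is exactly the index of the bucket the value falls into.
    return labels[bisect_right(boundaries, value)]


# Hypothetical documents with a numeric field.
docs = [
    {"id": "a", "price": 5},
    {"id": "b", "price": 10},
    {"id": "c", "price": 49.99},
    {"id": "d", "price": 250},
]
for d in docs:
    # Add the bucket as a plain string field before indexing.
    d["price_range"] = bucket_label(d["price"], PRICE_BOUNDS)

print([d["price_range"] for d in docs])
```

The obvious downside is that the buckets are frozen at index time, so changing the ranges means reindexing.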

What's unknown:

============

- I don't see any information about phrase or proximity searches. Of course, they are usually relevance hacks in keyword-based systems, but sometimes users really need exact phrase matches. Does their ML backend handle this somehow?

- All search systems fall short when handling proper nouns - names, places, things, scientific names. It's possible to alleviate this to some extent using part-of-speech aware indexing. Not sure if Kendra does it in its ML backend.
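
If exact phrase matching turns out not to be supported, one fallback is to post-filter the returned result text on the client. This is a rough sketch under that assumption, not anything from Kendra's docs; `contains_phrase` and the sample results are hypothetical, and it only filters results you've already fetched, so it can't recover matches the service never returned.

```python
import re


def contains_phrase(text, phrase):
    """True if phrase occurs in text as a whole-word sequence.

    Matching is case-insensitive and tolerates any run of whitespace
    between the words of the phrase.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return False
    # Lookarounds act as word boundaries that also work when the phrase
    # starts or ends with a non-word character.
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical result snippets returned by a search call.
results = [
    "How to reset your   password quickly",
    "Password reset is easy",
    "Resetting passwords",
]
matches = [r for r in results if contains_phrase(r, "reset your password")]
print(matches)
```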