Coming from a Solr/Lucene/Algolia background, my opinions on this:

What's good:
==========
- Focused search for question-and-answer databases (such as customer FAQs)
- ML-based semantic search without requiring any explicit configuration
- Connectors for S3, AWS-hosted MySQL/PG, SharePoint. Searching data already in the AWS ecosystem (S3, Aurora) is now easier, and likely faster and cheaper too in some respects, such as saving on incoming/outgoing bandwidth
- Document-level access control on all pricing plans
- Managed search (similar to Algolia)

What's similar to existing search systems (Solr / ES / Algolia):
==========
- Indexing: All data has to be processed into a "field:value" structure prior to indexing
- Indexing file formats: Plain text, HTML, PDF, MS DOCX, MS PPT
- Searching: The usual boolean filters and faceting, but only at field level
- Searching: Field and value boosts for relevance, but only at index time
- Results: Highlighting support

What's missing:
===========
- No multilingual support. Only English. Given that it's AWS, I'm very surprised by this actually (or I've missed something in their docs)
- Can't configure text analysis for English. I feel this'll return relevant results for formal-style content, but probably not for informal-style content like emails.
- No connectors for common internal systems: Outlook, JIRA, Confluence
- No built-in support for CSV, XLS, JSON (that one's odd!). They'll all require preprocessing, which means additional infra costs.
- Doesn't seem to support range or query facets. I feel the lack of range facets is a big problem, especially for numerical data.
- No query-time relevance tuning
- No field-level access control
- Scores are not returned in results
- Common post-search functionality is missing: rescoring, grouping, clustering

What's unknown:
============
- I don't see any information about phrase or proximity searches. Of course, they are usually relevance hacks in keyword-based systems, but sometimes users really need exact phrase matches. Does their ML backend handle this somehow?
- All search systems fall short when handling proper nouns: names, places, things, scientific names. It's possible to alleviate this to some extent using part-of-speech-aware indexing. Not sure if Kendra does it in its ML backend.
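
To make the preprocessing and range-facet points above a bit more concrete, here's a rough Python sketch of the kind of glue code they imply. It is only an illustration: the function names (`csv_to_documents`, `range_facet`), the `{"id": ..., "fields": {...}}` document shape, and the sample data are all made up for this example and are not Kendra's API or document format. It uses nothing beyond the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE = """sku,title,price
a1,Blue widget,4.50
a2,Red widget,12.00
a3,Green widget,
a4,Black widget,19.99
"""


def csv_to_documents(csv_text, id_field):
    """Turn each CSV row into a flat {"id": ..., "fields": {field: value}} dict.

    The output shape is an assumption for this sketch, not any service's format.
    Raises KeyError if a row has no value in the id column.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Drop empty cells (and overflow columns, keyed by None) so they
        # don't turn into empty fields.
        fields = {
            key: value.strip()
            for key, value in row.items()
            if key is not None and value and value.strip()
        }
        docs.append({"id": fields.pop(id_field), "fields": fields})
    return docs


def range_facet(docs, field, edges):
    """Count documents per half-open [lo, hi) bucket of a numeric field.

    This runs client-side, so it only sees the documents passed in.
    """
    counts = Counter()
    for doc in docs:
        raw = doc["fields"].get(field)
        if raw is None:
            continue  # field missing on this document
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric value, skip it
        for lo, hi in zip(edges, edges[1:]):
            if lo <= value < hi:
                counts[(lo, hi)] += 1
                break
    return dict(counts)


docs = csv_to_documents(SAMPLE, id_field="sku")
facets = range_facet(docs, "price", [0, 10, 20])
print(facets)
```

The obvious catch with bucketing on the client is that it only counts the documents you've already fetched, not the whole index, so it's a workaround rather than a replacement for real range facets.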