

by jmakeig 4215 days ago

[Full disclosure: I’m a product manager at MarkLogic.]

MarkLogic can handle all of these requirements with aplomb. You can think of MarkLogic as a database built with search engine technology. It uses a document data model (text documents in XML or JSON). Each term (word, phrase, parent-child relationship, etc.) is indexed on ingest. There are index knobs and levers for things like diacritics, wildcards, and scalars, like you'd expect in a real search engine.
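To make "indexed on ingest" concrete, here is a toy sketch in Python. It is not MarkLogic's API or implementation; the `TermIndex` class and its methods are hypothetical names, and it only indexes single words and two-word phrases.

```python
from collections import defaultdict


class TermIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def ingest(self, doc_id, text):
        words = text.lower().split()
        # Index single words and adjacent two-word phrases at ingest time.
        for word in words:
            self.postings[word].add(doc_id)
        for first, second in zip(words, words[1:]):
            self.postings[f"{first} {second}"].add(doc_id)

    def search(self, *terms):
        # A query is an AND over terms: intersect their posting sets.
        sets = [self.postings.get(term.lower(), set()) for term in terms]
        return set.intersection(*sets) if sets else set()
```

Because the phrase is itself a term, `search("brown fox")` is a single lookup rather than a scan of document text.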

As for document permissions, they're indexed just like other terms. However, they’re automatically ANDed onto each query in the database engine, not in application code. MarkLogic supports role-based permissions (read, write, and execute for stored procedures) with optional Kerberos and/or LDAP auth*n. “Ignored/hidden items” are those that a user doesn’t have permissions to access.
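The idea of permissions as index terms can be sketched the same way. Again, this is a toy model under my own assumptions, not MarkLogic code: the `perm:read:` prefix and the `ingest`/`search` functions are made up for illustration, and only read permissions are modeled.

```python
from collections import defaultdict

postings = defaultdict(set)  # term -> ids of documents containing it


def ingest(doc_id, text, read_roles):
    for word in text.lower().split():
        postings[word].add(doc_id)
    # Permissions are stored as ordinary index terms (hypothetical "perm:read:" prefix).
    for role in read_roles:
        postings[f"perm:read:{role}"].add(doc_id)


def search(user_roles, *terms):
    # Documents the user may read: OR over the user's role terms.
    readable = set()
    for role in user_roles:
        readable |= postings.get(f"perm:read:{role}", set())
    # The engine ANDs that set onto every query; callers cannot skip it.
    result = set(readable)
    for term in terms:
        result &= postings.get(term.lower(), set())
    return result
```

The point of the design is that the filter lives inside `search`, so application code never gets a chance to forget it.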

"Followed/watched items" is a pretty common requirement. MarkLogic uses a special "reverse index" to index queries along with text, values, and structures. With regular "forward" queries, queries find documents. With reverse queries, documents find queries. Thus anything that can be expressed in a query can be turned into an alert. This provides some pretty powerful match-making where a document can express its own attributes as well as those it’s interested in matching. Hook that up to a trigger (pre- or post-commit) and you have alerting that scales to billions of documents and millions of queries. One of the world’s biggest news sites uses this infrastructure on a MarkLogic cluster to handle saved searches and alerts.