HN Mirror

by jillesvangurp 2388 days ago

I recently worked on a project that involved ReactiveSearch. A few things you should be aware of when using it:

- client side query construction basically puts you in a position where you are going to be thinking about opening up elasticsearch to the internet. There have been a range of companies that have accidentally revealed quite a bit more than they wanted. Unless you know what you are doing and how to prevent bad things from happening, just don't even think about doing this. ES has lots of features that can cause it to run out of memory, execute arbitrary code (ranking scripts), etc. A lot of that has sane defaults of course but exposing that to the internet gives a malicious person a lot of power. Just don't even consider using this without a proxy that sanitizes whatever comes from the browser.

- it's extremely aggressive in sending lots of queries to elasticsearch and a lot of these queries involve aggregations. That's great if you don't have a lot of data and users but it's a performance problem waiting to happen. It basically does an msearch (multiple searches in one request) of queries that quite trivially could be reduced to just one query. It does that when you type a single letter or interact with any of the react components. It's basically a react DOS toolkit.

- the queries it constructs are a bit convoluted and under-use such features as filters which you'd use to speed things up (allows ES to utilize its query cache). It's nice if you don't know much about ES but at some point you will want to sit down and optimize and it just gets in the way of doing that.

I ended up implementing a proxy (in Python) that completely rewrites, server side, the query the client sends. That solves the security problem too, because there is just no way to trick the server into running an arbitrary query. Then I just completely faked the client-side query so I could extract the handful of parameters I needed. The end result allowed me to optimize the query, decouple the UI from server-side logic (i.e. we can change the query without touching the UI now), add some much-needed integration and ranking tests, etc.
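A minimal sketch of that kind of server-side rewrite, not the actual proxy described above. The payload keys (`q`, `category`, `page`), the field names (`title`, `category`) and the allowed facet values are all hypothetical. The idea is that nothing from the browser is forwarded as-is: a few validated values are pulled out and the Elasticsearch query is built from scratch, with the facet restriction in filter context.

```python
# Hypothetical facet values; a real proxy would take these from its own config.
ALLOWED_CATEGORIES = {"books", "music", "films"}
MAX_PAGE = 100  # assumed cap to avoid deep paging


def extract_params(client_payload: dict) -> dict:
    """Keep only a handful of validated values; ignore everything else."""
    text = str(client_payload.get("q", ""))[:200]

    category = client_payload.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        category = None

    try:
        page = int(client_payload.get("page", 0))
    except (TypeError, ValueError, OverflowError):
        page = 0
    page = min(max(page, 0), MAX_PAGE)

    return {"text": text, "category": category, "page": page}


def build_query(params: dict, page_size: int = 10) -> dict:
    """Build the query server side; the client never supplies query DSL."""
    if params["text"]:
        must = [{"match": {"title": params["text"]}}]
    else:
        must = [{"match_all": {}}]

    # Filter context does not affect scoring and can be cached by ES.
    filters = []
    if params["category"]:
        filters.append({"term": {"category": params["category"]}})

    return {
        "from": params["page"] * page_size,
        "size": page_size,
        "query": {"bool": {"must": must, "filter": filters}},
    }
```

Anything else the client sends (scripts, aggregations, a huge `size`) never reaches Elasticsearch, because only the three extracted values feed into `build_query`.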

So, if you are considering ReactiveSearch, I recommend not using it, or at the very least doing something like I did (which is to work around it as much as you can). And if you are doing that, you might as well design a proper REST API and some components that manipulate a query context instead of triggering dozens of queries on every DOM event.

sidi 2387 days ago

Hey, addressing some of the concerns raised here directly:

> - client side query construction basically puts you in a position ..

We recommend using a proxy server for these cases. It's on our roadmap to add first-class support for search templates, which would completely prevent query generation from happening client-side.

> - it's extremely aggressive in sending lots of queries to elasticsearch ..

This is configurable by using the debounce prop. And the number of queries depends on which components you would like to see updated based on a change in the search query or a facet value, for example.

> - the queries it constructs are a bit convoluted and under-use such features as filters ..

We would be happy to address specific scenarios if you're seeing any; please raise an issue. That said, I believe we already do this to the extent it's generalizable. And since, as a user, you can change the underlying query, it should be addressable by users.

---

To sum up, the DOS and querying part is completely configurable and just comes down to usage. We would love to address any generalizable issues.

There are merits to your point about client-side query generation. They are already addressed in two ways: 1. for Appbase.io users, you can set ACLs and rules (e.g. only allow search requests, max X requests per IP per hour, set max size of Y); 2. for users outside of appbase.io, we recommend using a proxy server, like you did, to implement authorization logic.
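For anyone going the proxy route, here is a rough sketch of what those three example rules could look like. This is not how Appbase.io implements them; the function name, the limits and the in-memory counter are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS_PER_HOUR = 1000  # hypothetical value for "X"
MAX_SIZE = 100                # hypothetical value for "Y"

# ip -> timestamps of accepted requests (in-memory, single process only)
_recent = defaultdict(deque)


def allow(ip, method, path, body, now=None, limit=MAX_REQUESTS_PER_HOUR):
    """Return True if the request passes all three example rules."""
    now = time.time() if now is None else now

    # Rule 1: only search requests.
    if method not in ("GET", "POST") or not path.endswith("/_search"):
        return False

    # Rule 2: cap the requested page size.
    if not isinstance(body, dict):
        return False
    size = body.get("size", 10)
    if isinstance(size, bool) or not isinstance(size, int):
        return False
    if not 0 <= size <= MAX_SIZE:
        return False

    # Rule 3: sliding one-hour window per IP.
    window = _recent[ip]
    while window and now - window[0] >= 3600:
        window.popleft()
    if len(window) >= limit:
        return False
    window.append(now)
    return True
```

Note that rules like these only restrict which endpoint is hit and how often; they do not look at what the query body asks Elasticsearch to do, so they complement query rewriting rather than replace it.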

Once we have first-class support for search templates in ReactiveSearch, this should be effectively addressed.

jillesvangurp 2387 days ago

Debounce helps of course. I'm just responding to what I saw at my customer which was dozens of queries happening for simple user actions. Also the requests bloat quite a bit with all the query composition magic client side. This is probably not very mobile friendly. Also raw es responses are a bit heavy.

Templating is a good solution. I'd recommend making that the only way. Also, I don't see a good reason for using msearch since you can simply do the search once with multiple aggregations (and only for the first page of results).
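To illustrate the single-search point, a small sketch assuming hypothetical `category` and `brand` keyword fields and a `title` text field: one request body carries the hits plus all facet aggregations, and the aggregations are only requested for the first page.

```python
# Hypothetical keyword fields used as facets.
FACET_FIELDS = ["category", "brand"]


def search_body(text, page=0, page_size=10):
    """One search request: hits plus every facet, instead of an msearch."""
    body = {
        "from": page * page_size,
        "size": page_size,
        "query": {"match": {"title": text}},
    }
    # Facet counts do not change while paging, so only ask for them once.
    if page == 0:
        body["aggs"] = {
            field: {"terms": {"field": field, "size": 20}}
            for field in FACET_FIELDS
        }
    return body
```

Later pages then cost a plain query with no aggregation work at all.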

A challenge with this client is that it is intended for people who are likely to not have a full grasp of the Elastic stack and are thus very likely to get themselves into all sorts of trouble. That's exactly what I saw when I came in with my customer. They'd done the minimum of work to get started. Thankfully they had the good sense to set up a proxy, but there must be loads of people who just open up port 9200.

I'd recommend actually testing the connection for access to things that definitely should be off limits (e.g. create and delete an index; if that works, start screaming at ERROR level).
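Roughly what such a check could look like, as a sketch. The function names and the probe index name are made up; the only Elasticsearch behaviour relied on is that `PUT /<index>` creates an index and `DELETE /<index>` deletes it. Pick a probe name that cannot collide with a real index, since on an open cluster the probe really is created and deleted.

```python
import logging
import urllib.error
import urllib.request

log = logging.getLogger("es_selfcheck")


def http_status(method: str, url: str) -> int:
    """Return the HTTP status of a request, or 0 if the host is unreachable."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401/403 from a proxy that blocks the call
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def check_locked_down(base_url, send=http_status,
                      probe_index="selfcheck-probe-index"):
    """Try to create and delete a throwaway index; both should be refused."""
    url = f"{base_url.rstrip('/')}/{probe_index}"
    created = 200 <= send("PUT", url) < 300
    deleted = 200 <= send("DELETE", url) < 300
    if created or deleted:
        log.error("%s allows index create/delete; it must not be "
                  "reachable from clients", base_url)
        return False
    return True
```

The `send` parameter exists so the check can be exercised without a live cluster, e.g. `check_locked_down("http://es.example:9200", send=lambda m, u: 403)`.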

For scaling and for architectural sanity, I don't think UIs should query directly. SQL support in react is also not a thing for the same reason. You typically do that kind of thing server side.