| HN Mirror

by ermouth 2950 days ago

> Nothing is streaming

Except CouchDB's entire REST API, where nearly every endpoint streams its result row by row, which is great.

> Performance (and likely security, due to age of the engine) in this subsystem is extremely poor

As for security: wrong guess. Go compare the number of critical CouchDB CVEs over the last, say, 5 years with any other DB you want.

As for the query server: indexing is slow for the reasons you pointed out. Querying a persisted index is fine. Taking into account that CouchDB persists everything and is not an in-memory DB (which is again great), I'd say it's even fast.

> Performance in this (Erlang) area also appeared to be proportional to the code size of the map function

Performance in this area depends much more on how branchy the JSON doc being processed is. It's not about CouchDB itself, it's about the JSON parser used. There are situations where you get better perf using JS views.

> Nothing about this is relaxing.

Indeed. But then it just works, for years, with zero maintenance even for replications: a result I've never come close to achieving with any other DB I've used.

facetube 2950 days ago

> Except CouchDB's entire REST API, where nearly every endpoint streams its result row by row

I'm talking about the internals of couchjs.

> As for security: wrong guess. Go compare the number of critical CouchDB CVEs

I'm talking about SpiderMonkey 1.8.5, which is seven years old. Multiple CVEs have been reported, including a full RCE, which was patched in later versions. If you believe these are patched in libmozjs185, I'd love to see a link to the source package showing where the fix was applied. Because I can't find it.

Further: https://www.cvedetails.com/vulnerability-list/vendor_id-45/p.... Another full RCE in CouchDB.

> Performance in this area depends much more on how branchy the JSON doc being processed is

Citation needed. There's an issue specifically about this in the CouchDB issue tracker if you'd like to read more.

> As for the query server: indexing is slow for the reasons you pointed out.

Right. It is not performant. That's my point.

Curious what kind of database sizes you're working with too; you won't see this stuff with a few hundred documents.

ermouth 2949 days ago

> I'm talking about the internals of couchjs.

_list functions stream, and were specifically designed to be able to stream. However, streaming does not help much in this case; moreover, chunks that are too small dramatically reduce the already awful _list perf.

Taking this into account, I see no value in having views stream. Sending all emitted KVs for a doc to the Erlang side in one turn seems both much more predictable and safer.

> Multiple CVEs have been reported including a full RCE, which was patched in later versions

The latest CVEs have nothing to do with SpiderMonkey. BTW, they can be patched without an upgrade, with a 5-LOC design document in Erlang.

The main cause of the latest CVEs is an inconsistency in how the Erlang parser handles improper JSON. Namely, most JSON parsers process '{"abc":1, "abc":2}' as {abc:2}, but old jiffy parses it as {abc:1}. BTW, severe inconsistencies in parsing JSON are pretty common across implementations; please read http://seriot.ch/parsing_json.php for more details.
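
A minimal sketch of the duplicate-key ambiguity described here, written in Python rather than Erlang (an assumption made purely for illustration; this is not jiffy or CouchDB code). Python's standard `json` module keeps the last value for a repeated key. The hypothetical `first_wins` and `reject_duplicates` hooks show two other policies a parser could take:

```python
import json

raw = '{"abc":1, "abc":2}'

# Python's standard parser keeps the last value for a repeated key.
last_wins = json.loads(raw)


def first_wins(pairs):
    """Hypothetical hook: keep the first value seen for a repeated key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result


def reject_duplicates(pairs):
    """Hypothetical hook: refuse any object that repeats a key."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)


# object_pairs_hook receives every object as an ordered list of (key, value).
first = json.loads(raw, object_pairs_hook=first_wins)

print(last_wins)  # {'abc': 2}
print(first)      # {'abc': 1}
```

If two components in one system disagree on this policy, each can see a different value for the same field, so passing `reject_duplicates` as the hook is the conservative choice.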

> Citation needed. There's an issue specifically about this in the CouchDB issue tracker if you'd like to read more.

I can't give you a citation, sorry, because we discovered the effect during internal tests. The reason is simple: accessing values in deep, branchy JSON is generally faster in JS, because it's the native format. JSON in Erlang is a monster like {[{<<"abc">>, 1}]} for {abc:1}, which, when it has many levels and many nodes at each level, performs badly when selecting a random node.
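
To illustrate the shape difference, here is a rough Python sketch (not Erlang, and not CouchDB's actual code). `to_pairs`, `get_in_pairs`, and `get_in_dict` are hypothetical helpers. The first mimics the nested key/value-list layout of {[{<<"abc">>, 1}]}, where finding a key means scanning the pairs at each level. A Python dict uses a hashed lookup at each level instead:

```python
def to_pairs(value):
    """Convert nested dicts into nested lists of (key, value) pairs."""
    if isinstance(value, dict):
        return [(key, to_pairs(child)) for key, child in value.items()]
    return value


def get_in_pairs(node, path):
    """Follow path through the pair-list form, scanning each level linearly."""
    for wanted in path:
        for key, child in node:
            if key == wanted:
                node = child
                break
        else:
            # The scan finished without finding the key at this level.
            raise KeyError(wanted)
    return node


def get_in_dict(node, path):
    """Follow path through nested dicts using hashed lookups."""
    for wanted in path:
        node = node[wanted]
    return node


doc = {"a": {"x": 0, "b": {"y": 0, "c": 1}}}
pairs = to_pairs(doc)

print(get_in_dict(doc, ["a", "b", "c"]))     # 1
print(get_in_pairs(pairs, ["a", "b", "c"]))  # 1
```

Both calls return the same value. The pair-list version does work proportional to the number of keys at each level, which is the cost being described for wide, deep documents. This sketch makes no claim about actual timings in either runtime.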

> Curious what kind of database sizes you're working with too

We sometimes measure the number of docs with an M suffix :) Not very often, however. In my humble opinion, if you plan to have CouchDB with, say, 100M docs in a single bucket, you probably chose the wrong solution.

BTW, same for large kitchen DBs for buckets with, say, 10K docs.

facetube 2949 days ago

> I can't give you a citation, sorry, because we discovered the effect during internal tests

We discovered the issue in internal tests and reported it upstream where it was confirmed; there's nothing to discuss here.

> BTW, they can be patched without an upgrade, with a 5-LOC design document in Erlang.

I don't have to hand-patch other database systems. Further, as of now there are no functioning packages for multiple versions of Ubuntu. Multiple competitors do not have any of these problems.

> The latest CVEs have nothing to do with SpiderMonkey

It was a full RCE that didn't have vendor packages ready in time.

Sorry, but using seven year old language runtimes is daft. It might be fine for you, but it's not appropriate in environments where you care about security and performance, or care about the overhead of making your team reason about these things unnecessarily.

> We sometimes measure the number of docs with an M suffix

Yeah, we did this in 2003 on commodity hardware, and even it built indexes faster than CouchDB builds map/reduce indexes. Fix the external view server protocol or be honest with people and kill it off – the status quo is unacceptable.

Finally, here's a five year old issue that's still open admitting what I just explained: https://issues.apache.org/jira/browse/COUCHDB-1743

ermouth 2949 days ago

100M docs? In JSON format? On commodity hardware? In 2003?

Hahaha, hello, long-awaited friend from a parallel universe!

facetube 2948 days ago

Easy enough to convert on the way out, using a mature database system without these performance flaws. The app got done and performed well, and we weren't at the mercy of an unresponsive community that leaves seven year old dependencies in critical paths.

Keep going; it'll bite you eventually. Don't say you weren't warned.

facetube 2950 days ago

Correction: the library is too old for that specific CVE to be exploitable. There may be others, there may not be. I don't know – and that's actually the problem.

The chief complaint here is that it's very hard to reason about the security of seven year old software that distros are pulling out of their package repositories citing security issues (e.g. Alpine Linux). Advisories tend to say "versions before X", and you're left to either wonder or read tickets/code. Not everyone wants to sign on for that, and it's work that isn't going toward writing your app.