| > Except CouchDB entire REST API, where nearly every endpoint streams result row by row I'm talking about the internals of couchjs. > As of security: wrong guess. Go compare number of critical CouchDB CVE I'm talking about Spidermonkey 1.8.5, which is seven years old. Multiple CVEs have been reported including a full RCE, which was patched in later versions. If you believe these are patched in libmozjs185, I'd love to see a link to the source package showing where the fix was applied. Because I can't find it. Further: https://www.cvedetails.com/vulnerability-list/vendor_id-45/p.... Another full RCE in CouchDB. > Performance in this area depends much more on how branchy is the json doc being processed Citation needed. There's an issue specifically about this in the CouchDB issue tracker if you'd like to read more. > As for query server: indexing is slow for the reasons you pointed out. Right. It is not performant. That's my point. Curious what kind of database sizes you're working with too; you won't see this stuff with a few hundred documents. |
_list functions stream, and were specifically designed to be able to stream. However, streaming does not help much in this case; moreover, chunks that are too small dramatically reduce the already awful _list performance.
Taking this into account, I see no value in making views stream. Sending all emitted KVs for a doc to the Erlang side in one turn seems both much more predictable and safer.
> Multiple CVEs have been reported including a full RCE, which was patched in later versions
The latest CVEs have nothing to do with Spidermonkey. BTW, they can be patched without an upgrade, with a 5-LOC design document in Erlang.
The main cause of the latest CVEs is an inconsistency in how the Erlang parser handles improper JSON. Namely, most JSON parsers process '{"abc":1, "abc":2}' as {abc:2}, but old jiffy parses it as {abc:1}. BTW, severe inconsistencies in parsing JSON are pretty common across implementations; please read http://seriot.ch/parsing_json.php for more details.
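To make the duplicate-key discrepancy concrete, here is a minimal Python sketch. It is not CouchDB or jiffy code: Python's standard `json` module stands in for a "last key wins" parser, and the hypothetical `first_wins` hook emulates the "first key wins" behaviour described above for old jiffy.

```python
import json

RAW = '{"abc":1, "abc":2}'

def first_wins(pairs):
    """Hypothetical hook: keep the first value seen for each duplicate key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)  # later duplicates are ignored
    return result

# Python's default behaviour: the last duplicate overwrites earlier ones.
last = json.loads(RAW)

# Emulated "first wins" parser, the behaviour attributed to old jiffy above.
first = json.loads(RAW, object_pairs_hook=first_wins)

print(last, first)
```

Two components that read the same bytes but disagree like this will see different values for the same field, which is the kind of inconsistency at issue here.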
> Citation needed. There's an issue specifically about this in the CouchDB issue tracker if you'd like to read more.
I'll give you no citation, sorry, because we discovered the effect during internal tests. The reason is simple: accessing values in deep, branchy JSON is generally faster in JS, because it's the native format. JSON in Erlang is a monster like {[{<<"abc">>, 1}]} for {abc:1}, which, when it has many levels and many nodes at each level, performs badly when selecting a random node.
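A rough Python model of the two shapes, only to illustrate the access pattern. This is an assumption-laden sketch, not Erlang and not a benchmark: `to_ejson`, `get_path_native`, and `get_path_ejson` are hypothetical helpers, and a 1-tuple wrapping a list of pairs stands in for the {[{Key, Value}]} form.

```python
def to_ejson(value):
    """Convert dicts into a proplist-like shape: ([(key, value), ...],)."""
    if isinstance(value, dict):
        return ([(k, to_ejson(v)) for k, v in value.items()],)
    if isinstance(value, list):
        return [to_ejson(v) for v in value]
    return value

def get_path_native(doc, path):
    """Native-object access: one hashed lookup per level."""
    for key in path:
        doc = doc[key]
    return doc

def get_path_ejson(doc, path):
    """Proplist access: a linear scan over the pairs at every level."""
    for key in path:
        (pairs,) = doc  # unwrap the 1-tuple around the pair list
        for k, v in pairs:
            if k == key:
                doc = v
                break
        else:
            raise KeyError(key)
    return doc

doc = {"a": {"b": {"c": 42}}, "x": 1}
ejson = to_ejson(doc)

print(get_path_native(doc, ["a", "b", "c"]))
print(get_path_ejson(ejson, ["a", "b", "c"]))
```

In this model the proplist lookup walks the pair list at each level, so its cost grows with the number of keys per level, while the native lookup does not.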
> Curious what kind of database sizes you're working with too
We sometimes measure the number of docs with an M suffix ) Not very often, however. In my humble opinion, if you plan to have CouchDB with, say, 100M docs in a single bucket, you probably chose the wrong solution.
BTW, the same goes for large kitchen DBs used for buckets with, say, 10K docs.