Hacker News new | ask | show | jobs
by evanweaver 1897 days ago
As others have pointed out, measuring latency from an AWS Lambda function to a co-located single node in-memory non-durable key-value database (Redis), or to a co-located single AZ eventually consistent key-value database (DynamoDB), doesn't have anything to do with measuring client-observed latency from the browser to a globally distributed ACID-compliant document database.

A similar process co-located with a Fauna region normally can also perform simple reads in small numbers of milliseconds.

Similarly, a browser client querying a lambda function multiple times from the other side of the world will also be quite slow, even if the lambda "thinks" its queries are fast because its database is right next to it.

It is not completely clear to me what else is going wrong, but the basic premise of the benchmark is invalid, and there are other errors in regard to Fauna. For example, index serializability only affects write throughput; it has nothing to do with read latency. The status page reports write latency, not read latency, etc.

For a more even-handed comparison of Fauna to DynamoDB, see this blog post: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...

7 comments

I think we figured out at least one issue here.

Fauna is a temporal database, not a time series database. The code in the test that updates the score on every post after every read is creating new historical events every time it does that. These have to be read and skipped over during the read query which will continually increase latency proportional to the number of updates that have occurred.

By default, Fauna retains this data and makes it queryable (with transactional guarantees) for 30 days, unlike DynamoDB or Redis. Reducing the retention period would help a bit, but event garbage collection is not immediate so there will still be differences for heavily churned documents. Normally, having a few updates to a document or an index has no noticeable impact but in this case it appears to be swamping the other factors in the latency profile.

It is possible to manually remove the previous events in the update query; doing that should reduce the latency. Nevertheless, Fauna is not a time series database so this is a bit of an anti-pattern.

I have commented this section and redeployed the code.

See the Fauna endpoint: https://71q1jyiise.execute-api.us-west-1.amazonaws.com/dev/f...

See the code: https://github.com/upstash/latency-comparison/blob/master/ne...

Did you delete all the extra events that have been created already?
If you mean histogram, yes I reset the histogram for Fauna.

If you mean deleting Faunadb internal events, I do not know how to do. Can you guide me?

AWS DynamoDB is multi-AZ and is strongly consistent (at least for a single key update).

I believe you confused it with a Dynamo DB described in a paper Amazon published long ago. AWS DynamoDB has nothing to do with a eventually consistent design the paper describes. It was purely marketing gimmick to call the new AWS Service that.

Good article, thanks.

I'm currently reading The DynamoDB Book, and even it's author acknowledges that DynamoDB is serverless more as a byproduct and shouldn't be used for that reason alone

But as a product developer all these highly integrated AWS services that can be provisioned with CloudFormation are pretty convincing.

What's Fauna's IaC story?

I will be happy if someone from Fauna team helps me to improve my code. https://github.com/upstash/latency-comparison

Upstash is not non-durable. It is based on multitier storage (memory + EBS) implementing Redis API. In a few weeks, I will add upstash premium which replicates data to multiple zones, to the benchmark app.

In the blog post, I mentioned the qualities where Fauna is stronger than the others: https://blog.upstash.com/latency-comparison#why-is-faunadb-s...

Your own docs say that by default “writes are not guaranteed to be durable even if the client receives a success response”.
Upstash has two consistency modes. Eventual consistency and Strong consistency. Please see: https://docs.upstash.com/overall/consistency

In my code, Upstash database was eventually consistent. Similarly the index in the FaunaDB was not serialized.

But both of those should not affect the latency numbers in the histogram because those numbers are all read latency.

That's an apples to oranges comparison though. Upstash couples durability with consistency/isolation. Regardless of configuration, FaunaDB and DynamoDB both always ensure durability of acknowledged writes with a fault tolerance of greater than 1 node failure. To compare them on equal footing, Upstash would need to be configured for strong consistency, at least according to the docs.
DynamoDB also guarantees your write is distributed to multiple data centers as well.
No it doesn't. The write is not durable in other datacenters when the client acknowledgement is returned. It is still possible to lose data.
While your points are valid, that you're CTO of Fauna is relevant information.
not really. these are all facts, not opinions.
> For a more even-handed comparison of Fauna to DynamoDB, see this blog post: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...

An even handed comparison from the authors of FaunaDB?

I found that comparison to be technically objective. Did you find some bias on it?
“Writes always go through a leader replica first; reads can come from any replica in eventually-consistent mode, or the leader replica in strongly consistent mode.”

This part isn’t correct. The two follower replicas can serve a consistent read even if the leader is behind / down. And there is no guarantee the primary even has the data persisted to disk when the subsequent read call is made.

If people really want to know how DynamoDB works, this is a good tech talk: https://www.youtube.com/watch?v=yvBR71D0nAQ