by watty 2862 days ago
Looks great for basic websites but it's missing the biggest and most difficult piece of cloud infrastructure. The DATABASE!

Today you'd have to open up your cloud DB provider to the world since Zeit can't provide a list of IPs to whitelist. This is a showstopper for me unfortunately.

6 comments

From what I've seen, when people talk about serverless there are two camps.

The "functions make life easier" camp. Serverless development and deployments have nicer properties, which make them easier to reason about and eliminate entire classes of errors.

The "functions make edge computing possible" camp. Serverless functions can be deployed in datacenters around the globe close to your users, offloading compute from your core and improving latency.

Now let's talk about DB semantics. You and many other people in the first camp probably want your business logic on strongly-consistent SQL transactions. That's good, it's the right semantics for the job. But it's incompatible with the edge model where the functions are decoupled from the central datastore.

So I think that you're asking for something the community isn't mature enough to provide yet. The momentum is towards unification where we need stratification (with respect to coupling).

You can't decouple validation from the datastore if you want to be certain that all the validation happens and none is ever bypassed or outdated.
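As a minimal sketch of that point, assuming a SQL datastore: a rule declared in the schema is enforced on every write, whichever function or client issues it. The `accounts` table, the non-negative balance rule, and the helper names below are hypothetical, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema carries the validation rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    return conn


def try_insert(conn: sqlite3.Connection, balance: int) -> bool:
    """Attempt a write; the datastore itself rejects rows that break the rule."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO accounts (balance) VALUES (?)", (balance,))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, 10))
    print(try_insert(db, -5))
```

Because the check lives next to the data, an outdated or buggy caller cannot skip it, which is the property that application-level validation at the edge cannot guarantee on its own.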

It bugs me how many systems have opt-in correctness.

If you start with the intention of asynchronous, non-transactional eventual consistency as your assumptive design model, you'll realise:

1. Most applications don't need 100% correctness. Few people are writing nuclear reactor control systems, or bank account management software

2. This model frees you up to compute more at the edge.

Is there a chance that you consider something valid that eventually wasn't? Yes. But in a single-actor-per-interaction system (e.g. a single actor mutating something, multiple people seeing effects, etc.), typical of the web, this is OK.

People know about the CAP theorem but hang on to the C for dear life. Let it go. It makes everything easier. Honest.

You make some good points, but I disagree that giving up consistency makes everything easier. There are trade-offs. In an eventually consistent system, in my experience, application code becomes much more complex to implement, test and debug.
Firestore has strongly consistent transactions.
Most databases are quite unfit for the serverless world that's becoming a reality, where the needs shift towards global replication and flexible horizontal (sharding) and vertical (provisioned QPS) scalability.

We like and use CosmosDB because it fits these criteria. We anticipate that Google Spanner, CockroachDB and similar databases will become the go-tos in combination with ZEIT Now.

So what do most of your current customers do for data storage? I mean, I doubt they all use CosmosDB? (simply because it's not particularly mainstream)
We don't have insight into what our customers mostly use.

We use existing technologies. Anyone can use any cloud database service. We have datacenters in San Francisco and in Belgium, so users can choose where to deploy their DBs based on those locations.

We usually recommend configuring databases via env variables. Users can also use our [now secrets](https://zeit.co/docs/getting-started/secrets) service to avoid hard-coding secrets.
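A minimal sketch of reading database settings from env variables, so that no credential lives in the source tree. The variable names (`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`), the `DbConfig` type, and the default port are assumptions for illustration, not names defined by ZEIT Now.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    user: str
    password: str


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Build the config from the environment and fail fast if a value is missing."""
    required = ("DB_HOST", "DB_USER", "DB_PASSWORD")  # hypothetical names
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return DbConfig(
        host=env["DB_HOST"],
        port=int(env.get("DB_PORT", "5432")),  # assumed default (PostgreSQL's port)
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
    )
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.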

You are going to find most people are still using Master-Slave databases at some central datacenter.

Spanned databases are great, but most of the time performance is not there (It's getting there, but it needs to be there for a year before people start to care)

Frankly I blame the SQL DBs for the rise of NoSQL. They didn't move fast enough for this kind of environment, and stuff like Cassandra fit that need pretty well.
That's a divisive statement. I'd blame the rise of NoSQL on people who were unwilling to invest the time in properly modelling their data.

Transactional consistency and data normalisation - pffft.

SQL is still doing very well running things behind the scenes.

The problem is not data modelling. The problem is ensuring eventual consistency and synchronization of data in multiple datacenters around the world.
Talking about running things behind the scenes, mainframes with pre-SQL NoSQL DBs (ADABAS, IBM IMS) and even with no DBMS in a modern sense at all (running on TPF, working directly with Direct Access Storage Device records) are still doing very well.
And fauna
At Cloudflare, we're working on expanding Workers (https://www.cloudflare.com/products/cloudflare-workers/) to allow access to your existing DB servers & offer protection with Argo Tunnel (https://www.cloudflare.com/products/argo-tunnel/). We are also enabling Workers to write into Cloudflare’s globally distributed cache, reducing retrieval time for repeated query results. We hope this will be a differentiator with using Cloudflare & highly valuable for your use cases.
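To illustrate the general idea of caching repeated query results (not Cloudflare's actual cache API, which is not shown in this thread), here is a small in-process sketch with a time-to-live. The `QueryCache` class and its `get_or_run` method are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Cache query results by query text for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_run(self, query: str, run_query: Callable[[str], Any]) -> Any:
        """Return a fresh cached result, or run the query and store its result."""
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = run_query(query)
        self._entries[query] = (now, result)
        return result
```

A repeated query inside the TTL window never reaches the database, at the cost of serving results that may be up to `ttl_seconds` stale.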
Can your workers run Docker containers?
Workers don't run Docker containers intentionally. The goal with Workers is to run with a lower memory overhead (~3 MB) and lower startup time (~5 ms) than you can get with full container isolation. This allows your Worker to run affordably in 150+ locations around the world. In many ways it's the ultimate destination for serverless, running code in a multitenant process where all you manage is your code.
Fair enough. But that's not really the same as what Zeit is providing here.
We currently don't support this, but it may be something we consider in the future.
Amazon is building an HTTP interface to Serverless Aurora to solve this problem. You can secure it via IAM rather than network segmentation, much like DynamoDB.
Sounds like what you want are Datomic Ions
If you connect to your database over TLS (maybe with an extra client certificate or something), I don't see much of a problem.
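A minimal sketch of what "TLS with an extra client certificate" looks like on the client side, using Python's standard `ssl` module. The function name and the file path parameters are placeholders; whether a given database or driver accepts such a context is an assumption that has to be checked per product.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None,
                         cert_file: Optional[str] = None,
                         key_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a TLS client context that verifies the server and can present a client cert."""
    # Verifies the server certificate and hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        # Presenting a client certificate is what makes the handshake mutual,
        # provided the server is configured to request and verify it.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The client half alone proves nothing to the server: mutual authentication only holds if the server rejects connections that lack a valid client certificate.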
As far as the protocol is concerned, if you're using TLS, a client certificate, and a strong password, sure, making your database servers world-accessible should be fine.

The problem is that it's possible, and very likely, there are exploits in the wild for your database server -- that are known but you failed to update for a day, or are 0 day exploits -- which are exploitable without having an authenticated account. Those issues can't be exploited if you firewall your database server to known IPs, but once you make it world accessible, all bets are off.
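The allowlist check that a firewall performs can be sketched as follows. The networks listed are documentation-only address ranges used as placeholders, and `is_allowed` is a hypothetical helper; in practice this rule would live in the firewall or security group, not in application code.

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder ranges (reserved for documentation), standing in for known app servers.
ALLOWED_NETWORKS: Sequence[Network] = (
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
)


def is_allowed(client_ip: str, networks: Sequence[Network] = ALLOWED_NETWORKS) -> bool:
    """Return True only if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

A connection from any other address is dropped before the database process parses a single byte, which is why an unauthenticated exploit cannot reach it. Without a stable list of source IPs from the hosting platform, there is nothing to put in `ALLOWED_NETWORKS`.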

You may be interested to know that Heroku exposes all PostgreSQL databases publicly, and unless you are an enterprise customer there is no way to turn that ‘feature’ off:

https://devcenter.heroku.com/articles/connecting-to-heroku-p...

Not to worry, I always set a password on the 'postgres' user. Something like: 'postgres'.
How would that exploit get exploited if you have mutual TLS authentication protecting your database connection? Or are you saying that particular piece of the database infrastructure is likely to have an exploit?
Just using SSL connections doesn't cover it, for two reasons:

1. You can't force MySQL to only accept connections over SSL. You can only enable SSL, and set specific accounts to only allow SSL logins. This means that any sort of "unauthenticated attack" on MySQL will work -- if you can exploit MySQL without a valid login, enabling SSL for users won't help you.

2. Amazon RDS supports using SSL connections, and will issue your MySQL server an SSL cert from their certificate authority, so your client can validate the server. It does not, however, support client SSL certificates, for the server to validate the client. Which means the only thing the SSL connection is doing for you is encrypting the connection -- it's not in any way validating the client, and anyone can download the RDS region's CA certificate and then connect to and exploit your MySQL server normally.

Yeah, if the database doesn’t support mutual TLS, it is clear that wouldn’t be sufficient protection. Having a proxy in between client and server that handles this (e.g. Envoy) would be a good option.
Make a VPN.
The problem with just using something like Lambda on a VPC (so you can use a traditional firewall to protect your MySQL server) is that the cold start times are 10-15 seconds, to get your local network interface and connect to the server.

You're suggesting that from the time a user makes a request to the function, the function should load up, and then create a client VPN connection to the database server, and then create a Database connection? Pretty sure that'd end up pushing towards the same cold start time. Also, no managed database provider is going to offer a VPN connection to it, so you're now definitely maintaining your own database server, when part of the point of getting into serverless is to not worry about servers.

You don't see a problem with talking to your DB over the public network, with long round-trips?
That's the service that mlab.com offers (which we use and are happy with). It does mean you have to be a little careful about your queries though, and cache if necessary.
No, you do not make your DB publicly available just because you are using TLS. In general you do not expose anything to the world unless it is necessary and your service is battle tested. There are probably shitloads of possibly RCE-able vulnerabilities in the public-facing code of most DBs, because, the heck, they aren't built to serve the public web, but to do database magic.