by aChrisSmith 2921 days ago
Hi! I work at Pulumi and have been using it to stand up and manage all of our service infrastructure.

> How does Pulumi keep track of which services are launched, especially during testing/development

Each Pulumi program is run within the context of "a stack". The stack is essentially a collection of cloud resources. So when the Pulumi program runs, it will create resources that aren't in the stack, or update existing ones.
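A rough way to picture that "create what's missing, update what exists" behavior is the toy TypeScript model below. It is only a sketch: `planUpdate`, `ResourceTable` and the sample resources are made-up names for illustration, and this is not how Pulumi's engine is actually implemented.

```typescript
// Toy model only: NOT Pulumi's engine, just the idea of reconciling a stack.
type ResourceProps = Record<string, string>;
type ResourceTable = Record<string, ResourceProps>;

interface PlanStep {
  op: "create" | "update" | "same";
  resource: string;
}

// `inStack` is what the stack already holds; `declared` is what the program asks for.
function planUpdate(inStack: ResourceTable, declared: ResourceTable): PlanStep[] {
  const steps: PlanStep[] = [];
  for (const resource of Object.keys(declared)) {
    const existing = inStack[resource];
    if (existing === undefined) {
      // Not in the stack yet, so it has to be created.
      steps.push({ op: "create", resource });
    } else if (JSON.stringify(existing) !== JSON.stringify(declared[resource])) {
      // Comparing via JSON is crude (key order matters) but enough for a sketch.
      steps.push({ op: "update", resource });
    } else {
      steps.push({ op: "same", resource });
    }
  }
  return steps;
}

// Hypothetical stack contents and program declarations.
const devStack: ResourceTable = { images: { acl: "private" } };
const program: ResourceTable = {
  images: { acl: "public-read" },
  thumbnails: { acl: "private" },
};
const updatePlan = planUpdate(devStack, program);
```

Here `images` already exists with different properties, so it is planned as an update, while `thumbnails` is missing from the stack and is planned as a create.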

So if you create any resources during dev/testing, you just need to `pulumi destroy` those stacks and all of the cloud resources will be reclaimed.

This, IMHO, is one of Pulumi's best features, in that it makes it super easy to create your own instance of a cloud application. For example, I have my own development instance of app.pulumi.com, made by just creating my own Pulumi stack and rerunning the same application.

> How does it determine the optimal size of instances/volumes/etc to launch?

It doesn't. The Pulumi program you run determines what resources to create. So you are left to configure, tune, or tweak that as makes sense.


From the examples it looks like Pulumi programs declare their infrastructure, causing it to be created. Doesn't that mean that the program will need privileged credentials? How do you make sure the app only has, say, read access to an S3 bucket it needs to listen to, and can't accidentally delete it? And how does that then allow it to declare the bucket?

> Doesn't that mean that the program will need privileged credentials?

Obviously whatever program is actually creating the cloud resources will need credentials to do so. However, they aren't part of the Pulumi program.

When you run `pulumi update` on your machine (or on a CI/CD server), Pulumi will pick up whatever ambient credentials are on the machine (e.g. ~/.aws/credentials). So if you want to restrict the credentials used to update a particular Pulumi stack, you just need to swap out whatever the current credentials are (e.g. an AWS_ACCESS_KEY_ID env var).

> How do you make sure the app only has, say, read access to an S3 bucket it needs to listen to, and can't accidentally delete it? And how does that then allow it to declare the bucket?

There are a lot of good questions there, so let me show you a quick example:

```typescript
import * as aws from "@pulumi/aws";

// Declare a private S3 bucket with an explicit bucket name.
const imagesBucket = new aws.s3.Bucket("images", {
    bucket: "example.com-images",
    acl: "private",
});
```

This snippet will create a new AWS S3 bucket named "example.com-images". It also sets the default ACL for the bucket to "private". Nothing too surprising there.

If you wanted another resource to have read access to that bucket, you would need to configure AWS to grant access. The Pulumi programming model is about how you declare/describe/create resources, but not actually define policy for how they work. So when using AWS, you would potentially need to create an `aws.iam.Role` / `aws.iam.RolePolicyAttachment` object and hook them up. (Or, if using Azure or GCP, configure access using some other method.)
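To make the "grant read access" part a bit more concrete, here is a sketch of the kind of IAM policy document a read-only grant on that bucket might use. It is plain TypeScript that only builds the JSON; `readOnlyBucketPolicy` is a hypothetical helper name, and wiring the result into an IAM role or policy resource is deliberately left out.

```typescript
// Hypothetical helper: builds the JSON policy document you might attach to a role.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

function readOnlyBucketPolicy(bucketName: string): PolicyDocument {
  const bucketArn = `arn:aws:s3:::${bucketName}`;
  return {
    Version: "2012-10-17",
    Statement: [
      // Listing applies to the bucket itself.
      { Effect: "Allow", Action: ["s3:ListBucket"], Resource: [bucketArn] },
      // Reading applies to the objects inside it.
      { Effect: "Allow", Action: ["s3:GetObject"], Resource: [`${bucketArn}/*`] },
    ],
  };
}

// Serialized form, ready to hand to whatever creates the IAM policy.
const imagesReadPolicy = JSON.stringify(readOnlyBucketPolicy("example.com-images"));
```

Because the document only allows `s3:ListBucket` and `s3:GetObject`, whatever assumes the role can read the bucket but has no permission to delete it or its objects.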

So in short, to configure what _cloud resources_ can read/write other _cloud resources_, it's a matter of how the cloud resource provider exposes that.

When it comes to preventing you from accidentally deleting resources when you run `pulumi update` on a program, there are a few features that can help. You can mark a resource as protected (the `protect` resource option), so that any update that would delete that resource produces an error. (Until you update the program again, marking that resource as not protected.) Also, the `aws.s3.Bucket` type has a `forceDestroy` parameter that does something similar: unless it is set to true, a bucket that still contains objects cannot be deleted. (Thereby preventing some accidental data loss.)
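The protection check can be pictured roughly like this. It is a toy TypeScript model, not Pulumi's implementation; `TrackedResource`, `isProtected` and `checkDeletes` are names invented for the sketch.

```typescript
// Toy model of the "protected resource" check; not Pulumi's actual implementation.
interface TrackedResource {
  id: string;
  isProtected: boolean;
}

// Returns the ids that may be deleted, or throws if a protected one is targeted.
function checkDeletes(tracked: TrackedResource[], toDelete: string[]): string[] {
  for (const res of tracked) {
    if (res.isProtected && toDelete.indexOf(res.id) !== -1) {
      throw new Error(`refusing to delete protected resource "${res.id}"`);
    }
  }
  return toDelete;
}

// Hypothetical resources: one protected, one not.
const trackedResources: TrackedResource[] = [
  { id: "images", isProtected: true },
  { id: "scratch", isProtected: false },
];

// Deleting an unprotected resource is allowed.
const allowedDeletes = checkDeletes(trackedResources, ["scratch"]);

// Deleting a protected resource fails the whole update instead.
let protectError = "";
try {
  checkDeletes(trackedResources, ["images"]);
} catch (e) {
  protectError = e instanceof Error ? e.message : String(e);
}
```

The point of failing the whole update, rather than silently skipping the delete, is that you have to make an explicit change to the program before the resource can go away.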

Does that make sense?

Makes sense. That makes it sound like Pulumi only runs the infrastructure declarations when you run "pulumi update", and that those things don't run when your program runs. That's confusing to me, because your examples (like the thumbnailer) seem to have the program and the declarations in the same file.

Is Pulumi stateful, then? If you create resources with "pulumi update", change the declarations without updating, and run "pulumi destroy" or whatever, it will only delete the stuff you created in the first step? (That is what I would expect. I would also expect it to support a dry run mode with a diff showing what operations would be executed.) If so, where is this state stored?

(I'm a product manager at Pulumi.)

> That makes it sound like Pulumi only runs the infrastructure declarations when you run "pulumi update", and that those things don't run when your program runs. That's confusing to me, because your examples (like the thumbnailer) seem to have the program and the declarations in the same file.

This is an optional way to do it, by combining the runtime code and infra code. The runtime code doesn't run when you deploy with "pulumi update," but it is packaged and sent to AWS.

You can also put the runtime code in a different file, as in this example: https://github.com/lindydonna/velocity-examples/tree/master/...

> Is Pulumi stateful, then? If you create resources with "pulumi update", change the declarations without updating, and run "pulumi destroy" or whatever, it will only delete the stuff you created in the first step? (That is what I would expect. I would also expect it to support a dry run mode with a diff showing what operations would be executed.) If so, where is this state stored?

Yes, the state is stored on pulumi.com. The state is a list of resource IDs that you provisioned. The Pulumi CLI does indeed have a dry run mode that shows a diff: whenever you run "pulumi update", it first shows a preview.
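As a simplified picture of why recorded state gives you the behavior described in the question, consider the toy TypeScript model below. The names `StackState`, `previewUpdate` and `planDestroy` are made up for this sketch, and the real state holds more than bare IDs.

```typescript
// Toy model: the "state" is just the ids recorded by the last successful update.
interface StackState {
  resourceIds: string[];
}

// Preview: compare recorded ids against what the program now declares.
function previewUpdate(state: StackState, declaredIds: string[]) {
  return {
    toCreate: declaredIds.filter((id) => state.resourceIds.indexOf(id) === -1),
    toDelete: state.resourceIds.filter((id) => declaredIds.indexOf(id) === -1),
  };
}

// Destroy: driven purely by recorded state, not by the current declarations.
function planDestroy(state: StackState): string[] {
  return state.resourceIds.slice();
}

// Hypothetical scenario: "bucket" and "lambda" were created by an earlier update,
// then the program was edited to declare "queue" instead of "lambda" without updating.
const recordedState: StackState = { resourceIds: ["bucket", "lambda"] };
const editedDeclarations = ["bucket", "queue"];

const previewResult = previewUpdate(recordedState, editedDeclarations);
const destroyPlan = planDestroy(recordedState);
```

In this scenario the preview reports `queue` as a create and `lambda` as a delete, while a destroy only touches `bucket` and `lambda`, since `queue` was never provisioned.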

Is there an option for self-hosting your state files vs relying on pulumi.com?

You could self-host if you want. The default is relying on pulumi.com.

Does it understand how to mutate resources in place without downtime? Or is that code/logic something I need to write and track like I do today?

(Disclosure: I work at Pulumi)

Yes, Pulumi does mutate resources in place, if the cloud provider supports it. For most resources, it will create a new one (such as a new ECS task), and wait for it to be ready before deleting the old one.
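The create-then-delete ordering is what avoids downtime. Here is a toy TypeScript sketch of that ordering; `ReplaceStep`, `replaceSteps` and `applySteps` are illustrative names rather than anything from Pulumi itself.

```typescript
// Toy model of replacing a resource without downtime; names are hypothetical.
type ReplaceStep =
  | { kind: "create"; id: string }
  | { kind: "waitReady"; id: string }
  | { kind: "delete"; id: string };

// The new resource is created and confirmed ready before the old one is removed.
function replaceSteps(oldId: string, newId: string): ReplaceStep[] {
  return [
    { kind: "create", id: newId },
    { kind: "waitReady", id: newId },
    { kind: "delete", id: oldId },
  ];
}

// Applies the steps to a list of live resource ids, recording the list after each step.
function applySteps(live: string[], steps: ReplaceStep[]): string[][] {
  const history: string[][] = [];
  let current = live.slice();
  for (const step of steps) {
    if (step.kind === "create") current = current.concat(step.id);
    if (step.kind === "delete") current = current.filter((id) => id !== step.id);
    history.push(current);
  }
  return history;
}

const liveHistory = applySteps(["task-v1"], replaceSteps("task-v1", "task-v2"));
// At every point in the history at least one task is live.
const neverEmpty = liveHistory.every((ids) => ids.length >= 1);
```

Reversing the order (delete first, then create) would leave a window with nothing serving traffic, which is exactly what this ordering is meant to avoid.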

Several years ago, my employer created a similar tool with this exact same "feature". What we've found is that while standing up entire stacks in non-prod is kinda cool at first, it's a real drag at scale. We've had to walk back that feature with some hackish workarounds. We've also found that all the API calls necessary to determine what needs to be created can result in us being throttled by Amazon (the dreaded "Rate Limit Exceeded" error).

Still, this looks very cool, in that it's a real programming language and not YAML/JSON (which is another of our problems).

Could you provide some more detail on what made it a drag? Was it just the Amazon API issues? Cost? Security? Governance? Your experiences here seem like they could be valuable to other folks in the same situation.

Amazon API issues and the amount of time it takes to "discover" complex application stacks in production.

Concrete example: Our framework pulls in remote service dependencies via a link to an ELB in order to set remote HTTP endpoint URLs (yes, we know service discovery is a thing, but that's not where we were when we started). Some projects have 15+ dependencies, and it would take literally hours for it to walk the dependency tree. As a workaround, someone built the capability of passing in those dynamic URL endpoints and then the deployments were revised to build the remote URLs via string interpolation. Deployment time dropped to 10 minutes once we walked away from the concept of deploying stacks from the top down.
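For illustration only, the string-interpolation workaround might look something like the TypeScript sketch below. The naming scheme, `serviceUrl`, and the service names are all hypothetical, since the actual convention isn't described here.

```typescript
// Hypothetical naming scheme; the real convention used by the framework is not known.
function serviceUrl(service: string, environment: string, domain: string): string {
  return `https://${service}.${environment}.${domain}`;
}

// Build every dependency URL directly instead of walking the dependency tree.
const dependencyNames = ["billing", "search", "auth"];
const dependencyUrls = dependencyNames.map((svc) =>
  serviceUrl(svc, "prod", "example.internal")
);
```

The trade-off is that the URLs are only correct as long as every service follows the convention, but nothing has to be discovered through API calls at deploy time.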

2nd concrete example: A developer used an incorrect argument during a deployment and deployed a second full stack of his application rather than replacing a single service. (I understand most other tools have diffs/change sets, but this particular developer isn't the sharpest knife in the drawer...) Rather than fix it immediately, he manually fiddled DNS entries and launch configs to create a mishmash stack. Naturally, he didn't tell anyone, so it took weeks (and lots of EC2 $$$) before we found and fixed it all.

I do see some value in an automated full stack deploy with all dependencies, but it should be the exception and not the rule.