Hacker News
by nu11ptr 1087 days ago
I do network automation as a profession. I build tools (technically compilers) that take a proprietary object model designed for our private cloud and translate it into Ansible (v1) or Terraform (v2) code. At our company, I actually refer to using these tools in isolation as doing it "manually". This is because the largest benefit of automation, I believe, is the abstraction gained from the new object model and being able to generate and store the inputs for Ansible/Terraform in a database. If you have to track and specify all the inputs to Ansible/Terraform and write the playbooks/HCL manually, in my experience you don't actually save all that much work. However, when you have an object model specifically designed for your use case, you can deliver a new client network in literally minutes (it is essentially the same cloud model that AWS, Azure, etc. use for their networking). The downside is that most enterprises don't have people like me to write the code to do this, and writing it for a single deployment would likely not see the gains that we see as a managed service provider.
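To make the "object model vs. raw tool inputs" point concrete, here is a minimal Python sketch. Everything in it is hypothetical and assumed for illustration (the `ClientNetwork` class, the 10.0.0.0/8 supernet, and the one-/16-per-client and VLAN numbering conventions); it only shows how two high-level inputs can expand into the per-segment VLANs, CIDRs and gateways that a Terraform module or Ansible playbook would otherwise need typed in by hand.

```python
import ipaddress
import json
from dataclasses import dataclass

# Hypothetical object model: the requester supplies only an ID and a name.
@dataclass
class ClientNetwork:
    client_id: int          # assumed 0-255, assigned by the platform
    name: str
    segment_count: int = 3  # e.g. web/app/db (assumed, at most 10)

# Assumed site conventions, for illustration only.
SUPERNET = ipaddress.ip_network("10.0.0.0/8")
BASE_VLAN = 1000

def to_tool_inputs(net: ClientNetwork) -> dict:
    """Expand one high-level object into the low-level variables that would
    otherwise be looked up and written into tfvars/playbook vars by hand."""
    # Convention: client N gets the Nth /16 inside the supernet.
    client_block = list(SUPERNET.subnets(new_prefix=16))[net.client_id]
    segments = []
    for i, subnet in enumerate(client_block.subnets(new_prefix=24)):
        if i == net.segment_count:
            break
        segments.append({
            "name": f"{net.name}-seg{i}",
            "vlan_id": BASE_VLAN + net.client_id * 10 + i,
            "cidr": str(subnet),
            "gateway": str(next(subnet.hosts())),  # first usable address
        })
    return {"client": net.name, "segments": segments}

print(json.dumps(to_tool_inputs(ClientNetwork(client_id=5, name="acme")), indent=2))
```

The resulting JSON could be written out as a `terraform.tfvars.json` file (which Terraform loads automatically), provided the module declares matching variables.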

Isn't that a lot of words to say that you have a custom set of Terraform modules for your needs? If you're describing a different or better way to do it, I'm missing it.
No. It is a frontend application that works as a CRUD REST API, validates the data, generates what it can, and stores it in a database/IPAM. It can then be viewed, modified, deleted, etc.
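A stripped-down sketch of the "validate, generate what it can, store" part, assuming a plain dict in place of the database/IPAM and leaving out the REST layer entirely. `Segment`, `SegmentStore` and the 100-199 VLAN pool are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Optional

VLAN_POOL = range(100, 200)  # assumed allocatable VLAN range

class ValidationError(ValueError):
    pass

@dataclass
class Segment:
    name: str
    vlan_id: Optional[int] = None  # None means "generate one for me"

class SegmentStore:
    """In-memory stand-in for the database behind a CRUD API."""

    def __init__(self) -> None:
        self._rows: dict[str, Segment] = {}

    def create(self, seg: Segment) -> Segment:
        if seg.name in self._rows:
            raise ValidationError(f"segment {seg.name!r} already exists")
        used = {s.vlan_id for s in self._rows.values()}
        if seg.vlan_id is None:
            # Generate: take the lowest free VLAN in the pool.
            seg.vlan_id = next((v for v in VLAN_POOL if v not in used), None)
            if seg.vlan_id is None:
                raise ValidationError("VLAN pool exhausted")
        elif seg.vlan_id not in VLAN_POOL or seg.vlan_id in used:
            # Validate: a user-supplied value must be in range and unused.
            raise ValidationError(f"VLAN {seg.vlan_id} is out of range or in use")
        self._rows[seg.name] = seg
        return seg

    def read(self, name: str) -> Segment:
        return self._rows[name]

    def delete(self, name: str) -> None:
        del self._rows[name]

store = SegmentStore()
print(store.create(Segment("web")))  # VLAN is generated from the pool
```

The point of the design is that the caller may omit anything the service can derive, and anything the caller does supply is checked before it is stored.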

When you are ready to deploy, I "compile" the object model data into an IR (representing the "network topology") and then make a final pass to translate it into HCL for all the various backends.
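The final pass could look roughly like the sketch below: a tiny IR of VLAN nodes rendered into HCL text. The `VlanNode` IR type and the `example_vlan` resource type are hypothetical (real provider resource types and arguments differ per vendor), and the string quoting covers only backslashes, double quotes and template sequences.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VlanNode:
    """One node of a hypothetical network-topology IR."""
    name: str
    vlan_id: int
    switch: str

def hcl_value(value) -> str:
    """Render a Python scalar as an HCL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    s = str(value).replace("\\", "\\\\").replace('"', '\\"')
    # Escape template sequences so Terraform does not interpolate them.
    s = s.replace("${", "$${").replace("%{", "%%{")
    return f'"{s}"'

def emit_hcl(ir: list[VlanNode]) -> str:
    blocks = []
    # Sort so the same IR always yields byte-identical output (clean diffs).
    for node in sorted(ir, key=lambda n: n.name):
        attrs = {"name": node.name, "vlan_id": node.vlan_id, "switch": node.switch}
        body = "\n".join(f"  {k} = {hcl_value(v)}" for k, v in attrs.items())
        blocks.append(f'resource "example_vlan" "{node.name}" {{\n{body}\n}}')
    return "\n\n".join(blocks) + "\n"

print(emit_hcl([VlanNode("web", 110, "leaf1"), VlanNode("db", 120, "leaf2")]))
```

`terraform fmt` can then normalize the alignment of the generated files.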

I'm not saying it's "better", as it has trade-offs. I'm saying that for networks specifically, it is the only way I've seen in the real world to get a lot of value out of these tools. Otherwise the network engineers end up spending all their time looking up the input data (VLANs, subnets, IPs, etc.), which is also the most time-consuming part of manual configuration. The validation and auto-generation of the input data is where the value comes in.
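Auto-generating that input data is largely IPAM work. A minimal sketch using only Python's standard `ipaddress` module, assuming the already-allocated prefixes come from the database; `allocate_subnet` and `allocate_ips` are hypothetical helpers, and the first-fit linear scan is chosen for clarity rather than speed.

```python
import ipaddress

def allocate_subnet(pool: str, allocated: list[str], prefixlen: int) -> str:
    """First-fit: return the first /prefixlen in `pool` overlapping nothing allocated."""
    taken = [ipaddress.ip_network(cidr) for cidr in allocated]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(t) for t in taken):
            return str(candidate)
    raise RuntimeError(f"no free /{prefixlen} left in {pool}")

def allocate_ips(subnet: str, count: int, reserved: int = 1) -> list[str]:
    """Hand out host addresses, skipping the first `reserved` ones (e.g. the gateway)."""
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    for _ in range(reserved):
        next(hosts)
    return [str(next(hosts)) for _ in range(count)]

# Existing allocations would come from the database/IPAM.
new_subnet = allocate_subnet("10.20.0.0/16", ["10.20.0.0/24", "10.20.1.0/25"], 24)
print(new_subnet, allocate_ips(new_subnet, 2))
```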

Got it, thanks, makes sense. The way I've frequently seen this done, more in line with the IaC and GitOps trends, is people opening a PR against the config repo with the required values. Then a pipeline runs all the validations, pulls data from external sources, and runs terraform plan. If everything looks good on review, a merge applies the saved plan.
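For comparison, the PR-driven flow can be sketched as two pipeline stages. The stage split and the `tfplan` file name are assumptions; the `terraform` subcommands and flags used (`init`, `validate`, `plan -out=...`, `apply <planfile>`, `-input=false`) are real. The sketch only builds the command lists; a CI job would run each one, and the plan file has to be kept as a pipeline artifact between the PR run and the merge run.

```python
import shlex

PLAN_FILE = "tfplan"  # assumed artifact name

def pr_stage() -> list[list[str]]:
    """Runs on every pull request: initialize, validate, and save a plan for review."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "validate"],
        ["terraform", "plan", "-input=false", f"-out={PLAN_FILE}"],
    ]

def merge_stage() -> list[list[str]]:
    """Runs on merge: apply exactly the plan that was reviewed, not a fresh one."""
    return [["terraform", "apply", "-input=false", PLAN_FILE]]

if __name__ == "__main__":
    for cmd in pr_stage() + merge_stage():
        # A real pipeline would call subprocess.run(cmd, check=True) here.
        print(shlex.join(cmd))
```

Applying the saved plan file, rather than re-planning at merge time, is what guarantees that what gets applied is what the reviewers saw.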
Interesting way to do things there. Have you looked into Pulumi or Terraform CDK?

I don’t know if either of those would help you or not and I’m not proficient in either, but some of the components you described seem like they might have some overlap.

Those tools are primarily about using code instead of HCL for modeling. For us, it is about UI and UX (it is a REST API consumed by a Rundeck form and other services), as most of our engineers are not devops trained. Also, TF is only one possible backend. We actually emit other configuration code, and configuration instruction sheets as MD and PDF for things we don't support.
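A sketch of the "TF is only one possible backend" idea, under assumed names: the same hypothetical IR is split into what an automated backend supports and what it doesn't, and the remainder is rendered as a Markdown instruction sheet (PDF conversion is left out). `Node`, `SUPPORTED` and the object kinds are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical IR node."""
    kind: str     # e.g. "vlan", "firewall_rule"
    name: str
    attrs: tuple  # ((key, value), ...) pairs

# Which IR kinds each automated backend can emit (assumed for illustration).
SUPPORTED = {"terraform": {"vlan"}}

def split_by_backend(ir: list, backend: str) -> tuple:
    """Partition the IR into objects the backend can automate and the rest."""
    kinds = SUPPORTED.get(backend, set())
    auto = [n for n in ir if n.kind in kinds]
    manual = [n for n in ir if n.kind not in kinds]
    return auto, manual

def markdown_sheet(manual: list) -> str:
    """Render unsupported objects as numbered steps for an engineer to follow."""
    lines = ["# Manual configuration steps", ""]
    for i, node in enumerate(manual, start=1):
        lines.append(f"{i}. Configure {node.kind} `{node.name}`")
        lines.extend(f"   - {key}: `{value}`" for key, value in node.attrs)
    return "\n".join(lines) + "\n"

ir = [
    Node("vlan", "web", (("vlan_id", 110),)),
    Node("firewall_rule", "allow-https", (("port", 443), ("action", "permit"))),
]
auto, manual = split_by_backend(ir, "terraform")
print(markdown_sheet(manual))
```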
There's a push and pull; ansible and terraform both have some facilities for doing what you describe, but of course if you're using both tools, then you wind up where you are, needing yet another layer of abstraction common to both.

In the book, the author presents an approach for storing the object state and organizing the repository for ansible purposes in what is at least as sensible a way as any other I've seen. For installations that might not directly benefit from additional layers of abstraction, managing object model state using ansible's native functionality might well be sufficient.

This is all a legitimate challenge, in any case. Network infrastructure and service instances have some management issues in common, but where they differ, they can differ by quite a bit, in ways that are hard to model at any level of abstraction.

I'm not using both. The first version of my tool used Ansible; the second version used Terraform. They were written 4 years apart. My users are not devops savvy. They use runbook forms to call into my API, which gives them a very simple UI that requires almost zero input. The object model includes lifecycling, so certain attributes can be changed, etc., and validation is done to ensure that only a correct network is output. This isn't required by everyone, and it wasn't done out of necessity because of how I'm using the tools, but to satisfy the business problem I'm trying to solve (automating network deployment with as few human inputs as possible over the entire lifespan of a client and its infrastructure).
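One way to express "certain attributes can be changed" is a per-type whitelist of mutable attributes checked on every update. A minimal sketch; the `MUTABLE` table, the object kinds and `apply_update` are hypothetical.

```python
# Hypothetical lifecycle rules: per object type, which attributes may change
# after creation. Everything else is immutable once the network is deployed.
MUTABLE = {
    "segment": {"description", "dhcp_enabled"},
    "firewall": {"description", "rules"},
}

class LifecycleError(ValueError):
    pass

def apply_update(kind: str, current: dict, changes: dict) -> dict:
    """Return a new, updated object, or raise if a change is not allowed."""
    unknown = changes.keys() - current.keys()
    if unknown:
        raise LifecycleError(f"unknown attributes: {sorted(unknown)}")
    # Only attributes whose value actually changes need to be mutable.
    changed = {k for k, v in changes.items() if current[k] != v}
    forbidden = changed - MUTABLE.get(kind, set())
    if forbidden:
        raise LifecycleError(f"{kind}: cannot change {sorted(forbidden)} after creation")
    return {**current, **changes}

seg = {"name": "web", "vlan_id": 110, "description": "", "dhcp_enabled": False}
print(apply_update("segment", seg, {"description": "web tier"}))
```

Resubmitting an unchanged immutable value is treated as a no-op, which keeps form-driven clients (which often send every field back) from tripping the rules.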

I wasn't critiquing the author, but networks inherently have a lot of input data. Much of it is not of concern to the end user, which is why public clouds require almost zero input on the network side.

I agree that my object model is purpose built for our product. It would not work for someone else's network.

I’m currently using Ansible for something similar. Mind if I ask why you switched to Terraform?
Faster: it uses a local state file, so it doesn't need to interrogate the devices every time.

Stateful: you don't have to manually track "present" and "absent"; you just omit an object and it notices that it needs to delete it.

More standard: writing HCL is very similar between providers, whereas every Ansible module typically behaves pretty differently.
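The "Stateful" point can be illustrated with a toy reconciliation step: compare the desired configuration with the recorded state, and anything present in state but missing from the desired config is scheduled for deletion. In Ansible you would instead typically keep the item around with `state: absent` (many, but not all, modules take a `state` parameter). This is a simplified model, not Terraform's actual planner, and `plan` is a hypothetical name.

```python
def plan(desired: dict, state: dict) -> dict:
    """Toy Terraform-style diff between desired config and recorded state.

    Anything recorded in state but absent from the desired config is
    scheduled for deletion, so nobody has to write an explicit "absent".
    """
    create = sorted(desired.keys() - state.keys())
    delete = sorted(state.keys() - desired.keys())
    update = sorted(k for k in desired.keys() & state.keys() if desired[k] != state[k])
    return {"create": create, "update": update, "delete": delete}

state = {"vlan110": {"name": "web"}, "vlan120": {"name": "db"}}
desired = {"vlan110": {"name": "web-tier"}, "vlan130": {"name": "app"}}
print(plan(desired, state))
# {'create': ['vlan130'], 'update': ['vlan110'], 'delete': ['vlan120']}
```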

This sounds interesting, but I am not sure I fully understand. Would a loose analogy be that the object model corresponds to something like the AWS CDK, and the Ansible part to the CloudFormation derived from it? (Any other analogy would do, but those are things I understand a bit better; I use quite a bit of Ansible, but I am not a network person.) I still don't fully understand the database part. Is it a better way to manage environment variables, or does it allow for more flexible input?

Thank you

Essentially, we have a very specific network topology we are trying to build for each of our clients. The goal is to auto-generate as much of the input as possible, validate whatever is given, and allow it to be lifecycled (attributes can change, but only in certain valid ways; objects can be created/changed/deleted, but only if they aren't referenced by other objects, etc.). Because of this, a database is needed to store each "object". When the network is "pushed", the database is walked and a fresh set of Ansible (or Terraform for v2) is generated in seconds.
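The two rules mentioned here (no deleting objects that others reference, and walking the database to regenerate everything) map naturally onto a dependency graph. A minimal sketch, assuming each stored object lists the names it references in a hypothetical `refs` field; `graphlib` is in the standard library from Python 3.9, and `static_order()` raises `CycleError` if the references ever form a loop.

```python
from graphlib import TopologicalSorter

def referrers(objects: dict, name: str) -> list[str]:
    """Names of all objects that reference `name`."""
    return sorted(o for o, spec in objects.items() if name in spec["refs"])

def delete(objects: dict, name: str) -> None:
    """Refuse to delete anything that is still referenced."""
    users = referrers(objects, name)
    if users:
        raise ValueError(f"cannot delete {name!r}: referenced by {users}")
    del objects[name]

def generation_order(objects: dict) -> list[str]:
    """Walk the store so every object is emitted after what it references."""
    graph = {name: spec["refs"] for name, spec in objects.items()}
    return list(TopologicalSorter(graph).static_order())

objs = {
    "vrf-acme": {"kind": "vrf", "refs": []},
    "seg-web": {"kind": "segment", "refs": ["vrf-acme"]},
    "fw-web": {"kind": "firewall", "refs": ["seg-web"]},
}
print(generation_order(objs))
```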

In other words, it is a custom set of Lego bricks that can only be combined in certain ways to build valid networks. It is proprietary to our cloud product, which has the benefit of letting us abstract away things that others probably couldn't, but the downside of making it entirely non-reusable for a different use case.

Just curious, is your system publicly available or is it internal tooling of yours? I spent a lot of time in the service orchestration domain, and it has been a hobby of mine ever since.
internal, sorry
I worked on a product that did something similar for telecoms. It had closed-loop automation and a graphical designer for the object model. That was 10 years ago.

Looking today at all the manual work with playbooks etc., it's astonishing. It feels like things haven't moved forward at all in the past decade.

Even in the big public clouds, the user-facing networking really hasn't progressed beyond a layer of lipstick on top of the kludges that were created for connecting physical servers 40 years ago.

For instance in AWS you still have to care about BGP and ASNs if you want to follow the most seamless approach to create a multi-region mesh of VPCs. Why should I have to care about that? AWS already knows where all the packets came from and where they're going and should just put them in the right place. I don't care how they get there and I certainly shouldn't have to care about BGP attributes[1].

1. https://docs.aws.amazon.com/network-manager/latest/cloudwan/...

Probably interoperability with "legacy" equipment and networks.
Are you using an open source tool/stack to do this? Sounds pretty awesome and I’d love to learn!
Mostly Python and MongoDB.
This model is probably more common than you think; I don't see how anyone would do this any other way in a scalable fashion.