Intro
There are many ways to organize and manage Terraform projects. Some setups require additional tools to manage variables, environments/stages, deployments and many other aspects of those projects. By additional I mean anything beyond Terraform itself, git and CI/CD tooling.
Let’s take a look at one possible way to organize and manage a monorepo setup, which will contain multiple projects and Terraform modules, with deployments spanning across multiple targets such as AWS accounts or Azure subscriptions.
The presented setup is geared towards in-house use. A consultancy working with multiple clients may wish to keep Terraform modules in a separate repository so that they can be reused across different client projects.
Assumptions & requirements
In order to help understand the setup, let me walk through a few assumptions and requirements:
- The setup must support management of multiple Terraform projects (1 project = 1 state) and versioned Terraform modules which are used in projects.
- Modules must be versioned with semantic versioning.
- No additional tools will be used.
- Git branches will not be used to manage different deployments.
- Because of the previous, git main branch must always contain the source code for all deployed projects, in all live environments.
- Deployments to development environments can be done from developer’s machine to enable development of the projects and modules.
- Deployments to any other environment must be done through CI/CD. Developers will have only read access to these environments.
- GitHub will be used to host the repository, and GitHub Actions will be used to run CI/CD pipelines. This means that the pipeline definitions are managed right alongside the Terraform code.
- Terraform state is managed in a shared location that is specific to each deployment target / environment: for AWS, an S3 bucket with a DynamoDB table for locking, and for Azure, a Storage Account. Each AWS account / Azure subscription will have its own state store.
- Terraform plans must be reviewed and approved before deployments to live environments other than development can be done.
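To make the state requirement above more concrete, a project's backend configuration for an AWS target might look roughly like this. It is a minimal sketch and not the actual configuration of the setup described here: the file placement, bucket name, table name, region and key layout are all hypothetical placeholders.

```hcl
# live/development/api/providers.tf (hypothetical placement and values)
terraform {
  backend "s3" {
    # One bucket and one lock table per AWS account; both names are made up.
    bucket         = "example-dev-terraform-state"
    dynamodb_table = "example-dev-terraform-locks"

    # A separate key per project keeps the rule "1 project = 1 state".
    key     = "api/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}
```

With a layout like this, every project in the same account shares the bucket and lock table and differs only in its key. For an Azure target, the azurerm backend would fill the same role with a Storage Account.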
Why are git branches not used to manage deployments? If project deployments to live environments are governed by git branches, it becomes extremely difficult to see, compare and understand what is deployed where at any given time.
The main design principle of this approach is to keep things as simple as possible.
Repository structure
Before I go through the development and deployment workflows, let’s see how the repository is organized:
terraform-monorepo> tree -L 4 -a
.
├── .github
│   ├── actions
│   │   ├── deploy
│   │   │   └── action.yml
│   │   └── prepare-plan
│   │       └── action.yml
│   └── workflows
│       ├── deploy-main.yml
│       └── pr-to-main.yml
├── .gitignore
├── LICENSE
├── README.md
├── docs
│   ├── whats-in-here.txt
│   └── workflow.jpg
├── live
│   ├── development
│   │   ├── api
│   │   │   ├── environment.auto.tfvars
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   ├── providers.tf
│   │   │   └── variables.tf
│   │   ├── data_processor
│   │   │   ├── user-data.sh
│   │   │   ├── environment.auto.tfvars
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   └── tf_state
│   │       ├── environment.auto.tfvars
│   │       ├── main.tf
│   │       ├── outputs.tf
│   │       └── variables.tf
│   ├── production
│   │   ├── api
│   │   │   └── [omitted-for-brevity]
│   │   ├── data_processor
│   │   │   └── [omitted-for-brevity]
│   │   └── tf_state
│   │       └── [omitted-for-brevity]
│   └── staging
│       ├── api
│       │   └── [omitted-for-brevity]
│       ├── data_processor
│       │   └── [omitted-for-brevity]
│       └── tf_state
│           └── [omitted-for-brevity]
└── modules
    ├── api_layer
    │   ├── v1.0.0
    │   │   └── [omitted-for-brevity]
    │   └── v1.1.0
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── variables.tf
    ├── processing_cluster
    │   ├── v1.0.0
    │   │   └── [omitted-for-brevity]
    │   ├── v1.0.1
    │   │   └── [omitted-for-brevity]
    │   └── v1.2.0
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── variables.tf
    └── vpc
        └── v1.0.0
            ├── main.tf
            ├── outputs.tf
            └── variables.tf
Whoa, that’s a lot of directories! Let’s break it down bit by bit, and start with root-level directories:
- Directory .github will contain the CI/CD workflow and action definitions.
- Directory docs will contain all assets, which are not part of the actual source code. Things like images for documentation etc. will go in here.
- Directory live will contain each environment/stage as subdirectory.
- Directory modules will contain all custom Terraform modules.
Let’s look into modules next:
- Each module will have its own directory.
- Modules are versioned with subdirectories. In a project, the correct version of a module is imported with source = "../../../modules/vpc/v1.0.0".
- When a new version of a module is needed, it will get a new subdirectory with a name following semantic versioning.
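As an illustration of the versioned-directory approach, a project could reference its modules along the following lines. Only the source paths come from the repository layout above; the input names, their values and the vpc_id output are hypothetical, since the module interfaces are not shown here.

```hcl
# live/development/api/main.tf (illustrative excerpt)
module "vpc" {
  # The version is part of the path, so upgrading means editing this line.
  source = "../../../modules/vpc/v1.0.0"

  # Hypothetical inputs; the real module variables may differ.
  name       = "api-dev"
  cidr_block = "10.0.0.0/16"
}

module "api_layer" {
  source = "../../../modules/api_layer/v1.1.0"

  # Hypothetical: assumes the vpc module exposes a "vpc_id" output.
  vpc_id = module.vpc.vpc_id
}
```

Terraform's version argument on module blocks only applies to modules installed from a registry, not to local paths, which is one reason the version has to live in the directory name in a layout like this.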
And finally the live directory:
- Individual Terraform projects are here as subdirectories, under respective environment/stage directories.
- Each project will be deployed separately, and will have its own Terraform state.
- Terraform plan and apply will be run against each project-specific subdirectory, e.g. development/api or production/data_processor in the above example.
- Each stage (development) of each solution (api), e.g. development/api, is handled as an individual project. This choice will potentially lead to code duplication between development, staging and production environments, but this can be somewhat tackled with smart use of modules. Some duplication will obviously remain, but this is the price we must pay in order to have clear visibility to existing and deployed code without getting lost within git branches or Terraform workspaces.
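One way to keep the unavoidable duplication small is to make the stage-specific values explicit in each project's variables.tf and environment.auto.tfvars files. The following is a sketch under assumptions: the variable names and values are hypothetical and only the file names come from the repository layout above.

```hcl
# live/development/api/variables.tf (hypothetical variables)
variable "environment" {
  description = "Name of the stage this project is deployed to."
  type        = string
}

variable "instance_count" {
  description = "Number of API instances to run."
  type        = number
}
```

```hcl
# live/development/api/environment.auto.tfvars (hypothetical values)
# Terraform loads *.auto.tfvars files automatically, so no -var-file flag is needed.
environment    = "development"
instance_count = 1
```

The production copy of the project would then carry the same variables.tf and differ mainly in its tfvars values, which makes comparing two stages a matter of diffing a few small files.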
Project development workflow
How does the development and deployment workflow look? The idea is to keep things as simple as possible, and as stated earlier, all live environments must be in the main branch after each development cycle.
- Develop the project locally in a stage and project specific subdirectory, in an issue-nn/dev branch.
- While developing, run terraform plan and terraform apply against the dev environment, since that is the only one you as a developer have write access to.
- When you are confident that your solution works, add or update the GitHub Actions workflow definition. The definition file is shared between all projects. Code duplication in workflows can be minimized by using composite actions to share CI/CD code between projects.
- Commit and push your changes to your development branch on GitHub, e.g. issue-nn or dev. Create a pull request to the main branch.
- This PR will trigger the CI pipeline pr-to-main.yml, which will validate the source code and run terraform plan for all projects. The plans’ outputs are written as comments to the PR. Changes should be found only for the projects that you changed! You did not change dev, test and prod at the same time, did you?
- Review the plans, and if all looks good, merge the changes to branch main. If they do not look as expected, iterate back to step 1.
- When the changes are merged to branch main, the CD pipeline deploy-main.yml will run terraform apply for all projects.
- After the changes are in branch main and successfully deployed, delete the development branch.
Module development workflow
The module development workflow is very similar to project development.
- The developer creates a new subdirectory for the new module version and a new development project where the module can be tested while in development. This dev project is deleted when the module version is ready.
- While developing, run terraform plan and terraform apply against the dev environment, since that is the only one you as a developer have write access to.
- When you are confident that your module works, commit the changes and push them to your development branch on GitHub, e.g. issue-nn or dev. Create a pull request to the main branch.
- Even though the CI pipeline will run, no changes should be found at this stage, since only module code was changed.
- Review the PR comments to confirm that this indeed is the case.
- Since the same CD pipeline will be run as with project deployments, terraform apply will be run for all projects, but without any changes.
- After the changes are in branch main, the development branch can be deleted.
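The temporary development project from the first step could be as small as the following sketch. The project directory name and the node_count input are hypothetical; only the module path follows the repository layout shown earlier.

```hcl
# live/development/processing_cluster_test/main.tf
# Hypothetical throwaway project, deleted once the module version is ready.
module "processing_cluster" {
  # Points at the new version directory that is under development.
  source = "../../../modules/processing_cluster/v1.2.0"

  # Hypothetical input; the real module variables may differ.
  node_count = 1
}
```

Existing projects keep pointing at their current version directories, for example processing_cluster/v1.0.1, and those directories are not edited. This is why the plans in the pull request are expected to show no changes until a project is deliberately switched to the new version.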
Pros and cons
There are many benefits to the presented setup:
- Single source of truth. All source code is managed in the same place, and visible in developer’s local filesystem.
- Comparing different projects and stages is trivially easy.
- All available modules and their versions are easily discoverable without external documentation and indexes.
- Source code validation and linting can be done in one place, for example with pre-commit hooks.
- No need to set up git submodules.
- No need for cross-repository access in CI/CD.
- Deployment configuration can be easily shared.
- Since all CI/CD pipelines are run from the same repository, each target (AWS account, Azure subscription, etc.) requires only one setup per target for authentication and authorization.
There are also some things, which are less than optimal:
- For a new developer, who is just starting with one project, the amount of code can feel overwhelming.
- Modules cannot be versioned with git tags, because they are part of the monorepo which is versioned as a whole.
- Compared with a single project or deployment in its own repository, CI/CD pipelines will take longer to run, even when changes are made in one project only. Parallelism and smart caching can help somewhat, but probably not by a large amount.
- Since all projects are within the same repository and deployment configuration, the risk of mixing things up is greater than with individual project repositories.
Conclusion
While a monorepo setup is not perfect, in my opinion it offers far more benefits than individual project repositories. For a new developer a monorepo can feel overwhelming, but this can be helped a lot by focusing on one project (subdirectory) at a time and ignoring the rest of the codebase.