To improve developer efficiency, DoorDash has adopted Open Policy Agent (OPA) to automate checks on its infrastructure changes. The infrastructure team at DoorDash reports several benefits from this initiative. These include faster reviews of infrastructure changes, more comprehensive resource tagging, and a significant reduction in incidents caused by policy violations.
Learning from past incidents – DoorDash’s journey to policy automation
A few years ago, DoorDash experienced an incident that caused a sudden drop in its order volume. The infrastructure team resolved the situation within an hour. However, the root cause was a seemingly innocent mistake: critical AWS resources were accidentally removed in a Terraform change that involved around 90 other resources, where the removal was easy to overlook. This realisation prompted DoorDash to pursue policy automation, to guard against similar oversights in the future.
The backbone – leveraging OPA and Atlantis
At the heart of DoorDash’s setup is Atlantis, an open-source tool that automates Terraform through pull requests and manages the Terraform plan lifecycle. The process starts when a user opens an infrastructure pull request on GitHub. This triggers a webhook event that is sent to an Atlantis worker, and the worker then fetches the OPA policies from a designated S3 bucket.
Ensuring compliance – the role of policy rules
DoorDash writes its policy rules in Rego, OPA’s policy language, and this has proven instrumental. The rules act as detectors that pinpoint deviations from the expected state of the system. The evaluation itself is handled by Conftest, a tool that uses OPA to test structured data against the assertions defined in the policies. In this case, that data is the Terraform plan.
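Real Conftest policies are written in Rego, not Python. The following Python sketch only mirrors the general shape of such a check, under stated assumptions. The plan is the JSON produced by `terraform show -json <planfile>`, already parsed into a dict. Each rule returns a list of violation messages, and an empty combined list means the plan passes. The rule `deny_iam_user_creation` and the helper `evaluate` are hypothetical illustrations, not DoorDash’s actual policies.

```python
from typing import Callable, Dict, List

Plan = Dict  # parsed JSON from `terraform show -json <planfile>` (assumption)
Rule = Callable[[Plan], List[str]]


def deny_iam_user_creation(plan: Plan) -> List[str]:
    """Hypothetical rule: forbid creating IAM users directly in Terraform."""
    return [
        f"{rc['address']}: creating IAM users directly is not allowed"
        for rc in plan.get("resource_changes", [])
        if rc.get("type") == "aws_iam_user"
        and "create" in rc.get("change", {}).get("actions", [])
    ]


def evaluate(plan: Plan, rules: List[Rule]) -> List[str]:
    """Run every rule and collect its messages; an empty list means the plan passes."""
    violations: List[str] = []
    for rule in rules:
        violations.extend(rule(plan))
    return violations


if __name__ == "__main__":
    example_plan = {
        "resource_changes": [
            {"address": "aws_iam_user.ci", "type": "aws_iam_user",
             "change": {"actions": ["create"]}},
        ]
    }
    print(evaluate(example_plan, [deny_iam_user_creation]))
```

In Rego, the equivalent would typically be a `deny` rule that yields one message per offending resource. Conftest treats any such message as a failure.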
Putting the pieces together – execution and review
Atlantis runs Conftest against the Terraform plan to check it against the OPA-defined policies. It then posts the results as comments directly on the corresponding GitHub pull request, together with the details of the Terraform plan.
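Atlantis formats and posts these comments itself. The Python sketch below only suggests what such a comment could combine: a summary of planned actions and the policy results. The functions `summarize_actions` and `render_comment` and the comment wording are hypothetical, and the input is assumed to be a parsed `terraform show -json` plan. Terraform counts a replacement (actions `["delete", "create"]`) as one add plus one destroy, and the sketch follows the same convention.

```python
from collections import Counter
from typing import Dict, List


def summarize_actions(plan: Dict) -> Counter:
    """Count planned actions across all resource changes, ignoring no-ops."""
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        for action in rc.get("change", {}).get("actions", []):
            if action != "no-op":
                counts[action] += 1
    return counts


def render_comment(plan: Dict, violations: List[str]) -> str:
    """Build a hypothetical pull request comment from plan details and policy results."""
    counts = summarize_actions(plan)
    lines = [
        "Plan: {} to add, {} to change, {} to destroy.".format(
            counts["create"], counts["update"], counts["delete"]
        ),
        "",
    ]
    if violations:
        lines.append(f"Policy check FAILED ({len(violations)} violation(s)):")
        lines.extend(f"- {v}" for v in violations)
    else:
        lines.append("Policy check passed.")
    return "\n".join(lines)
```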
Streamlining the workflow with PullApprove
To streamline the workflow further, DoorDash uses PullApprove, a GitHub integration that handles code review, reviewer assignment and approval policy compliance. Once the necessary approvals are in place, Atlantis applies the changes to AWS resources according to the Terraform plan.
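PullApprove is configured in its own format, which is not shown here. The Python sketch below models only the gating idea: the apply step proceeds only when every required review group has at least one approval. The group names, members and the `can_apply` function are hypothetical.

```python
from typing import Dict, Iterable, Set


def can_apply(approvers: Iterable[str], required_groups: Dict[str, Set[str]]) -> bool:
    """Return True only if every required review group has at least one approver.

    `required_groups` maps a hypothetical group name to the set of its members.
    """
    approved = set(approvers)
    # A non-empty intersection means someone from that group approved.
    return all(members & approved for members in required_groups.values())


if __name__ == "__main__":
    groups = {"infra": {"alice", "bob"}, "security": {"carol"}}
    print(can_apply(["alice"], groups))
    print(can_apply(["bob", "carol"], groups))
```

A check like this fails closed. A missing approval from any one group is enough to block the apply.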
Policies – guardians of reliability, velocity, efficiency, and security
Lin Du, Senior Software Engineer at DoorDash, describes the range of policies in use. They fall into four categories: Reliability, Velocity, Efficiency and Security.
Under Reliability, a key concern is protecting critical resources from accidental deletion. Du illustrates this with an example policy that identifies these critical resources. It then requires an administrator to review any change to them before that change can proceed.
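The policy Du describes is written in Rego. The Python sketch below approximates its logic under stated assumptions. It flags any critical resource that a parsed `terraform show -json` plan would delete, including replacements, whose actions also contain `"delete"`. It blocks the change unless an administrator has approved it. Identifying critical resources by type is a simplification: the `CRITICAL_TYPES` set, the admin list and both functions are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical: resource types treated as critical. A real policy might match on
# resource addresses, tags, or an externally maintained inventory instead.
CRITICAL_TYPES: Set[str] = {"aws_db_instance", "aws_elasticache_cluster", "aws_lb"}


def critical_deletions(plan: Dict) -> List[str]:
    """Return addresses of critical resources the plan would delete or replace."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc.get("type") in CRITICAL_TYPES
        and "delete" in rc.get("change", {}).get("actions", [])
    ]


def check_critical(plan: Dict, approvers: Iterable[str], admins: Set[str]) -> List[str]:
    """Deny critical deletions unless at least one admin has approved the change."""
    deletions = critical_deletions(plan)
    if deletions and not (admins & set(approvers)):
        return [f"{addr}: deleting a critical resource requires admin review"
                for addr in deletions]
    return []
```

Because the check looks at every resource change individually, a single critical deletion is surfaced even when it sits among dozens of routine changes.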
For Velocity, Du walks through a policy that checks the Terraform modules used in a pull request against a list of approved modules. If a team uses a module that is not on the list, the policy suggests pre-approved alternatives, which encourages code reuse.
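A minimal Python sketch of such a module check is shown below. It reads the module calls recorded under `configuration.root_module.module_calls` in a parsed `terraform show -json` plan. It returns a warning for any module whose source is not on an allow-list, because the policy nudges teams rather than blocking them outright. The allow-list contents and the function name are hypothetical, and for brevity the sketch inspects only root-level module calls, not nested ones.

```python
from typing import Dict, List, Set

# Hypothetical allow-list of module sources; a real list would be maintained with the policies.
APPROVED_MODULES: Set[str] = {
    "terraform-aws-modules/vpc/aws",
    "terraform-aws-modules/s3-bucket/aws",
}


def module_warnings(plan: Dict, approved: Set[str] = APPROVED_MODULES) -> List[str]:
    """Warn about root-level module calls whose source is not on the approved list."""
    calls = plan.get("configuration", {}).get("root_module", {}).get("module_calls", {})
    return [
        f"module.{name}: source {call.get('source')!r} is not approved; "
        "consider a pre-approved module instead"
        for name, call in sorted(calls.items())
        if call.get("source") not in approved
    ]
```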
Paving the way forward – advantages galore
DoorDash reports substantial results from its policy automation. The infrastructure team now spends noticeably less time reviewing pull requests, which frees it to focus on broader product improvements. More importantly, catching policy violations early has helped prevent the incidents they would otherwise cause. Resource tagging coverage and standardisation also rose sharply, from 20% to 97.9%. This has helped optimise costs and has also streamlined how teams operate.
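The 20% and 97.9% figures are DoorDash’s own. The source does not say how they were measured. As one hypothetical way to compute a comparable coverage metric for a single plan, the Python sketch below counts resources that are being created or updated and have a `tags` attribute in their planned state. It then reports how many of those carry every required tag. The required tag names and the `tag_coverage` function are assumptions, and the sketch ignores provider-level default tags.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical set of tags every taggable resource must carry.
REQUIRED_TAGS = frozenset({"team", "service", "env"})


def tag_coverage(plan: Dict, required: Iterable[str] = REQUIRED_TAGS) -> Tuple[int, int, float]:
    """Return (compliant, taggable, percent) for resources the plan creates or updates.

    A resource counts as taggable if its planned state has a `tags` attribute.
    """
    required_set = set(required)
    compliant = taggable = 0
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        after = change.get("after") or {}
        # Skip deletions, no-ops and resources without a tags attribute.
        if not ({"create", "update"} & set(change.get("actions", []))) or "tags" not in after:
            continue
        taggable += 1
        if required_set <= set((after.get("tags") or {}).keys()):
            compliant += 1
    percent = 100.0 * compliant / taggable if taggable else 100.0
    return compliant, taggable, percent
```

A policy could use the same comparison to reject or warn on individual untagged resources. The aggregate percentage is more useful for tracking progress over time.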
DoorDash’s move into policy automation shows how tools like Open Policy Agent, Atlantis and PullApprove can yield significant results. Through careful policy design, alignment and enforcement, the company has improved developer efficiency, reduced incidents and optimised costs. As the industry continues to evolve, DoorDash’s experience offers a useful model for other organisations looking to streamline their workflows and raise productivity.