Using Terraform Null Resource vs External Data vs Custom Provider: A Deep Dive
Optimizing Usage Of Scripts in Infrastructure as Code: A Practical Guide
During a recent interview, I was asked about the practice of invoking scripts to handle custom resource management tasks, let's say creating S3 buckets or ensuring that the bucket policies are intact for a set of S3 buckets.
When using Terraform, practitioners often encounter scenarios requiring custom logic that isn't natively supported by Terraform's resource providers. One common workaround is to use a null_resource to invoke a Python script. While this method is simple and quick to implement, alternative approaches such as the external data source or a custom Terraform provider offer their own unique advantages. This blog explores these three approaches and compares them based on use cases, pros, and cons.
When integrating Python scripts with Terraform, there are three main approaches: a null_resource with a local-exec provisioner, a data "external" block, and a custom provider.
Let's analyze each approach and its optimal use cases.
Null Resource with Python Script
Null resources in Terraform act as placeholder resources that don't create any real infrastructure but allow you to trigger provisioners and execute arbitrary commands, making them ideal for running Python scripts during Terraform operations.
resource "null_resource" "python_executor" {
provisioner "local-exec" {
command = "python script.py ${var.input_param}"
}
}
The null_resource approach executes Python scripts as local commands during Terraform's apply phase. This is straightforward but has limitations in state management and data return handling.
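For context, here is a minimal sketch of what a script.py invoked this way might contain, using the S3 scenario from the introduction. The boto3 dependency, ambient AWS credentials, and the bucket name arriving as the first CLI argument are illustrative assumptions, not part of the original example.

    # script.py (hypothetical): ensure the named S3 bucket exists.
    import sys
    import boto3
    from botocore.exceptions import ClientError

    def ensure_bucket(bucket_name):
        s3 = boto3.client("s3")
        try:
            # head_bucket succeeds only if the bucket exists and is accessible.
            s3.head_bucket(Bucket=bucket_name)
            print(f"bucket {bucket_name} already exists")
        except ClientError:
            # Region handling is omitted for brevity; outside us-east-1 a
            # CreateBucketConfiguration argument would also be needed.
            s3.create_bucket(Bucket=bucket_name)
            print(f"created bucket {bucket_name}")

    if __name__ == "__main__":
        try:
            ensure_bucket(sys.argv[1])
        except Exception as exc:
            # Terraform only sees the process exit code, so fail loudly.
            print(f"script failed: {exc}", file=sys.stderr)
            sys.exit(1)

Note that Terraform records only that the command ran and whether it exited non-zero; it has no visibility into what the script actually changed, which is where the state management limitations come from.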
Pros:
Quick to implement with minimal setup.
No need for additional tools or plugins.
Ideal for small, one-off customizations.
Cons:
Tightly coupled to the execution environment, requiring dependencies (e.g., Python) to be installed.
Difficult to scale and maintain as complexity grows.
Error handling and state management are rudimentary.
External Data Source
The external data source in Terraform provides a standardized interface for executing external programs and incorporating their output into Terraform's state management system, requiring the external program to accept JSON input and return JSON output in a specific format.
data "external" "python_data" {
program = ["python", "script.py"]
query = {
param = var.input_param
}
}
The external data source provides a structured way to execute external programs and capture their output in Terraform's state.
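To make that contract concrete, here is a minimal sketch of a script.py that satisfies the external data source protocol. The protocol itself (a JSON object on stdin, a JSON object of string values on stdout) is what the data source requires; the specific keys used here (param in, bucket_name out) are illustrative assumptions.

    # script.py (hypothetical): implements the external data source protocol.
    import json
    import sys

    def main():
        # Terraform sends the query block as a single JSON object on stdin.
        query = json.load(sys.stdin)
        param = query.get("param", "")

        # Do whatever lookup or computation is needed, then emit a JSON object
        # on stdout; every value must be a string.
        result = {"bucket_name": f"app-{param}-bucket"}
        json.dump(result, sys.stdout)

    if __name__ == "__main__":
        main()

The returned map is then available to the rest of the configuration as data.external.python_data.result, for example data.external.python_data.result.bucket_name.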
Pros:
Provides a cleaner interface for input and output handling.
Decouples execution logic from Terraform resources.
Output can be used dynamically in other Terraform resources.
Cons:
Slightly more setup compared to null_resource.
Limited to simple input-output tasks.
Performance can be a concern for long-running scripts.
Custom Provider
Custom providers in Terraform allow you to encapsulate complex business logic into a dedicated, maintainable Go package that integrates natively with Terraform's plugin system, offering full state management, validation, and resource lifecycle control through a well-defined API.
provider "custom" {
// Configuration options
}
resource "custom_resource" "example" {
input_param = var.input_param
}
Creating a custom provider requires more initial setup but offers the best integration with Terraform's ecosystem.
Pros:
Highly scalable and maintainable.
Offers first-class integration with Terraform workflows.
Well-suited for complex, reusable logic.
Cons:
Requires significant time and expertise to develop.
Written in Go, not Python.
Overhead might be unnecessary for simple use cases.
Comparison
| Aspect | Null Resource | External Data | Custom Provider |
|---|---|---|---|
| Setup Complexity | Low - Simple script execution | Medium - Requires JSON I/O handling | High - Requires Go development |
| State Management | Poor - No built-in state tracking | Good - State managed via data source | Excellent - Full state management |
| Error Handling | Basic - Exit codes only | Moderate - JSON response validation | Robust - Native error handling |
| Reusability | Limited - Script-dependent | Moderate - Reusable across configurations | Excellent - Distributable as package |
| Development Time | Quick - Immediate implementation | Medium - JSON interface needed | Long - Provider development required |
| Maintenance | Challenging - Manual script updates | Moderate - Version control needed | Simple - Managed via registry |
| Input/Output Control | Limited - Environment variables/CLI | Good - Structured JSON | Excellent - Schema-validated |
| Testing | Difficult - Manual testing required | Moderate - JSON testing possible | Excellent - Built-in testing framework |
| Documentation | Manual - Separate documentation needed | Basic - Data source docs | Excellent - Registry documentation |
| Version Control | Manual - Script versioning needed | Good - Script + configuration | Excellent - Provider versioning |
Recommendations
Choose based on your requirements:
Use null_resource for simple, one-off script executions
Personally, I would try to avoid the use of scripts, since they introduce security and governance risks in the systems where they are invoked.
Use external data for structured data exchange without complex logic
Personally, I would reserve this for a limited set of use cases.
Develop a custom provider for production-grade, reusable infrastructure components
This is the most capable approach, but it requires significant heavy lifting.
The choice ultimately depends on factors like development time, maintenance requirements, and reusability needs.