Using Terraform Null Resource vs External Data vs Custom Provider: A Deep Dive
Optimizing Usage Of Scripts in Infrastructure as Code: A Practical Guide
During a recent interview, I was asked about the practice of invoking scripts to handle custom resource management tasks, let's say creating S3 buckets or ensuring that the bucket policies are intact for a set of S3 buckets.
When using Terraform, practitioners often encounter scenarios requiring custom logic that isn't natively supported by Terraform's resource providers. One common workaround is to use a null_resource to invoke a Python script. While this method is simple and quick to implement, alternative approaches such as the external data source or a custom Terraform provider offer their own unique advantages. This blog explores these three approaches and compares them based on use cases, pros, and cons.
When integrating Python scripts with Terraform, there are three main approaches: a null_resource with a local-exec provisioner, a data "external" block, and a custom provider.
Let's analyze each approach and its optimal use cases.
Null Resource with Python Script
Null resources in Terraform act as placeholder resources that don't create any real infrastructure but allow you to trigger provisioners and execute arbitrary commands, making them ideal for running Python scripts during Terraform operations.
resource "null_resource" "python_executor" {
provisioner "local-exec" {
command = "python script.py ${var.input_param}"
}
}
The null_resource approach executes Python scripts as local commands during Terraform's apply phase. This is straightforward but has limitations in state management and data return handling.
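For context, here is a minimal sketch of what a script.py invoked this way might contain, using the S3 scenario from the introduction. The boto3 dependency, ambient AWS credentials, and the bucket name arriving as the first CLI argument are illustrative assumptions, not part of the original example.

    # script.py (hypothetical): ensure the named S3 bucket exists.
    import sys
    import boto3
    from botocore.exceptions import ClientError

    def ensure_bucket(bucket_name):
        s3 = boto3.client("s3")
        try:
            # head_bucket succeeds only if the bucket exists and is accessible.
            s3.head_bucket(Bucket=bucket_name)
            print(f"bucket {bucket_name} already exists")
        except ClientError:
            # Region handling is omitted for brevity; outside us-east-1 a
            # CreateBucketConfiguration argument would also be needed.
            s3.create_bucket(Bucket=bucket_name)
            print(f"created bucket {bucket_name}")

    if __name__ == "__main__":
        try:
            ensure_bucket(sys.argv[1])
        except Exception as exc:
            # Terraform only sees the process exit code, so fail loudly.
            print(f"script failed: {exc}", file=sys.stderr)
            sys.exit(1)

Note that Terraform records only that the command ran and whether it exited non-zero; it has no visibility into what the script actually changed, which is where the state management limitations come from.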
Pros:
Quick to implement with minimal setup.
No need for additional tools or plugins.
Ideal for small, one-off customizations.
Cons:
Tightly coupled to the execution environment, requiring dependencies (e.g., Python) to be installed.
Difficult to scale and maintain as complexity grows.
Error handling and state management are rudimentary.
External Data Source
The external data source in Terraform provides a standardized interface for executing external programs and incorporating their output into Terraform's state management system, requiring the external program to accept JSON input and return JSON output in a specific format.
data "external" "python_data" {
program = ["python", "script.py"]
query = {
param = var.input_param
}
}
The external data source provides a structured way to execute external programs and capture their output in Terraform's state.
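To make that contract concrete, here is a minimal sketch of a script.py that satisfies the external data source protocol. The protocol itself (a JSON object on stdin, a JSON object of string values on stdout) is what the data source requires; the specific keys used here (param in, bucket_name out) are illustrative assumptions.

    # script.py (hypothetical): implements the external data source protocol.
    import json
    import sys

    def main():
        # Terraform sends the query block as a single JSON object on stdin.
        query = json.load(sys.stdin)
        param = query.get("param", "")

        # Do whatever lookup or computation is needed, then emit a JSON object
        # on stdout; every value must be a string.
        result = {"bucket_name": f"app-{param}-bucket"}
        json.dump(result, sys.stdout)

    if __name__ == "__main__":
        main()

The returned map is then available to the rest of the configuration as data.external.python_data.result, for example data.external.python_data.result.bucket_name.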
Pros:
Provides a cleaner interface for input and output handling.
Decouples execution logic from Terraform resources.
Output can be used dynamically in other Terraform resources.
Cons:
Slightly more setup compared to null_resource.
Limited to simple input-output tasks.
Performance can be a concern for long-running scripts.
Custom Provider
Custom providers in Terraform allow you to encapsulate complex business logic into a dedicated, maintainable Go package that integrates natively with Terraform's plugin system, offering full state management, validation, and resource lifecycle control through a well-defined API.
provider "custom" {
// Configuration options
}
resource "custom_resource" "example" {
input_param = var.input_param
}
Creating a custom provider requires more initial setup but offers the best integration with Terraform's ecosystem.
Pros:
Highly scalable and maintainable.
Offers first-class integration with Terraform workflows.
Well-suited for complex, reusable logic.
Cons:
Requires significant time and expertise to develop.
Written in Go, not Python.
Overhead might be unnecessary for simple use cases.
Comparison
| Aspect | Null Resource | External Data | Custom Provider |
|---|---|---|---|
| Setup Complexity | Low - Simple script execution | Medium - Requires JSON I/O handling | High - Requires Go development |
| State Management | Poor - No built-in state tracking | Good - State managed via data source | Excellent - Full state management |
| Error Handling | Basic - Exit codes only | Moderate - JSON response validation | Robust - Native error handling |
| Reusability | Limited - Script-dependent | Moderate - Reusable across configurations | Excellent - Distributable as package |
| Development Time | Quick - Immediate implementation | Medium - JSON interface needed | Long - Provider development required |
| Maintenance | Challenging - Manual script updates | Moderate - Version control needed | Simple - Managed via registry |
| Input/Output Control | Limited - Environment variables/CLI | Good - Structured JSON | Excellent - Schema-validated |
| Testing | Difficult - Manual testing required | Moderate - JSON testing possible | Excellent - Built-in testing framework |
| Documentation | Manual - Separate documentation needed | Basic - Data source docs | Excellent - Registry documentation |
| Version Control | Manual - Script versioning needed | Good - Script + configuration | Excellent - Provider versioning |
Recommendations
Choose based on your requirements:
Use null_resource for simple, one-off script executions
Personally, I would try to avoid the use of scripts, since they introduce security and governance risks in the systems where they are invoked.
Use external data for structured data exchange without complex logic
Personally, I would reserve this for a limited set of use cases.
Develop a custom provider for production-grade, reusable infrastructure components
This is the most capable approach, but it requires significant heavy lifting.
The choice ultimately depends on factors like development time, maintenance requirements, and reusability needs.