advanced

Flex Templates

7 min read · Last updated: 2026-06-30

1. Introduction

A Dataflow Flex Template allows you to package your Apache Beam pipeline code, system dependencies, and execution environment into a Docker container image. Once packaged, you can run the pipeline repeatedly with different parameters without needing to rebuild or compile the code.

2. Why This Concept Exists

In production systems, pipelines are often triggered by people and tools that did not write them, such as automation tools, cron jobs, or analytics teams. Previously, submitting a job required installing Python or Java and the Beam SDK on the machine that submitted it. Flex Templates solve this by containerizing the launch environment, so a pipeline can be started with a simple API call or gcloud CLI command.

3. Key Terminology

  • Classic Template: Legacy templates that compile a static execution graph when the template is created. Because the graph is fixed before launch, they do not support dynamic dependencies or custom sources.
  • Flex Template: Modern templates that compile the graph dynamically inside a Docker container on Google Cloud. They support dynamic dependencies.
  • Template Spec File: A small JSON file stored in Google Cloud Storage describing where the Docker image lives and what parameters it accepts.

4. How It Works

  1. Containerize: Package your code and requirements into a Dockerfile.
  2. Build: Build the image with Cloud Build and push it to Artifact Registry: gcloud builds submit.
  3. Register: Create a template spec JSON file pointing to the image and upload it to GCS.
  4. Run: Launch the pipeline using gcloud dataflow flex-template run, passing runtime parameters.
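
Step 4 can be scripted from any scheduler or automation tool. The sketch below only assembles the argument list for gcloud dataflow flex-template run; the function name and the example values are hypothetical, and it assumes the gcloud CLI is installed and authenticated on the machine that eventually runs the command.

```python
from typing import Dict, List


def flex_template_run_argv(
    job_name: str,
    spec_gcs_path: str,
    region: str,
    parameters: Dict[str, str],
) -> List[str]:
    """Build the argv for `gcloud dataflow flex-template run` (step 4)."""
    argv = [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--template-file-gcs-location={spec_gcs_path}",
        f"--region={region}",
    ]
    # One --parameters flag per runtime parameter, sorted for a stable order.
    for name in sorted(parameters):
        argv.append(f"--parameters={name}={parameters[name]}")
    return argv


# Hypothetical job name, bucket, and region, for illustration only.
EXAMPLE_ARGV = flex_template_run_argv(
    job_name="log-pipeline-run",
    spec_gcs_path="gs://example-bucket/templates/log-pipeline.json",
    region="us-central1",
    parameters={"input_path": "gs://example-bucket/logs/input.txt"},
)
```

Passing the resulting list to subprocess.run(EXAMPLE_ARGV, check=True) would launch the job. Building a list rather than a shell string avoids quoting problems when a parameter value contains spaces.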

5. Visual Diagram

Pipeline Code + Dockerfile
  → Artifact Registry (Image)
  → GCS (JSON Spec)
  → Launch API

6. Code Example

A typical Dockerfile used to build a Python Flex Template:

dockerfile
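# Official Google launcher base image for Python 3.9
FROM gcr.io/dataflow-templates-base/python39-template-launcher-base:latest

# Tell the launcher which script is the pipeline's entry point
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/template/main.py"

# Copy the pipeline source (including requirements.txt) into the image
COPY . /template

# Install pipeline requirements
RUN pip install --no-cache-dir -U -r /template/requirements.txt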

7. Code Explanation

  • python39-template-launcher-base is the official Google base image containing the launcher.
  • FLEX_TEMPLATE_PYTHON_PY_FILE is an environment variable telling the launcher which script is the main entry point.
  • We copy the pipeline files to /template and run pip install to install the packages listed in requirements.txt.

8. Real Production Example: Template Spec JSON

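A template spec file that names the container image and describes the parameters the pipeline accepts. The metadata block is the part typically maintained in metadata.json: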

json
{
  "image": "us-central1-docker.pkg.dev/my-project/repo/my-pipeline-image:latest",
  "sdk_info": {"language": "PYTHON"},
  "metadata": {
    "name": "My Log Pipeline",
    "parameters": [
      {
        "name": "input_path",
        "label": "Input File Path",
        "helpText": "GCS path to read logs from",
        "isOptional": false
      }
    ]
  }
}
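
Because the spec lists each parameter and whether it is optional, a launcher script can check its inputs before calling the API. The following is a minimal sketch of such a pre-flight check, not Dataflow's own validation logic; the function name missing_required is hypothetical, and the spec is the one shown above, embedded as a string.

```python
import json
from typing import Dict, List

# The template spec from this section, embedded so the example is self-contained.
SPEC_JSON = """
{
  "image": "us-central1-docker.pkg.dev/my-project/repo/my-pipeline-image:latest",
  "sdk_info": {"language": "PYTHON"},
  "metadata": {
    "name": "My Log Pipeline",
    "parameters": [
      {
        "name": "input_path",
        "label": "Input File Path",
        "helpText": "GCS path to read logs from",
        "isOptional": false
      }
    ]
  }
}
"""


def missing_required(spec: dict, supplied: Dict[str, str]) -> List[str]:
    """Return names of non-optional parameters absent from `supplied`."""
    params = spec.get("metadata", {}).get("parameters", [])
    return [
        p["name"]
        for p in params
        # A parameter without an isOptional field is treated as required here.
        if not p.get("isOptional", False) and p["name"] not in supplied
    ]


spec = json.loads(SPEC_JSON)
print(missing_required(spec, {}))
print(missing_required(spec, {"input_path": "gs://my-bucket/input.txt"}))
```

The first call reports input_path as missing; the second reports nothing, because the only required parameter is supplied.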

9. Common Mistakes

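  • Using an image without the launcher: The container must include the Flex Template launcher. The simplest approach is to build on Google's official template launcher base images. A standard Python image (like python:3.9-slim) does not contain the launcher and will fail to start unless you copy the launcher binary into it from an official image.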
  • Hardcoding GCS buckets in code: Ensure all bucket paths are configured as pipeline options so they can be overridden at launch time.
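
One way to avoid hardcoded paths is to declare each one as a command-line option in the entry-point script. The sketch below uses only argparse and assumes the launcher passes each template parameter to the script as a --name=value flag; the function name parse_args is hypothetical, and input_path mirrors the parameter in the spec above.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args(
    argv: Optional[List[str]] = None,
) -> Tuple[argparse.Namespace, List[str]]:
    """Split the pipeline's own options from the remaining runner flags."""
    parser = argparse.ArgumentParser()
    # Declared as an option so it can be overridden at launch time.
    parser.add_argument(
        "--input_path",
        required=True,
        help="GCS path to read logs from",
    )
    # Flags this parser does not recognize (e.g. --runner, --project) are
    # returned untouched; in a Beam pipeline they would typically be handed
    # to PipelineOptions.
    known, pipeline_args = parser.parse_known_args(argv)
    return known, pipeline_args


known, pipeline_args = parse_args(
    ["--input_path=gs://my-bucket/input.txt", "--runner=DataflowRunner"]
)
print(known.input_path)
print(pipeline_args)
```

Using parse_known_args instead of parse_args keeps the script from rejecting flags it does not own, which matters because the job is launched with more options than the template parameters alone.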

10. Interview Perspective

  • Question: What is the main difference between Classic Templates and Flex Templates?
  • Answer: Classic templates compile the execution graph once, when the template is created, so the pipeline structure cannot change at launch. Flex templates package the code in a Docker container and compile the execution graph on GCP at launch time, so the graph can depend on the runtime parameters. This supports custom sources and input paths chosen at launch.
  • Question: Where does the container run when a template is launched?
  • Answer: Google Cloud spins up a temporary virtual machine (launcher VM) to execute the container, compile the execution graph, and submit it to the Dataflow service before shutting down.

11. Best Practices

  • Pin dependencies with exact versions in your requirements.txt to guarantee template build reproducibility.
  • Document parameters thoroughly in the metadata.json file.
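
The pinning rule is easy to enforce before an image is built. The following is a rough sketch of such a check, assuming a simple requirements.txt with one package per line; the function name unpinned and the version numbers are illustrative, and lines such as -r other.txt or URL requirements would also be flagged by this simplified pattern.

```python
import re
from typing import List

# name, optional [extras], then an exact == pin
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+$")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned with an exact version."""
    bad = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not _PINNED.match(line):
            bad.append(line)
    return bad


EXAMPLE = """\
apache-beam[gcp]==2.55.0
requests>=2.0   # range, not an exact pin
# a comment line

pandas
"""
print(unpinned(EXAMPLE))
```

Running a check like this in CI before gcloud builds submit would catch a loosened requirement before it reaches a template image.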

12. Summary

  • Flex Templates containerize pipelines using Docker.
  • The image is stored in Artifact Registry; the JSON spec is stored in GCS.
  • Enables code reuse and simple API-driven launches.

13. Interactive Challenges

Challenge 1: Flex Template Dockerfile (Beginner)

Write the Dockerfile lines required to copy the local directory to the /template path and set the Python entry point environment variable to /template/pipeline.py.

Challenge 2: Template Spec Metadata JSON (Intermediate)

Write a template spec JSON definition that references the image path "gcr.io/my-gcp/image:v1" for a Python SDK template.

Challenge 3: Template CLI Launch Command (Advanced)

Write the gcloud command to launch a Flex Template named "billing-job" in project "prod-billing", region "us-east1", using the template spec file stored at "gs://my-bucket/templates/billing.json", passing parameter input_path as "gs://my-bucket/input.txt".
