Flex Templates
1. Introduction
A Dataflow Flex Template allows you to package your Apache Beam pipeline code, system dependencies, and execution environment into a Docker container image. Once packaged, you can run the pipeline repeatedly with different parameters without needing to rebuild or compile the code.
2. Why This Concept Exists
In production systems, pipelines are often triggered by non-developers, such as automation tools, cron jobs, or analytics teams. In the past, running a pipeline required installing the language runtime (Python or Java) and the Beam SDK on the local machine that submitted the job. Flex Templates solve this by containerizing the launch environment, allowing anyone to start a pipeline with a simple API call or gcloud CLI command.
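To make the "simple API call" concrete, here is a minimal sketch of the request that a launch through the Dataflow REST API (the flexTemplates launch method) would carry. It only builds the URL and JSON body and does not send anything. The project ID, bucket, job name, and spec path are placeholders, and build_launch_request is a hypothetical helper, not part of any Google library.

```python
import json

# Endpoint of the Dataflow projects.locations.flexTemplates.launch method.
LAUNCH_URL = (
    "https://dataflow.googleapis.com/v1b3/projects/{project}"
    "/locations/{region}/flexTemplates:launch"
)


def build_launch_request(project, region, job_name, spec_path, parameters):
    """Return the (url, JSON body) pair for a Flex Template launch call."""
    url = LAUNCH_URL.format(project=project, region=region)
    body = {
        "launchParameter": {
            "jobName": job_name,
            # GCS path of the template spec file
            "containerSpecGcsPath": spec_path,
            # Runtime parameters declared by the template
            "parameters": parameters,
        }
    }
    return url, json.dumps(body, indent=2)


url, body = build_launch_request(
    project="my-project",                                   # placeholder
    region="us-central1",
    job_name="log-pipeline-run",                            # hypothetical
    spec_path="gs://my-bucket/templates/my-pipeline.json",  # placeholder
    parameters={"input_path": "gs://my-bucket/logs/*.log"},
)
print(url)
print(body)
```

A real caller would POST this body with an OAuth access token; the caller needs no Beam SDK, Python pipeline code, or Java installed.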
3. Key Terminology
- Classic Template: Legacy templates that compile a static execution graph. They do not support dynamic dependencies or custom sources.
- Flex Template: Modern templates that compile the graph dynamically inside a Docker container on Google Cloud. They support dynamic dependencies.
- Template Spec File: A small JSON file stored in Google Cloud Storage describing where the Docker image lives and what parameters it accepts.
4. How It Works
- Containerize: Write a Dockerfile that packages your pipeline code and its requirements.
- Build: Build the image with Cloud Build and push it to Artifact Registry using gcloud builds submit.
- Register: Create a template spec JSON file that points to the image and upload it to GCS.
- Run: Launch the pipeline using gcloud dataflow flex-template run, passing runtime parameters.
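The Build and Run steps can be sketched as a small script that assembles the two gcloud invocations. This is an illustration under assumptions: the image name, spec path, job name, and parameter values are placeholders, and the two helper functions are hypothetical. The commands are only printed, not executed.

```python
import shlex


def image_build_command(image):
    """Build the argv list for building and pushing the image with Cloud Build."""
    return ["gcloud", "builds", "submit", "--tag", image, "."]


def flex_template_run_command(job_name, spec_path, region, parameters):
    """Build the argv list for launching a job from a template spec file."""
    argv = [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", spec_path,
        "--region", region,
    ]
    # Each runtime parameter is passed as its own --parameters key=value pair.
    for key, value in sorted(parameters.items()):
        argv += ["--parameters", f"{key}={value}"]
    return argv


build_cmd = image_build_command(
    "us-central1-docker.pkg.dev/my-project/repo/my-pipeline-image:latest"
)
run_cmd = flex_template_run_command(
    job_name="log-pipeline-run",                            # hypothetical
    spec_path="gs://my-bucket/templates/my-pipeline.json",  # placeholder
    region="us-central1",
    parameters={"input_path": "gs://my-bucket/logs/app.log"},
)

# shlex.join quotes arguments so the lines can be pasted into a shell.
print(shlex.join(build_cmd))
print(shlex.join(run_cmd))
```

Passing the lists to subprocess.run(cmd, check=True) would execute them; building argv lists rather than strings avoids shell-quoting problems in parameter values.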
5. Visual Diagram
6. Code Example
A typical Dockerfile used to build a Python Flex Template:
FROM gcr.io/dataflow-templates-base/python39-template-launcher-base:latest
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/template/main.py"
COPY . /template
# Install pipeline requirements
RUN pip install --no-cache-dir -U -r /template/requirements.txt
7. Code Explanation
- python39-template-launcher-base is the official Google base image containing the launcher.
- FLEX_TEMPLATE_PYTHON_PY_FILE is an environment variable telling the launcher which script is the main entry point.
- We copy files to /template and run pip install to load packages.
8. Real Production Example: Template Spec JSON
A template spec file, named metadata.json here, that points to the image and describes the parameters the template accepts:
{
  "image": "us-central1-docker.pkg.dev/my-project/repo/my-pipeline-image:latest",
  "sdkInfo": {"language": "PYTHON"},
  "metadata": {
    "name": "My Log Pipeline",
    "parameters": [
      {
        "name": "input_path",
        "label": "Input File Path",
        "helpText": "GCS path to read logs from",
        "isOptional": false
      }
    ]
  }
}
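The parameter declarations in the spec can also be used before launch to catch mistakes early. The sketch below checks a set of launch parameters against the declared ones. check_parameters is a hypothetical helper written for this illustration, and the embedded spec is a trimmed copy of the example above.

```python
import json

SPEC_JSON = """
{
  "image": "us-central1-docker.pkg.dev/my-project/repo/my-pipeline-image:latest",
  "metadata": {
    "name": "My Log Pipeline",
    "parameters": [
      {"name": "input_path", "label": "Input File Path",
       "helpText": "GCS path to read logs from", "isOptional": false}
    ]
  }
}
"""


def check_parameters(spec, supplied):
    """Return a list of problems found in the supplied launch parameters."""
    declared = {p["name"]: p for p in spec["metadata"]["parameters"]}
    problems = []
    # Every parameter not marked optional must be supplied.
    for name, param in declared.items():
        if not param.get("isOptional", False) and name not in supplied:
            problems.append(f"missing required parameter: {name}")
    # Parameters the spec does not declare are reported as unknown.
    for name in supplied:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    return problems


spec = json.loads(SPEC_JSON)
print(check_parameters(spec, {"input_path": "gs://my-bucket/logs/a.log"}))
# → []
print(check_parameters(spec, {"output": "gs://my-bucket/out"}))
# → ['missing required parameter: input_path', 'unknown parameter: output']
```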
9. Common Mistakes
- Using a base image without the launcher: Standard Python images (like python:3.9-slim) do not contain the template launcher, so the template fails to start. Build from one of Google's official template launcher base images, or copy the launcher binary from one of them into your custom image.
- Hardcoding GCS buckets in code: Expose every bucket path as a pipeline option so that it can be overridden at launch time.
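One common way to avoid hardcoded paths in a Python pipeline is to read them from command-line arguments. The following is a minimal sketch using only the standard library; parse_template_args is a hypothetical helper, and the input_path parameter name is taken from the example spec in this document. In a real main.py, the leftover arguments would typically be handed to Beam's PipelineOptions.

```python
import argparse


def parse_template_args(argv):
    """Split template parameters from the remaining pipeline options."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input_path",
        required=True,
        help="GCS path to read logs from",
    )
    # parse_known_args leaves unrecognized flags (runner, region, ...) untouched
    # so they can be forwarded to the pipeline options.
    known, pipeline_args = parser.parse_known_args(argv)
    return known, pipeline_args


known, pipeline_args = parse_template_args([
    "--input_path=gs://my-bucket/logs/*.log",  # placeholder bucket
    "--runner=DataflowRunner",
    "--region=us-central1",
])
print(known.input_path)   # → gs://my-bucket/logs/*.log
print(pipeline_args)      # → ['--runner=DataflowRunner', '--region=us-central1']
```

Because the path arrives as an argument, the same image can read from a different bucket on every launch without a rebuild.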
10. Interview Perspective
- Question: What is the main difference between Classic Templates and Flex Templates?
- Answer: Classic templates compile the execution graph locally before upload; they do not support dynamic pipeline structures. Flex templates package the code in a Docker container, compiling the execution graph dynamically on GCP. This supports custom sources and dynamic path inputs.
- Question: Where does the container run when a template is launched?
- Answer: Google Cloud spins up a temporary virtual machine (launcher VM) to execute the container, compile the execution graph, and submit it to the Dataflow service before shutting down.
11. Best Practices
- Pin dependencies with exact versions in your requirements.txt to guarantee template build reproducibility.
- Document parameters thoroughly in the metadata.json file.
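The pinning rule is easy to enforce automatically, for example in a CI step before the image is built. This is a rough sketch, not a full requirements parser: unpinned_requirements is a hypothetical helper, the package versions in the sample are illustrative, and the pattern only recognizes the plain name==version form (with optional extras).

```python
import re

# Matches "name==version", optionally with extras such as "pkg[gcp]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[^=\s]+")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = """\
apache-beam[gcp]==2.55.0
requests>=2.0
# a comment
google-cloud-storage
"""
print(unpinned_requirements(sample))
# → ['requests>=2.0', 'google-cloud-storage']
```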
12. Summary
- Flex Templates containerize pipelines using Docker.
- The image is stored in Artifact Registry; the JSON spec is stored in GCS.
- Flex Templates enable code reuse and simple API-driven launches.