Production TipsEvergreen Article

Deep Dive into Cloud Dataflow Autoscaling

Published: July 02, 20268 min read

One of Google Cloud Dataflow's primary benefits is autoscaling: the ability of the runner to dynamically allocate worker VMs based on the pipeline's workload, reducing resource costs while meeting performance guarantees.

However, how does Dataflow decide when to scale up or scale down, and what metrics does it evaluate?


1. The Scaling Algorithms

Dataflow's autoscaling algorithm operates differently depending on the pipeline execution mode:

Batch Autoscaling

For batch pipelines, Dataflow evaluates the total work remaining (in bytes) and the throughput rate of the current workers.

  • The pipeline execution engine divides the work into sub-tasks (splits).
  • If the estimated time to process the remaining splits exceeds the targeted runtime limits, Dataflow provisions more worker VMs.
  • Dynamic Work Rebalancing: If a worker node is slow (a "straggler"), Dataflow dynamically splits the remaining work assigned to that worker and redistributes it to other idle workers.

Streaming Autoscaling

Streaming pipelines do not have a finite dataset size, so they scale based on latency and backlog:

  • System Lag: The time (in seconds) that data is waiting to be processed. High system lag triggers scale-ups.
  • Queue backlog: The volume of pending messages in the source queue (e.g. unacknowledged messages in GCP Pub/Sub or offset lag in Kafka).
  • CPU Utilization: If average worker CPU utilization exceeds 80%, Dataflow scales up. If CPU usage drops below 20%, it scales down.

2. Tuning Autoscaling Options

You configure autoscaling boundaries using execution options. Here is a command example showing how to set maximum worker VM boundaries when running a Python pipeline on Google Cloud Dataflow:

bash
python pipeline.py \
  --runner=DataflowRunner \
  --project=my-gcp-project \
  --region=us-central1 \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --num_workers=2 \
  --max_num_workers=20 \
  --worker_machine_type=n1-standard-4 \
  --temp_location=gs://my-bucket/temp

3. Optimizing Autoscaling Performance

  • [ ] Enable Streaming Engine: Worker-based streaming pipelines must store state on local VM persistent disks, making scale-down operations slow because state files must be copied between nodes. Streaming Engine stores state in a cloud service layer, allowing workers to scale down instantly.
  • [ ] Avoid Blocking UDFs: If your Python DoFn contains blocking network requests (e.g. calling an external API sequentially), the worker will spend CPU cycles waiting on network responses. Since the CPU usage remains low, Dataflow will not scale up even though System Lag is increasing! Always use connection pools and batch requests.
  • [ ] Define Reasonable max_num_workers Limits: Set maximum worker limits carefully to prevent runaway resource costs in the event of upstream database loops or massive telemetry backfills.