Global Window — Learn Apache Beam

1. Introduction

A Global Window is the default windowing strategy in Apache Beam. It groups all elements of a PCollection into a single, global window that spans the entire lifetime of the pipeline.

2. Why This Concept Exists

In batch processing, you are working with a finite dataset (e.g. a static CSV file). You don't need to segment data into hourly blocks—you just want to compute metrics over the entire dataset at once. Apache Beam assigns all elements to a single GlobalWindow by default. This allows standard batch pipelines to execute without needing any explicit windowing code.

3. Key Terminology

Global Window: The single, default window that holds all elements in a PCollection.
Default Strategy: If you do not apply beam.WindowInto in your pipeline, your data runs inside the Global Window.
End of Stream: The timestamp representing the end of time, marking the close of the global window.

4. How It Works

All elements are assigned to GlobalWindow with a start of -infinity and end of +infinity.
For Batch: Once all data is read, the window closes, and aggregations (like sum or count) execute.
For Streaming: Because the stream is infinite, the global window never ends. Therefore, standard aggregations (like CombineGlobally) will never fire because the watermark never reaches the end of the window. To aggregate data in a global window inside a streaming pipeline, you must configure a custom trigger (like firing every 10 elements).

5. Visual Diagram

Global Window (Default)

Event 1Event 2Event 3...

All elements are kept in a single, infinite window block.

6. Code Example

Configuring a global window in a streaming pipeline that fires a pane every time 5 new elements arrive:

python

import apache_beam as beam
from apache_beam.transforms.window import GlobalWindows
from apache_beam.transforms.trigger import Repeatedly, AfterCount, AccumulationMode

with beam.Pipeline() as p:
    (p
     | "ReadStream" >> beam.io.ReadFromPubSub(subscription="projects/my-project/subs/my-sub")
     | "ApplyGlobalWindow" >> beam.WindowInto(
         GlobalWindows(),
         trigger=Repeatedly(AfterCount(5)),
         accumulation_mode=AccumulationMode.DISCARDING
     )
     | "CountFive" >> beam.CombineGlobally(sum).without_defaults()
     | "Print" >> beam.Map(print))

7. Code Explanation

GlobalWindows() assigns all elements to the same single global bucket.
trigger=Repeatedly(AfterCount(5)) tells the runner: "Even though the window never ends, emit the current sum every time 5 elements arrive."
AccumulationMode.DISCARDING clears the buffer after each firing so the next pane starts counting from zero.

8. Real Production Example

In inventory systems, you maintain a running counter of total items sold. You run a streaming pipeline in a global window. As sales logs arrive, you update a stateful counter key using ValueState inside a global window, updating the master database in real-time.

9. Common Mistakes

Aggregating streaming data in a global window without triggers: Running stream | beam.CombineGlobally(sum) without applying a trigger will result in no output ever being emitted, as the global window never reaches its end boundary.
Memory leaks: Since the global window never ends, worker state buffers will accumulate data forever unless you configure discarding modes or clear state values.

10. Interview Perspective

Question: Why does a streaming pipeline in a global window require triggers?
Answer: Standard aggregations are triggered when the watermark passes the end of a window. Since the global window's end is +infinity, the watermark will never pass it. Triggers are required to override this behavior, forcing emissions based on element counts or processing durations.
Question: Can you use stateful processing in a global window?
Answer: Yes. In fact, stateful processing (like ValueState and BagState) is commonly used in global windows to maintain long-lived user states, shopping carts, or history tables.

11. Best Practices

Use global windows for all batch pipelines that do not require time-series aggregations.
Always define count or processing-time triggers when utilizing global windows in streaming pipelines.

12. Summary

Global Windows is the default strategy for all PCollections.
Gathers all data points into a single infinite timeline bucket.
Batch runs execute once; streaming runs require custom triggers to emit outputs.

13. Interactive Challenges

Challenge 1: Explicit Global Window (Beginner)

Write a windowing transform statement that explicitly assigns incoming elements to the Global Window.

Challenge 2: Streaming Global Count Trigger (Intermediate)

Write a windowing segment that applies global windows to a streaming PCollection, configuring a trigger to fire a pane every time 50 elements arrive, discarding previous pane values.

Challenge 3: Stateful Global Window Accumulator (Advanced)

Define a stateful DoFn class that processes element tuples (user_id, value) inside a global window, using a BagStateSpec to accumulate values for a user key, yielding the full list of values only when the list length reaches 10.

14. Related Content

Previous LessonTimestamp Assignment

Next LessonFixed Windows

Global Windows