intermediate
Windowing Basics
7 min read
Last updated: 2026-06-30
1. Introduction
Windowing groups a PCollection into logical subsets according to the timestamps of its individual elements.
2. Why This Concept Exists
An unbounded (streaming) dataset never ends, so a global operation such as "average amount" would never see a complete input to compute over. Windowing divides this infinite stream into finite, time-based buckets, and each bucket can then be aggregated on its own.
3. Key Terminology
- Event Time: The timestamp associated with the data element when it was generated at the source.
- Processing Time: The wall-clock time of the worker node processing the event.
- Window: A bounded slice of time (e.g. 5 minutes).
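To make the difference between the two clocks concrete, here is a minimal plain-Python sketch (it does not use the Beam API). The `Event` class, the `window_start` helper, and the sample timestamps are all hypothetical names and values chosen for illustration. It shows that an element generated at 12:04 but handled by a worker at 12:07 falls into a different 5-minute window depending on which clock is used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Event:
    """Hypothetical element carrying both clocks, for illustration only."""
    payload: str
    event_time: datetime       # when the source generated the element
    processing_time: datetime  # when a worker happened to handle it


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute window."""
    return ts - (ts - EPOCH) % WINDOW


event = Event(
    payload="sensor reading",
    event_time=datetime(2026, 1, 1, 12, 4, tzinfo=timezone.utc),
    processing_time=datetime(2026, 1, 1, 12, 7, tzinfo=timezone.utc),
)

by_event_time = window_start(event.event_time)
by_processing_time = window_start(event.processing_time)

print(by_event_time.strftime("%H:%M"))       # 12:00
print(by_processing_time.strftime("%H:%M"))  # 12:05
```

Windowing by event time keeps the element in the 12:00 window it logically belongs to, no matter how long it took to reach the worker.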
4. How It Works
- Elements are assigned timestamps at ingestion.
- The pipeline applies a WindowInto transform with a specific windowing strategy (e.g., Fixed, Sliding).
- Beam assigns each element to the corresponding window(s) based on its timestamp.
- Subsequent transforms aggregate the elements per window.
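The steps above can be sketched in plain Python, without the Beam API, to show the arithmetic behind fixed windows. The helper names (`fixed_window`, `sum_per_window`, `hhmm`) and the sample values are assumptions made for this illustration. Timestamps are seconds since midnight.

```python
from collections import defaultdict

WINDOW_SIZE = 5 * 60  # window length in seconds


def fixed_window(timestamp, size=WINDOW_SIZE):
    """Return the half-open [start, end) fixed window containing timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sum_per_window(elements):
    """Aggregate (timestamp, value) pairs into one sum per window."""
    totals = defaultdict(int)
    for timestamp, value in elements:
        totals[fixed_window(timestamp)] += value
    return dict(totals)


def hhmm(hours, minutes):
    """Convert a clock time to seconds since midnight."""
    return hours * 3600 + minutes * 60


# Hypothetical elements: (event timestamp, value)
elements = [
    (hhmm(12, 1), 10),
    (hhmm(12, 3), 20),
    (hhmm(12, 4), 5),
    (hhmm(12, 6), 7),
]

totals = sum_per_window(elements)
for (start, end), total in sorted(totals.items()):
    print(start, end, total)
```

The first three elements share the window starting at 12:00, while the 12:06 element lands alone in the window starting at 12:05, so each window gets its own independent sum.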
5. Visual Diagram
Window [12:00 - 12:05]
12:01  12:03  12:04
Window [12:05 - 12:10]
12:06
6. Code Example
Applying a 5-minute fixed window to stream items:
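python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Pub/Sub is an unbounded source, so the pipeline runs in streaming mode.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    windowed = (
        pipeline
        | "ReadQueue" >> beam.io.ReadFromPubSub(subscription="projects/my-proj/subs/my-sub")
        # Assumption: each message payload is a UTF-8 encoded integer.
        | "ParseValue" >> beam.Map(lambda payload: int(payload.decode("utf-8")))
        | "ApplyFixedWindows" >> beam.WindowInto(FixedWindows(5 * 60))  # 5 minutes
        # without_defaults() is needed when combining outside the global window.
        | "SumValues" >> beam.CombineGlobally(sum).without_defaults()
    )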
7. Code Explanation
- beam.WindowInto assigns windows.
- FixedWindows(5 * 60) defines non-overlapping 5-minute intervals.
- CombineGlobally(sum) calculates aggregates separately per window.
8. Real Production Example
Calculating traffic volumes on road sensors per hour: the sensor reports vehicle logs dynamically, and the pipeline groups them into 60-minute slices to monitor busy hours.
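A minimal plain-Python sketch of this scenario, not a Beam pipeline: it counts vehicle logs per sensor per 60-minute window and picks the busiest slot. The sensor IDs, the `hourly_counts` helper, and the sample readings are hypothetical. Timestamps are seconds since midnight.

```python
from collections import Counter

HOUR = 60 * 60  # window length in seconds


def hourly_counts(readings):
    """Count vehicle logs per (sensor_id, hour window start)."""
    counts = Counter()
    for sensor_id, timestamp in readings:
        hour_start = timestamp - (timestamp % HOUR)
        counts[(sensor_id, hour_start)] += 1
    return counts


# Hypothetical vehicle logs: (sensor_id, event timestamp)
readings = [
    ("sensor-a", 8 * HOUR + 5 * 60),
    ("sensor-a", 8 * HOUR + 40 * 60),
    ("sensor-a", 9 * HOUR + 10 * 60),
    ("sensor-b", 8 * HOUR + 15 * 60),
]

counts = hourly_counts(readings)
busiest_key, busiest_count = max(counts.items(), key=lambda item: item[1])
print(busiest_key, busiest_count)
```

Keying the count by both sensor and window start is what lets each sensor report its own busy hours independently.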
9. Common Mistakes
- Aggregating streams globally: Forgetting to apply windowing before running group or combine transforms on streaming sources will trigger a pipeline construction error or cause worker memory exhaustion.
- Ignoring late arrival times: In networks with delays, events can arrive after a window closes. Configure allowed lateness and triggers to handle them.
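The late-arrival case can be illustrated with a simplified plain-Python model. It is an approximation of the idea rather than Beam's exact semantics: the `classify` function, its return labels, and the 2-minute lateness value are assumptions for this sketch, and the `watermark` argument stands in for the runner's estimate of how far event time has progressed.

```python
def classify(event_time, watermark, size=300, allowed_lateness=120):
    """Decide what happens to an element in a simplified lateness model.

    All values are in seconds. The element belongs to the fixed window
    [start, end) that contains event_time.
    """
    window_end = event_time - (event_time % size) + size
    if watermark < window_end:
        return "on_time"            # window is still open
    if watermark < window_end + allowed_lateness:
        return "late_but_accepted"  # window closed, still within lateness
    return "dropped"                # window expired, element is discarded


# An element with event time 290 belongs to the window [0, 300).
print(classify(290, watermark=280))  # on_time
print(classify(290, watermark=350))  # late_but_accepted
print(classify(290, watermark=500))  # dropped
```

In the Beam Python SDK, the corresponding settings are passed to `beam.WindowInto` through its `trigger`, `allowed_lateness`, and `accumulation_mode` arguments. Check the SDK documentation for the exact boundary rules, which this model glosses over.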
10. Interview Perspective
- Question: What are the main windowing strategies?
- Answer: Fixed Windows (non-overlapping), Sliding Windows (overlapping), Session Windows (gap-based activity), and Global Window (default).
- Question: What is allowed lateness?
- Answer: It is the extra time after a window's end (as tracked by the watermark) during which late elements are still accepted and merged into that window's aggregates rather than discarded.
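To show how the overlapping and gap-based strategies differ from fixed windows, here is a plain-Python sketch of their assignment logic. It models the idea only and is not the Beam implementation. The function names and sample numbers are hypothetical, and all values are in seconds.

```python
def sliding_windows(timestamp, size, period):
    """Return every [start, end) window of length size, starting every
    period seconds, that contains timestamp."""
    last_start = timestamp - (timestamp % period)
    # Walk backwards over window starts while the window still covers timestamp.
    starts = range(last_start, timestamp - size, -period)
    return sorted((start, start + size) for start in starts)


def session_windows(timestamps, gap):
    """Merge timestamps into sessions separated by at least gap of inactivity."""
    sessions = []
    for timestamp in sorted(timestamps):
        if sessions and timestamp < sessions[-1][1]:
            # Still inside the current session: extend its end.
            start, _ = sessions[-1]
            sessions[-1] = (start, timestamp + gap)
        else:
            sessions.append((timestamp, timestamp + gap))
    return sessions


# A 2-minute window sliding every minute: one element lands in two windows.
print(sliding_windows(130, size=120, period=60))

# Activity at 0, 30, 50 forms one session; 200 starts a new one.
print(session_windows([0, 30, 50, 200], gap=60))
```

With sliding windows a single element contributes to several aggregates, whereas session windows have no fixed boundaries and grow as long as activity continues.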
11. Best Practices
- Prefer Fixed Windows for periodic reporting (hourly/daily updates).
- Ensure downstream sink databases can handle updates or upserts if late-firing triggers are active.
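The upsert recommendation can be sketched as follows. This is a plain-Python stand-in for a database, assuming each firing emits the full updated total for its window (accumulating behaviour). The `WindowedSink` class and the sample values are hypothetical.

```python
class WindowedSink:
    """In-memory stand-in for a sink table keyed by (window_start, key)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, window_start, key, value):
        """Insert the row, or overwrite it if the window already reported."""
        self.rows[(window_start, key)] = value


sink = WindowedSink()

# On-time firing for the window starting at 43200 (12:00).
sink.upsert(43200, "sensor-a", 35)

# A late firing for the same window replaces the earlier total
# instead of adding a second, conflicting row.
sink.upsert(43200, "sensor-a", 42)

print(sink.rows)
```

Because the row key includes the window start, repeated firings for the same window stay idempotent. An append-only sink would instead end up with two rows for the same window.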
12. Summary
- Windowing partitions infinite streams into finite buckets.
- Elements are assigned to windows based on their event timestamps, not on processing time.
- Essential prerequisite for aggregations on unbounded streams.