Fixed Windows
1. Introduction
A Fixed Window (sometimes called a tumbling window) is a windowing strategy that partitions a data stream into non-overlapping, equal-sized time intervals (such as every 5 minutes, 1 hour, or 24 hours).
2. Why This Concept Exists
In streaming pipelines, data arrives continuously forever. If you want to compute statistics like "total purchases per hour," you need to segment the infinite timeline into distinct buckets. Fixed Windows are the simplest and most common way to group streaming events into clean, predictable time boxes.
3. Key Terminology
- Window Size: The duration of each time interval (e.g. 5 minutes).
- Tumbling / Non-overlapping: An event belongs to exactly one window. There is no overlap between adjacent windows.
- Window Boundary: The start and end time of a window (e.g.,
12:00:00to12:04:59).
4. How It Works
- Define size: Set the window duration in seconds.
- Assign: As events arrive, Beam inspects their Event timestamp.
- Map: Beam assigns the element to the corresponding interval block.
- Aggregate: Subsequent aggregations (like sum or average) are computed independently inside each time block.
5. Visual Diagram
Each incoming event is assigned to a single, static time slice:
6. Code Example
Calculating transactions totals over 10-minute fixed intervals:
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
with beam.Pipeline() as p:
(p
| "ReadTransactions" >> beam.io.ReadFromPubSub(subscription="projects/my-gcp/subs/transactions")
| "ParseAmounts" >> beam.Map(lambda msg: float(msg.attributes.get("amount", 0)))
| "ApplyFixedWindows" >> beam.WindowInto(FixedWindows(10 * 60)) # 10 minutes
| "SumTransactions" >> beam.CombineGlobally(sum).without_defaults()
| "LogTotals" >> beam.Map(print))
7. Code Explanation
beam.WindowInto(FixedWindows(10 * 60))divides the stream into 600-second (10-minute) windows.CombineGlobally(sum)aggregates the parsed values per window.without_defaults()prevents empty window firings if no elements arrive during the interval.
8. Real Production Example
In network traffic monitoring, a pipeline windows cellular event feeds into 1-minute fixed windows to compute the rate of failed call attempts per tower, alerting operators if failures spike in any single minute.
9. Common Mistakes
- Forgetting to set timestamp attributes: If your data source does not inject event timestamps, Beam defaults to the ingestion time (processing time), which makes it impossible to run accurate backfills.
- Assuming ordering across windows: Output results from windows are processed as separate independent collections and can arrive downstream out of sequence.
10. Interview Perspective
- Question: What happens to an element whose timestamp falls exactly on a window boundary (e.g. 12:05:00)?
- Answer: In Apache Beam, window intervals are closed on the start boundary and open on the end boundary (
[start, end)). Therefore, an event at12:05:00belongs to the12:05:00 - 12:10:00window. - Question: How do fixed windows differ from sliding windows?
- Answer: Fixed windows are non-overlapping. Sliding windows overlap (e.g. 1-hour windows recalculating every 5 minutes), meaning a single event can belong to multiple windows simultaneously.
11. Best Practices
- Use Fixed Windows for standard dashboard metrics (e.g. active users per day, server hits per hour).
- Always configure an
Allowed Latenesspolicy if your network experiences latencies to avoid discarding delayed events.
12. Summary
- Fixed windows segment timelines into equal, non-overlapping intervals.
- Assigned based on Event timestamps.
- Outputs are calculated independently per time slice.