Sliding Windows
1. Introduction
A Sliding Window (sometimes called a hopping window) is a windowing strategy that partitions a data stream into overlapping, equal-sized time intervals. It is defined by two properties: the Window Size (duration) and the Slide Period (recalculation frequency).
2. Why This Concept Exists
Fixed Windows segment time into strict, non-overlapping blocks. However, many business metrics require a continuous moving average—such as calculating the "average website response time over the last 10 minutes, updated every 1 minute." With Fixed Windows, you only get updates once every 10 minutes. Sliding Windows allow you to overlay multiple time blocks, giving you real-time moving metrics.
3. Key Terminology
- Window Size: The duration representing how much history each window contains (e.g., 10 minutes).
- Slide Period: The frequency of how often a new window starts (e.g., 1 minute).
- Overlapping: Because the slide period is smaller than the window size, a single data point belongs to multiple windows simultaneously.
4. How It Works
- Beam generates a new window at every interval defined by the Slide Period.
- When an event arrives, Beam inspects its timestamp.
- Because windows overlap, the event is assigned to all active windows that cover its timestamp.
- For example, if size is 10 minutes and slide is 2 minutes, a single event will be replicated and processed inside 5 separate overlapping windows.
5. Visual Diagram
6. Code Example
Calculating a moving average of values over 10-minute windows, recalculating every 1 minute:
import apache_beam as beam
from apache_beam.transforms.window import SlidingWindows
with beam.Pipeline() as p:
(p
| "ReadStream" >> beam.io.ReadFromPubSub(subscription="projects/my-gcp/subs/metrics")
| "ParseFloats" >> beam.Map(lambda x: float(x))
| "ApplySlidingWindows" >> beam.WindowInto(SlidingWindows(size=10 * 60, period=60)) # 10m size, 1m slide
| "MovingAverage" >> beam.CombineGlobally(beam.combiners.MeanCombineFn()).without_defaults()
| "Log" >> beam.Map(print))
7. Code Explanation
SlidingWindows(size=10 * 60, period=60)configures the window boundaries (600s size, 60s period).CombineGlobally(MeanCombineFn())computes the mean independently for each window.- Outputs are produced every 1 minute, representing the moving average of the last 10 minutes of logs.
8. Real Production Example
In stock trading platforms, you monitor price volatility. You configure a sliding window of 1 hour, sliding every 10 seconds, to compute a rolling standard deviation of stock prices, alerting traders immediately if price variance spikes.
9. Common Mistakes
- Creating too many overlapping windows: Setting a very large size and tiny slide (e.g. 24-hour windows sliding every 1 second) forces Beam to replicate every element
86,400times. This will crash worker memory and result in massive computation delays. - Expecting updates before the slide period: The pipeline will only output results when a slide boundary passes.
10. Interview Perspective
- Question: If window size is 30 minutes and slide period is 5 minutes, how many windows will a single event belong to?
- Answer: It will belong to exactly
size / period = 30 / 5 = 6windows. - Question: How does memory consumption compare between Fixed and Sliding windows?
- Answer: Sliding windows require significantly more memory because elements are duplicated across multiple active window state buffers on the workers.
11. Best Practices
- Ensure the Slide Period is an exact divisor of the Window Size (e.g. size 60s, slide 10s) to keep window bounds clean.
- Avoid small slide periods (like 1 second) on high-throughput streams.
12. Summary
- Sliding windows group elements into overlapping time blocks.
- Configured by Window Size and Slide Period.
- Multiplies elements across overlapping intervals to compute rolling statistics.