advanced

Custom Windows

8 min read | Last updated: 2026-07-01

1. Introduction

While standard windowing strategies—such as Fixed, Sliding, and Session windows—cover the majority of stream processing use cases, some applications require more specialized partitioning logic.

A Custom Window is defined by extending Apache Beam's WindowFn class. This allows developers to control exactly how elements are assigned to windows and how windows are merged, opening the door to dynamic, data-driven windowing behaviors.

2. Why This Concept Exists

Standard windowing strategies assume static, uniform temporal boundaries. However, real-world business rules are often non-uniform and complex.

Custom windows are necessary for scenarios such as:

  • Dynamic Sizes: Window durations that change based on payload attributes (e.g., high-priority user events get shorter windows, while low-priority events get longer ones).
  • Non-Standard Time Intervals: Grouping data based on calendar concepts that do not map to simple multiples of an hour or a day (e.g., custom factory shifts or fiscal quarters).
  • Custom Session Rules: Sessions that merge not just based on inactivity gaps, but on transaction boundaries (e.g., grouping events from checkout start to purchase completion).

3. Key Terminology

  • WindowFn: The base class in Apache Beam for all windowing strategies.
  • IntervalWindow: The standard representation of a time interval, defined by a closed start time and open end time [start, end).
  • Window Merging: The process of combining multiple windows into a single window.
  • Window Coder: A serializer that encodes window objects so they can be sent between workers in a distributed cluster.
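The half-open convention matters at boundaries: a timestamp equal to a window's end belongs to the next window, not the current one. A minimal plain-Python sketch (the helper name `contains` is hypothetical, not part of Beam) illustrates this:

```python
def contains(start, end, timestamp):
    """Return True if timestamp falls in the half-open interval [start, end)."""
    return start <= timestamp < end


# A 1-minute window covering seconds 0 to 60
print(contains(0, 60, 0))     # the start is included
print(contains(0, 60, 59.9))  # just before the end is included
print(contains(0, 60, 60))    # the end itself is excluded
```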

4. How It Works

To create a custom windowing strategy in Apache Beam:

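  1. Inherit: Subclass the WindowFn base class. In the Python SDK, merge is an abstract method of WindowFn, so strategies whose windows never merge should subclass NonMergingWindowFn, which supplies a no-op merge.
  2. Assign: Override the assign(self, assign_context) method. This method receives an AssignContext (which provides access to the element and its timestamp) and returns an iterable of the windows that the element belongs to.
  3. Merge (merging strategies only): For merging window types (like sessions), implement merge(self, merge_context) to inspect the current windows in merge_context.windows and call merge_context.merge(to_be_merged, merge_result) for each group of overlapping windows.
  4. Coder: Override get_window_coder(self) to provide a coder for window serialization.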
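The merge step is the least intuitive of the four. As a rough sketch of the logic a session-style merge typically performs, the plain-Python function below groups overlapping (start, end) intervals and computes the span each group would collapse into. The function name `plan_merges` and the tuple representation are assumptions for illustration; in a real WindowFn the intervals would be IntervalWindow objects read from the merge context, and each group would be handed to the context's merge call.

```python
def plan_merges(windows):
    """Group overlapping (start, end) intervals.

    Returns a list of (group, merged) pairs, one per group that contains
    more than one interval. Intervals that overlap nothing are left alone.
    """
    plans = []
    group = []
    group_end = None
    for start, end in sorted(windows):
        if group and start < group_end:
            # Overlaps the current group: extend it
            group.append((start, end))
            group_end = max(group_end, end)
        else:
            # Gap found: close the current group and start a new one
            if len(group) > 1:
                plans.append((group, (group[0][0], group_end)))
            group = [(start, end)]
            group_end = end
    if len(group) > 1:
        plans.append((group, (group[0][0], group_end)))
    return plans


# Three session-like windows: the first two overlap, the third stands alone
for group, merged in plan_merges([(0, 30), (20, 50), (100, 130)]):
    print(group, "->", merged)
```

Because intervals are half-open, two windows that merely touch (one ends exactly where the next starts) are not treated as overlapping in this sketch.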

5. Visual Diagram

Event (Priority: High) -> 1-min Window (12:00 - 12:01)
Event (Priority: Low) -> 10-min Window (12:00 - 12:10)

6. Code Example

The following code implements a custom PriorityBasedWindowFn that assigns elements to 1-minute windows if they are flagged as "high" priority, or 10-minute windows otherwise:

python
import apache_beam as beam
from apache_beam import coders
from apache_beam.transforms.window import IntervalWindow, NonMergingWindowFn

# NonMergingWindowFn is a WindowFn subclass with a no-op merge. Subclassing
# WindowFn directly would also require implementing the abstract merge method.
class PriorityBasedWindowFn(NonMergingWindowFn):
    def assign(self, assign_context):
        # Access the element payload and its event timestamp
        element = assign_context.element
        # Convert the Beam Timestamp to float seconds since the epoch
        timestamp = float(assign_context.timestamp)

        # Determine window size (in seconds) based on payload property
        if isinstance(element, dict) and element.get("priority") == "high":
            size = 60  # 1 minute
        else:
            size = 600  # 10 minutes

        # Round the timestamp down to the nearest multiple of size
        start = timestamp - (timestamp % size)

        # Return a list of windows (must be iterable)
        return [IntervalWindow(start, start + size)]

    def get_window_coder(self):
        # Use the standard coder for IntervalWindow instances
        return coders.IntervalWindowCoder()

# Usage in pipeline:
# windowed = stream | beam.WindowInto(PriorityBasedWindowFn())

7. Code Explanation

  • PriorityBasedWindowFn is a non-merging WindowFn: it only assigns windows and never combines them.
  • assign_context.element is the element payload. The isinstance check guards against non-dictionary elements, which fall through to the 10-minute default.
  • assign_context.timestamp is the element's event timestamp; float(...) converts it to seconds since the epoch.
  • start = timestamp - (timestamp % size) rounds the timestamp down to the nearest multiple of size, so every window starts on a fixed grid.
  • return [IntervalWindow(start, start + size)] returns a list containing a single IntervalWindow covering the range.
  • get_window_coder returns the standard interval window coder, which is sufficient because assign only produces IntervalWindow instances.
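To see the alignment arithmetic in isolation, here is a small plain-Python sketch (the helper name `aligned_bounds` is hypothetical). It applies the same formula to one timestamp with both window sizes:

```python
def aligned_bounds(timestamp, size):
    """Return the (start, end) of the size-second grid cell containing timestamp."""
    start = timestamp - (timestamp % size)
    return start, start + size


# 725 seconds after the epoch is 00:12:05 UTC
print(aligned_bounds(725, 60))   # (720, 780): the minute from 00:12 to 00:13
print(aligned_bounds(725, 600))  # (600, 1200): the block from 00:10 to 00:20
print(aligned_bounds(720, 60))   # (720, 780): an exact boundary starts its own window
```

Python's % operator returns a non-negative result for a positive size, so timestamps before the epoch are also rounded down rather than toward zero.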

8. Real Production Example

In industrial manufacturing, plants run three daily shifts that do not align with standard clock divisions: Morning (06:00 - 14:00), Afternoon (14:00 - 22:00), and Night (22:00 - 06:00). To track machinery status per shift, a custom windowing strategy calculates shift-specific intervals based on the timestamp of the telemetry reading. This allows aggregations to run precisely within shift boundaries, letting operators analyze team efficiency.
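Because these three shifts are all 8 hours long, their boundaries can be computed as an 8-hour grid shifted by 6 hours. The sketch below assumes timestamps are seconds since the epoch and that shift times are expressed in UTC with no daylight-saving adjustments; the names `shift_bounds` and `shift_name` are hypothetical. Inside a WindowFn, the returned pair would become the bounds of an IntervalWindow.

```python
from datetime import datetime, timezone

SHIFT_SECONDS = 8 * 3600  # every shift lasts 8 hours
SHIFT_OFFSET = 6 * 3600   # the first shift of the day starts at 06:00


def shift_bounds(timestamp):
    """Return (start, end) in epoch seconds of the shift containing timestamp."""
    start = timestamp - ((timestamp - SHIFT_OFFSET) % SHIFT_SECONDS)
    return start, start + SHIFT_SECONDS


def shift_name(timestamp):
    """Label the shift by the UTC hour at which it starts."""
    start, _ = shift_bounds(timestamp)
    hour = datetime.fromtimestamp(start, tz=timezone.utc).hour
    return {6: "Morning", 14: "Afternoon", 22: "Night"}[hour]


# A reading taken at 03:00 UTC belongs to the night shift that began
# at 22:00 on the previous day
reading = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc).timestamp()
start, end = shift_bounds(reading)
print(shift_name(reading))
print(datetime.fromtimestamp(start, tz=timezone.utc), "to",
      datetime.fromtimestamp(end, tz=timezone.utc))
```

Shifts of equal length like these could likely also be expressed with Beam's built-in FixedWindows and its offset parameter. A custom WindowFn becomes necessary when shifts have unequal lengths or must follow local time.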

9. Common Mistakes

  • Returning a Single Window Directly: Returning IntervalWindow(...) instead of wrapping it in a list [IntervalWindow(...)]. The assign method must return an iterable collection of windows.
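  • Non-Deterministic Assignment: Writing assignment logic that relies on the worker's local system time or on random state. Window assignment must be a pure function of the element and its timestamp so that retries and different workers produce the same windows.
  • Forgetting the Window Coder: Omitting get_window_coder. In the Python SDK this method is abstract, so the class cannot be instantiated without it. A subtler variant is returning a coder that does not match the window type, which can cause serialization errors when elements are shuffled between worker nodes in a distributed environment.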

10. Interview Perspective

  • Question: Why does the assign method return a list of windows instead of a single window?
  • Answer: In Apache Beam, a single element can be assigned to multiple windows simultaneously (e.g., in a sliding window strategy, an element belongs to multiple overlapping slices).
  • Question: What is the purpose of the merge method in a custom WindowFn?
  • Answer: The merge method allows the runner to group and combine overlapping active windows into a single window. It is used to implement session windows, where new elements bridge the gaps between existing windows.
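The first answer can be made concrete with a plain-Python sketch of sliding-window assignment. The helper name `sliding_window_bounds` is hypothetical and the code only mirrors the general idea, not Beam's internal implementation; it assumes integer timestamps, sizes, and periods in seconds.

```python
def sliding_window_bounds(timestamp, size, period):
    """Return every (start, end) window of length size, spaced period
    seconds apart, that contains timestamp."""
    # Latest window start at or before the timestamp
    last_start = timestamp - (timestamp % period)
    # Walk backwards while the window still reaches the timestamp
    return [
        (start, start + size)
        for start in range(last_start, timestamp - size, -period)
    ]


# 60-second windows that start every 30 seconds: each element lands in two windows
print(sliding_window_bounds(70, size=60, period=30))  # [(60, 120), (30, 90)]
```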

11. Best Practices

  • Ensure the custom assign logic is highly optimized; it executes for every record passing through the transform and can easily bottleneck a high-throughput pipeline.
  • Whenever possible, reuse the standard IntervalWindow to avoid implementing a custom window class and coder.
  • Test your custom window functions with unit tests using mock timestamps to verify boundary conditions.
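As a sketch of the kind of boundary tests meant here, the example below factors the window-size and alignment logic into a plain function and checks it with the standard unittest module. The function `priority_window_bounds` is a hypothetical refactoring of the assign logic from the code example, not an existing Beam API; keeping it free of Beam types is one way to test the logic without running a pipeline.

```python
import unittest


def priority_window_bounds(element, timestamp):
    """Return (start, end) in seconds for the window an element falls into."""
    is_high = isinstance(element, dict) and element.get("priority") == "high"
    size = 60 if is_high else 600
    start = timestamp - (timestamp % size)
    return start, start + size


class PriorityWindowBoundsTest(unittest.TestCase):
    def test_high_priority_uses_one_minute_window(self):
        self.assertEqual(
            priority_window_bounds({"priority": "high"}, 125), (120, 180))

    def test_other_elements_use_ten_minute_window(self):
        self.assertEqual(
            priority_window_bounds({"priority": "low"}, 125), (0, 600))
        # Non-dictionary payloads fall back to the default size
        self.assertEqual(priority_window_bounds("raw", 125), (0, 600))

    def test_timestamp_on_boundary_starts_new_window(self):
        self.assertEqual(
            priority_window_bounds({"priority": "high"}, 180), (180, 240))

    def test_timestamp_just_before_boundary_stays_in_window(self):
        self.assertEqual(
            priority_window_bounds({"priority": "high"}, 179.5), (120, 180))
```

Saved as a module, these tests can be run with python -m unittest. The custom WindowFn's assign method would then call the helper and wrap the result in an IntervalWindow.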

12. Summary

  • Custom windows allow arbitrary partitioning rules beyond standard types.
  • They are created by extending WindowFn and overriding assign.
  • assign receives element and time metadata, returning a list of windows.
  • get_window_coder is required to handle network serialization.
  • Useful for dynamic window sizing and shift-aligned window intervals.

13. Interactive Challenges

Challenge 1: Basic Shift-Based Windows (Beginner)

Complete the assign method of a custom WindowFn that assigns all elements to 12-hour windows aligned to noon (12:00) and midnight (00:00) UTC.

Challenge 2: Payload-Dependent Window Sizer (Intermediate)

Implement a custom WindowFn called DynamicSizeWindowFn that inspects assign_context.element for a key "window_duration". Use that duration (in seconds) to size the window. If the key is missing, default to 60 seconds.

Challenge 3: Duplicate Element Assigner (Advanced)

Create a custom WindowFn that assigns every element to two separate windows: a 1-minute window starting at the element's aligned timestamp, and a second 1-minute window starting exactly 1 minute later (creating an artificial delay window).
