Triggers
1. Introduction
A Trigger determines exactly when a window's aggregated results are emitted (fired) during stream processing. While windowing groups elements by time, triggers decide when to output the results of those groupings.
2. Why This Concept Exists
By default, Apache Beam waits until the watermark passes the end of a window before emitting the results. However, in real-world scenarios, you might need:
- Early results: Seeing a hourly total update every 5 minutes (before the hour finishes).
- Late updates: Re-calculating the window's total if delayed data arrives. Triggers allow you to configure exactly when these outputs (panes) fire.
3. Key Terminology
- Pane: An individual output emission from a window. A single window can fire multiple panes (e.g., early, on-time, and late panes).
- Accumulating Mode: Emitted panes contain all data processed in the window so far.
- Retracting Mode: Emitted panes contain only the difference (delta) from the previous emission, or output a retraction of the old value.
4. Trigger Categories
- Event Time Triggers: Fire based on watermark progress (e.g.
AfterWatermark). - Processing Time Triggers: Fire based on system wall-clock time (e.g.,
AfterProcessingTime). - Element Count Triggers: Fire after a specific number of records arrive (e.g.
AfterCount). - Composite Triggers: Combine multiple trigger strategies using logic gates (e.g.
Repeatedly,AfterFirst,AfterAll).
5. Visual Diagram
Pane 1 (Early)
Triggered by count
Pane 2 (Early)
Triggered by count
Pane 3 (On-Time)
Watermark passes end
Pane 4 (Late)
Triggered by late event
6. Code Example
Configuring a trigger that fires every 10 items or when the watermark passes:
import apache_beam as beam
from apache_beam.transforms.trigger import AfterWatermark, AfterCount, Repeatedly
from apache_beam.transforms.window import FixedWindows
with beam.Pipeline() as p:
(p
| "Read" >> beam.io.ReadFromPubSub(subscription="projects/my-project/subs/my-sub")
| "Window" >> beam.WindowInto(
FixedWindows(60 * 60), # 1 hour
trigger=Repeatedly(AfterFirst(AfterWatermark(), AfterCount(10))),
accumulation_mode=beam.transforms.trigger.AccumulationMode.ACCUMULATING
))
7. Code Explanation
FixedWindows(60 * 60)groups elements into hourly intervals.trigger=Repeatedly(...)allows the window to fire multiple times.AfterFirst(...)fires a pane as soon as either sub-trigger is met:AfterWatermark(): The watermark passes the hour.AfterCount(10): 10 new elements have arrived.
AccumulationMode.ACCUMULATINGtells Beam to include previous values in subsequent updates.
8. Real Production Example
In live web traffic dashboards, you use processing-time triggers to refresh chart stats every 10 seconds:
trigger=Repeatedly(AfterProcessingTime(10))
This ensures the dashboard graphs look alive and update continuously, rather than updating once per hour.
9. Common Mistakes
- Forgetting to set accumulation mode: If you define a custom trigger, you must explicitly specify
accumulation_mode. Omitting it will throw a configuration exception. - Infinite loops with Count Triggers: If you set
AfterCount(1)without wrapping it inRepeatedly, the window will fire once on the first element and never fire again.
10. Interview Perspective
- Question: What is the difference between Accumulating and Retracting mode?
- Answer: In Accumulating mode, subsequent window firings output the total running sum. In Retracting mode, subsequent firings output only the difference (delta) or issue a retraction of the previous value to let downstream databases adjust their counts.
- Question: When does a window state get deleted?
- Answer: The window state is held until the watermark passes the end of the window plus the
allowed_latenessduration. After this limit, the state is cleared.
11. Best Practices
- Use
Accumulatingmode when writing to databases that support overwrites/upserts. - Avoid using low element count triggers (like
AfterCount(1)) on massive streams, as this creates excessive network noise and disk load.
12. Summary
- Triggers control when window results are emitted.
- Supports Event-Time, Processing-Time, and Element-Count conditions.
- Must be configured with either Accumulating or Retracting pane modes.