Last Minute RevisionEvergreen

Cheatsheet: Watermarks

Revision time: 4 mins

Topic Overview

Track progress and event-time completeness in streaming pipelines.

Syntax Snapshot

python
# Set custom event timestamps using timestamp assignment
timestamped_events = events | "AddEventTime" >> beam.Map(
    lambda x: beam.window.TimestampedValue(x, x.event_timestamp)
)

Key Points

  • A Watermark is the pipeline's estimate of event-time completeness.
  • A watermark at time T implies no more elements with event-time t < T are expected.
  • Heuristic (predictive lag) vs Punctuated (metadata-based marker).
  • Controls when window boundaries close and when late data is generated.

Production Recommendations

Developer Checklist
Monitor watermark delay metrics in the runner dashboard. A stalled watermark prevents windows from emitting on-time outputs.
Advertisement
AdSense Slot #556677Leaderboard Banner (728x90)