Session Windows
1. Introduction
A Session Window is a windowing strategy that groups elements based on periods of activity. It is defined by an Inactivity Gap (timeout duration). A new session window starts when an event arrives, and it closes if no new events are received within the inactivity gap.
2. Why This Concept Exists
Fixed and sliding windows are bound to static calendar intervals (e.g. every hour). However, human behavior is irregular. For example, a user visits an e-commerce store, browses items for 15 minutes, and leaves. You want to analyze their behavior as a single "session." Session windows allow you to capture these irregular bursts of activity, grouping events by inactivity breaks rather than clock boundaries.
3. Key Terminology
- Inactivity Gap: The maximum duration of silence allowed before a session is declared closed (e.g., 30 minutes).
- Dynamic / Key-scoped: Unlike fixed windows, session windows are unique to each key. User-A and User-B can have completely different session start and end times.
- Window Merging: The process where adjacent session windows are combined together if a new event arrives that bridges the gap between them.
4. How It Works
- Initial assignment: When an event arrives, Beam assigns it to a temporary session window starting at the event's timestamp and ending at
timestamp + inactivity_gap. - Evaluate overlaps: Beam checks if this new temporary window overlaps with any existing session windows for that specific key.
- Merge: If they overlap, Beam merges the windows into a single, larger session window.
- Repeat: The session window expands continuously as long as elements arrive before the inactivity gap expires.
5. Visual Diagram
6. Code Example
Grouping web clicks into session windows with a 30-minute inactivity gap:
import apache_beam as beam
from apache_beam.transforms.window import Sessions
with beam.Pipeline() as p:
(p
| "ReadClicks" >> beam.io.ReadFromPubSub(subscription="projects/my-gcp/subs/clicks")
| "FormatKV" >> beam.Map(lambda x: (x["user_id"], x["page_url"]))
| "ApplySessions" >> beam.WindowInto(Sessions(gap_size=30 * 60)) # 30-minute gap
| "GroupSessionPages" >> beam.GroupByKey()
| "LogSessions" >> beam.Map(print))
7. Code Explanation
Sessions(gap_size=30 * 60)defines a 30-minute inactivity gap (1800 seconds).GroupByKey()groups the urls per user within each session.- If User-1 clicks at
12:00and12:15, they are merged into one session12:00 - 12:45. - If User-1 clicks again at
13:00, it starts a separate second session.
8. Real Production Example
In online gaming analytics, session windows are used to track player matches. You set a session gap of 15 minutes. When a player logs in and plays 5 matches in a row with less than 15 minutes between matches, all game statistics are grouped as a single gaming session for metrics aggregation.
9. Common Mistakes
- Running without keys: Session windows require key-value tuples to work. Merging occurs independently per key. Running sessions on flat collections will throw an execution exception.
- Setting an excessively large gap size: If you set the gap size too large (e.g. 24 hours), sessions will rarely merge or close, causing workers to hold window state in memory indefinitely.
10. Interview Perspective
- Question: Explain how session windows are merged?
- Answer: Each element initially gets a window
[Timestamp, Timestamp + Gap]. If elements for a key arrive closer than the gap duration, their temporary windows overlap. The runner merges these overlapping windows into a single window spanning[First_Timestamp, Last_Timestamp + Gap]. - Question: Why are session windows considered "merging windows"?
- Answer: Because unlike fixed windows which have static boundaries, session window boundaries are dynamic and change (merge) as new data points arrive.
11. Best Practices
- Clean up session resources using a reasonable gap size.
- Set an allowed lateness policy to handle out-of-order logs that might merge past session intervals.
12. Summary
- Session windows are based on periods of activity separated by inactivity gaps.
- Dynamic and scoped per key.
- Merge adjacent windows automatically as elements arrive.