Advanced Lab · Hard

Lab: Session Analytics

Estimated time: 60 mins

Who This Lab Is For

Advanced developers who want to model gaps in user activity and understand how session windows merge.

What You Will Learn

  • How to apply gap-based session windows in a streaming pipeline.
  • How session windows for the same key merge as new events arrive.
  • How to count events per session and inspect the window each result belongs to.
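
To make the merging behavior concrete, here is a minimal pure-Python sketch of gap-based session merging that does not depend on any pipeline framework. The helper name `merge_sessions` and the `(start, end, count)` tuple layout are assumptions made for illustration, not part of any library API. Each event opens a proto-window `[ts, ts + gap)`, and proto-windows that overlap are merged into a single session.

```python
def merge_sessions(timestamps, gap):
    """Merge event timestamps into (start, end, count) sessions.

    Hypothetical helper for illustration only. Each event opens a window
    [ts, ts + gap); windows that overlap are merged into one session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # The event falls inside the open session: extend it.
            start, end, count = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap), count + 1)
        else:
            # The gap was exceeded (or this is the first event): new session.
            sessions.append((ts, ts + gap, 1))
    return sessions


if __name__ == "__main__":
    # Three events 10 s apart, then one after a long pause (gap = 60 s).
    for session in merge_sessions([0, 10, 20, 200], gap=60):
        print(session)
```

With a 60-second gap, the first three events share one session that ends at 80, and the event at 200 starts a new session.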

1. Business Scenario

Group website click logs into per-user session windows and count the events in each session as a measure of engagement.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline. The `timestamp` column holds Unix epoch seconds:

```text
timestamp,user_id,action
1719830400,userA,login
1719830420,userA,view_item
1719830450,userA,logout
1719830700,userB,login
1719830900,userA,login
```
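
Before building the pipeline, it can help to check the inactivity gaps in the sample data by hand. The sketch below uses only the standard library. The helper name `gaps_by_user` and the inline `DATASET` string are assumptions for illustration, and the CSV is embedded so the snippet runs without the file on disk.

```python
import csv
import io
from collections import defaultdict

# Same rows as dataset.csv, embedded so the example is self-contained.
DATASET = """timestamp,user_id,action
1719830400,userA,login
1719830420,userA,view_item
1719830450,userA,logout
1719830700,userB,login
1719830900,userA,login
"""


def gaps_by_user(csv_text):
    """Return {user_id: [seconds between consecutive events]}."""
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        times[row["user_id"]].append(int(row["timestamp"]))
    gaps = {}
    for user, stamps in times.items():
        stamps.sort()
        gaps[user] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps


if __name__ == "__main__":
    print(gaps_by_user(DATASET))
```

For `userA` the gaps are 20, 30, and 450 seconds. With the 300-second inactivity gap used in this lab, the final `login` therefore belongs to a second session, while `userB` has a single event and a single session.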

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - Session Analytics
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Setup session windows with 300s inactivity gap
        # TODO: Count activities per session
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

  • Key each activity by its `user_id` so that sessions are computed per user.
  • Apply session windows with a 5-minute (300-second) inactivity gap.
  • Count the user interactions inside each session window.

5. Step-by-Step Guide & Solution

Solution for Session Analytics

