Streaming Lab (Hard)

Lab: Website Clickstream

Estimated time: 45 mins

Who This Lab Is For

Advanced developers who want to practice user-activity tracking, filtering of streamed actions, and fixed-window counts.

What You Will Learn

  • How to assign event timestamps to web click events.
  • How to filter streamed actions before windowing.
  • How to group user interactions per page URL in 60-second fixed windows.

1. Business Scenario

Count pageviews per page URL in real time using fixed streaming windows.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

```text
timestamp,user_id,page_url,action
1719830400,u1,/home,view
1719830405,u2,/products,view
1719830412,u1,/checkout,click
1719830418,u3,/home,view
1719830430,u2,/cart,click
```
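Before building the pipeline, it can help to inspect the rows in plain Python. The sketch below is not part of the lab's starter code: `load_events` is a hypothetical helper, the rows are inlined so it runs without the file, and it assumes the `timestamp` column holds Unix epoch seconds in UTC.

```python
import csv
import io
from datetime import datetime, timezone

# The sample rows from dataset.csv, inlined so the sketch runs without the file.
RAW = """timestamp,user_id,page_url,action
1719830400,u1,/home,view
1719830405,u2,/products,view
1719830412,u1,/checkout,click
1719830418,u3,/home,view
1719830430,u2,/cart,click
"""


def load_events(text):
    """Parse CSV text into dicts, converting `timestamp` to int epoch seconds."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        row["timestamp"] = int(row["timestamp"])
        events.append(row)
    return events


events = load_events(RAW)
for e in events:
    # Assumption: the timestamps are Unix epoch seconds in UTC.
    when = datetime.fromtimestamp(e["timestamp"], tz=timezone.utc)
    print(when.isoformat(), e["user_id"], e["page_url"], e["action"])

# Only rows whose action is 'view' count as pageviews.
views = [e for e in events if e["action"] == "view"]
print(len(views), "of", len(events), "rows are pageviews")
```

Read this way, the five rows span 30 seconds starting at 10:40:00 UTC on 2024-07-01, and three of them are pageviews.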

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - Website Clickstream
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read dataset.csv and assign event timestamps
        # TODO: Filter pageviews (action == 'view')
        # TODO: Apply 1-minute fixed windowing
        # TODO: Count occurrences per URL
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

  • Assign event timestamps to click logs.
  • Filter logs to process only pageview actions (action == 'view').
  • Apply 60-second Fixed Windows and count pageviews per URL.
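The fixed-window requirement comes down to simple arithmetic on the event timestamp. The plain-Python sketch below is not Beam code, and `fixed_window_bounds` is a hypothetical helper. It assumes epoch-second timestamps and windows aligned to multiples of 60 seconds, which is how fixed windows with a zero offset are normally laid out.

```python
WINDOW_SIZE_SECONDS = 60


def fixed_window_bounds(timestamp, size=WINDOW_SIZE_SECONDS):
    """Return the half-open [start, end) fixed window containing `timestamp`.

    Assumes epoch seconds and windows aligned to multiples of `size`.
    """
    start = timestamp - (timestamp % size)
    return start, start + size


# Timestamps of the five sample rows in dataset.csv.
SAMPLE_TIMESTAMPS = [1719830400, 1719830405, 1719830412, 1719830418, 1719830430]

# Collect the distinct windows the sample rows fall into.
windows = {fixed_window_bounds(ts) for ts in SAMPLE_TIMESTAMPS}
print(windows)

# A timestamp exactly on a boundary belongs to the window it starts.
print(fixed_window_bounds(1719830460))
```

All five sample rows land in the single window starting at 1719830400, so once only `view` rows are kept, a correct pipeline should report 2 for `/home` and 1 for `/products`.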

5. Step-by-Step Guide & Solution

Solution for Website Clickstream

