Advanced developers studying user activity tracking, streaming actions, and fixed window counts.
Count real-time webpage pageviews using streaming windows.
Save the following raw rows locally as \`dataset.csv\` to test your pipeline:
timestamp,user_id,page_url,action
1719830400,u1,/home,view
1719830405,u2,/products,view
1719830412,u1,/checkout,click
1719830418,u3,/home,view
1719830430,u2,/cart,clickCreate a local file named \`starter.py\` and copy the following skeleton. Complete the missing transformations:
# starter.py - Website Clickstream
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
def run_pipeline():
options = PipelineOptions()
with beam.Pipeline(options=options) as p:
# TODO: Filter pageviews
# TODO: Apply 1-minute fixed windowing
# TODO: Count occurrences per URL
pass
if __name__ == "__main__":
run_pipeline()