Foundation Lab | Easy

Lab: Student Records

Estimated time: 15 mins

Who This Lab Is For

Beginner data engineers who want to practice mapping records to key-value pairs and aggregating metrics per key.

What You Will Learn

How to map raw CSV records to key-value tuples.
How to aggregate elements per key using Apache Beam's CombinePerKey transform.
How to format custom structured text output.
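To make the idea of per-key aggregation concrete, here is a minimal plain-Python sketch of the accumulator pattern that a mean-per-key combine relies on. The class name `MeanAccumulatorSketch` is hypothetical and the code does not import Beam; the four method names simply mirror those of Beam's `CombineFn` interface, and the scores are the Math rows from the lab's sample data.

```python
# Plain-Python illustration of the accumulator pattern behind a per-key mean.
# MeanAccumulatorSketch is a hypothetical name; the method names mirror the
# ones Apache Beam's CombineFn interface expects.

class MeanAccumulatorSketch:
    def create_accumulator(self):
        # Running (sum, count) pair for one key.
        return (0.0, 0)

    def add_input(self, accumulator, score):
        # Fold one more score into the running pair.
        total, count = accumulator
        return (total + score, count + 1)

    def merge_accumulators(self, accumulators):
        # Partial results from different workers are added element-wise.
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, accumulator):
        # Turn the final (sum, count) pair into the mean.
        total, count = accumulator
        return total / count if count else float("nan")


fn = MeanAccumulatorSketch()
# Simulate two workers that each saw part of the Math scores.
part_a = fn.add_input(fn.add_input(fn.create_accumulator(), 95), 88)
part_b = fn.add_input(fn.create_accumulator(), 75)
math_mean = fn.extract_output(fn.merge_accumulators([part_a, part_b]))
print(math_mean)  # 86.0
```

Keeping a (sum, count) pair rather than a running average is what lets partial results be merged in any order, which is the property a distributed combine depends on.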

1. Business Scenario

Parse student grades and compute the average score per subject.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

```text
student_id,subject,score
101,Math,95
102,Math,88
101,Science,90
103,Math,75
102,Science,92
```
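As a reference point for checking your pipeline's output, the per-subject averages for these rows can be computed without Beam. The sketch below uses only the standard library; the function name `average_by_subject` is hypothetical and is not part of the lab's starter code.

```python
import csv
import io
from collections import defaultdict

# The sample rows from this lab, inlined so the check runs without a file.
SAMPLE = """student_id,subject,score
101,Math,95
102,Math,88
101,Science,90
103,Math,75
102,Science,92
"""


def average_by_subject(csv_text):
    """Return {subject: mean score} for CSV text that includes a header row."""
    totals = defaultdict(lambda: [0.0, 0])  # subject -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["subject"]]
        entry[0] += float(row["score"])
        entry[1] += 1
    return {subject: total / count for subject, (total, count) in totals.items()}


expected = average_by_subject(SAMPLE)
print(expected)  # {'Math': 86.0, 'Science': 91.0}
```

Math averages (95 + 88 + 75) / 3 = 86.0 and Science averages (90 + 92) / 2 = 91.0, so a finished pipeline should report those two values.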

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - Student Records
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read student dataset
        # TODO: Map scores to (subject, score) tuples
        # TODO: Calculate average score per subject
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

Parse student grade records correctly from CSV format.
Map student scores to subject key-value pairs (subject, score).
Compute the average score for each subject using a Combine transform.

5. Step-by-Step Guide & Solution

Solution for Student Records

