Foundation Lab · Easy

Lab: Student Records

Estimated time: 15 mins

Who This Lab Is For

Beginner data engineers who want to practice mapping records to key-value pairs and aggregating a metric per key.

What You Will Learn

  • How to map raw CSV records into key-value tuples.
  • How to aggregate elements per key using CombinePerKey.
  • How to format custom structured text output.
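
To make the second point concrete: a per-key average is commonly computed with a (sum, count) accumulator. The sketch below mirrors the four methods of Beam's `CombineFn` interface (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) in plain Python so it runs without Beam installed. The class name `AverageFn` and the sample scores are hypothetical; in a real pipeline the class would subclass `beam.CombineFn` and be passed to `beam.CombinePerKey`.

```python
class AverageFn:
    """Plain-Python stand-in for a Beam CombineFn that averages numbers."""

    def create_accumulator(self):
        # Running (sum, count) for one key.
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Partial results from different workers are summed component-wise.
        total, count = 0.0, 0
        for partial_total, partial_count in accumulators:
            total += partial_total
            count += partial_count
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


# Simulate two workers that each saw part of one key's values.
fn = AverageFn()
acc_a = fn.add_input(fn.add_input(fn.create_accumulator(), 80), 90)
acc_b = fn.add_input(fn.create_accumulator(), 100)
average = fn.extract_output(fn.merge_accumulators([acc_a, acc_b]))
print(average)  # 90.0
```

Keeping the sum and count separate until `extract_output` is what lets partial results be merged in any order; averaging partial averages would give the wrong answer when the groups differ in size.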

1. Business Scenario

Parse student grades and compute the average score per subject.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

text
student_id,subject,score
101,Math,95
102,Math,88
101,Science,90
103,Math,75
102,Science,92
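
Before writing the pipeline, it helps to know what the answer should be. From the five rows above, Math averages (95 + 88 + 75) / 3 = 86.0 and Science averages (90 + 92) / 2 = 91.0. The plain-Python sketch below (no Beam involved) reproduces that arithmetic; the helper name `expected_averages` is hypothetical.

```python
from collections import defaultdict

# The dataset rows from this section, embedded so the check is self-contained.
ROWS = """student_id,subject,score
101,Math,95
102,Math,88
101,Science,90
103,Math,75
102,Science,92""".splitlines()


def expected_averages(lines):
    """Group scores by subject and return {subject: mean score}."""
    scores = defaultdict(list)
    for line in lines[1:]:  # skip the header row
        _student_id, subject, score = line.split(",")
        scores[subject].append(float(score))
    return {subject: sum(vals) / len(vals) for subject, vals in scores.items()}


print(expected_averages(ROWS))  # {'Math': 86.0, 'Science': 91.0}
```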

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

python
# starter.py - Student Records
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read the student dataset (dataset.csv), skipping the header row
        # TODO: Map scores to (subject, score) tuples
        # TODO: Calculate average score per subject
        pass

if __name__ == "__main__":
    run_pipeline()

4. Lab Requirements

  • Parse student grade records correctly from CSV format.
  • Map student scores to subject key-value pairs (subject, score).
  • Compute the average score for each subject using a Combine transform.
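
For the first two requirements, one possible approach is a small parsing helper that turns a CSV line into a `(subject, score)` pair and rejects anything that is not a data row. This is a sketch under assumptions: the name `parse_csv_line` is hypothetical, and returning `None` for the header, blank, or malformed lines is one design choice among several (a pipeline could then filter out the `None` values).

```python
import csv


def parse_csv_line(line):
    """Return (subject, score) for a data row, or None for any other line."""
    # csv.reader handles quoted fields that a bare split(",") would break on.
    fields = next(csv.reader([line]), [])
    if len(fields) != 3:
        return None
    _student_id, subject, score = (field.strip() for field in fields)
    try:
        return (subject, float(score))
    except ValueError:
        # The header row ends in the literal text "score", not a number.
        return None


print(parse_csv_line("101,Math,95"))               # ('Math', 95.0)
print(parse_csv_line("student_id,subject,score"))  # None
```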

5. Step-by-Step Guide & Solution

Solution for Student Records
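
The following is a minimal sketch of one way to complete the skeleton, not necessarily the lab's official solution. It assumes `dataset.csv` is in the working directory, uses Beam's built-in `MeanCombineFn` for the average, and writes results under an output prefix named `averages`. The helper names, the output prefix, and the `Subject: 0.00` output format are all assumptions made for illustration.

```python
def parse_row(line):
    """Turn '101,Math,95' into ('Math', 95.0)."""
    _student_id, subject, score = line.split(",")
    return (subject.strip(), float(score))


def format_result(kv):
    """Turn ('Math', 86.0) into 'Math: 86.00'."""
    subject, average = kv
    return f"{subject}: {average:.2f}"


def run_pipeline(input_path="dataset.csv", output_prefix="averages"):
    # Imported inside the function so the two helpers above can be
    # tried out on their own, even where Beam is not installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            # skip_header_lines=1 drops the "student_id,subject,score" row.
            | "Read" >> beam.io.ReadFromText(input_path, skip_header_lines=1)
            | "ToKeyValue" >> beam.Map(parse_row)
            | "AveragePerSubject" >> beam.CombinePerKey(beam.combiners.MeanCombineFn())
            | "Format" >> beam.Map(format_result)
            | "Write" >> beam.io.WriteToText(output_prefix, file_name_suffix=".txt")
        )
```

Calling `run_pipeline()` from a script or a Python shell runs the pipeline on the default local runner. With the sample dataset, the output shard(s) should contain one line per subject; the order of lines is not guaranteed.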

