Beginner Data Engineers practicing key-value mapping and aggregating metrics grouped by partition keys.
Parse student grades and compute the average score per subject.
Save the following raw rows locally as \`dataset.csv\` to test your pipeline:
student_id,subject,score
101,Math,95
102,Math,88
101,Science,90
103,Math,75
102,Science,92Create a local file named \`starter.py\` and copy the following skeleton. Complete the missing transformations:
# starter.py - Student Records
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
def run_pipeline():
options = PipelineOptions()
with beam.Pipeline(options=options) as p:
# TODO: Read student dataset
# TODO: Map scores to (subject, score) tuples
# TODO: Calculate average score per subject
pass
if __name__ == "__main__":
run_pipeline()