This exercise is for intermediate developers who want to practice per-department statistics and global maximum tracking.
Compute average salary by department and identify the highest earner.
Save the following raw rows locally as `dataset.csv` to test your pipeline:
emp_id,dept,salary
e1,HR,60000
e2,Engineering,140000
e3,Engineering,110000
e4,Sales,80000
e5,HR,65000

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:
# starter.py - HR Analytics
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Compute average salary per department
        # TODO: Identify highest overall salary
        pass


if __name__ == "__main__":
    run_pipeline()
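To check your pipeline's output, it can help to know the expected answers in advance. The following is a minimal plain-Python sketch (standard library only, not a Beam solution) that computes them from the same five rows. The names `RAW` and `expected_results` are hypothetical helpers introduced here for illustration, and the rows are inlined so the sketch runs without `dataset.csv` on disk.

```python
import csv
import io
from collections import defaultdict

# Same rows as dataset.csv, inlined so the sketch runs without the file.
RAW = """emp_id,dept,salary
e1,HR,60000
e2,Engineering,140000
e3,Engineering,110000
e4,Sales,80000
e5,HR,65000
"""


def expected_results(text):
    """Return (average salary per department, highest-paid employee)."""
    rows = list(csv.DictReader(io.StringIO(text)))

    # Group salaries by department, then average each group.
    by_dept = defaultdict(list)
    for row in rows:
        by_dept[row["dept"]].append(int(row["salary"]))
    averages = {dept: sum(vals) / len(vals) for dept, vals in by_dept.items()}

    # The global maximum is taken over all rows, ignoring department.
    top = max(rows, key=lambda row: int(row["salary"]))
    return averages, (top["emp_id"], top["dept"], int(top["salary"]))


averages, top_earner = expected_results(RAW)
print(averages)    # {'HR': 62500.0, 'Engineering': 125000.0, 'Sales': 80000.0}
print(top_earner)  # ('e2', 'Engineering', 140000)
```

Your completed Beam pipeline should arrive at the same values, although the order in which it emits the per-department averages may differ.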