Batch Lab | Medium

Lab: HR Analytics

Estimated time: 30 mins

Who This Lab Is For

Intermediate developers who want to learn how to compute per-department statistics and track a global maximum value in an Apache Beam pipeline.

What You Will Learn

  • How to calculate mean statistics per key using MeanCombineFn.
  • How to implement custom comparison logic to find a global maximum value.
  • How to split computations into separate parallel branches from a single source.
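
As a rough mental model for the first item, a mean combiner can keep a running (sum, count) pair instead of averaging averages, so partial results from parallel workers merge safely. The sketch below is plain Python and is not Beam's actual `MeanCombineFn` source. The class name `MeanSketch`, the helper `combine_bundle`, and the two-bundle split are hypothetical; only the four method names mirror Beam's `CombineFn` interface.

```python
class MeanSketch:
    """Hypothetical stand-in that mirrors the four CombineFn phases."""

    def create_accumulator(self):
        return (0, 0)  # (running sum, running count)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return (
            sum(total for total, _ in accumulators),
            sum(count for _, count in accumulators),
        )

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def combine_bundle(fn, values):
    """Fold one bundle of values into a partial accumulator."""
    accumulator = fn.create_accumulator()
    for value in values:
        accumulator = fn.add_input(accumulator, value)
    return accumulator


fn = MeanSketch()
# Assumption: the two Engineering salaries arrive in separate bundles.
partials = [combine_bundle(fn, [140000]), combine_bundle(fn, [110000])]
merged = fn.merge_accumulators(partials)
engineering_mean = fn.extract_output(merged)
print(merged, engineering_mean)
```

Because only sums and counts are merged, the result does not depend on how the runner happens to split the data into bundles.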

1. Business Scenario

Given a CSV file of employee records, compute the average salary for each department and identify the single highest earner across all departments.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

```text
emp_id,dept,salary
e1,HR,60000
e2,Engineering,140000
e3,Engineering,110000
e4,Sales,80000
e5,HR,65000
```
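
To know what a correct pipeline should produce for these five rows, it can help to compute the answers once without Beam. The following is a minimal reference sketch using only the standard library; the function name `expected_results` and the `SAMPLE` constant are made up for this check and are not part of the lab.

```python
import csv
import io
from collections import defaultdict

# The sample rows from dataset.csv, inlined so the check is self-contained.
SAMPLE = """emp_id,dept,salary
e1,HR,60000
e2,Engineering,140000
e3,Engineering,110000
e4,Sales,80000
e5,HR,65000
"""


def expected_results(csv_text):
    """Return (average salary per department, (emp_id, salary) of top earner)."""
    salaries_by_dept = defaultdict(list)
    top = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        salary = int(row["salary"])
        salaries_by_dept[row["dept"]].append(salary)
        if top is None or salary > top[1]:
            top = (row["emp_id"], salary)
    averages = {dept: sum(v) / len(v) for dept, v in salaries_by_dept.items()}
    return averages, top


averages, top = expected_results(SAMPLE)
print(averages)  # {'HR': 62500.0, 'Engineering': 125000.0, 'Sales': 80000.0}
print(top)       # ('e2', 140000)
```

A pipeline run may emit the per-department results in a different order, so compare them as a set of pairs rather than as an ordered list.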

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - HR Analytics
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Compute average salary per department
        # TODO: Identify highest overall salary
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

  • Parse the employee ID, department, and salary fields from each CSV row, skipping the header row.
  • Calculate the average salary per department.
  • Find the employee with the highest salary across all departments using a global max-style combiner.

5. Step-by-Step Guide & Solution

Solution for HR Analytics

