This exercise is for beginner data engineers who want to learn the basic flow of an Apache Beam pipeline, including reading text files and computing simple metrics.
Learn basic transformations by filtering and aggregating employee records.
Save the following raw rows locally as `dataset.csv` to test your pipeline:
id,name,department,salary,active
1,Alice,Engineering,120000,true
2,Bob,Marketing,90000,true
3,Charlie,Engineering,110000,false
4,David,Sales,85000,true
5,Eve,Engineering,130000,true

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:
# starter.py - Employee Analytics
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read from 'dataset.csv'
        # TODO: Filter out inactive employees
        # TODO: Calculate average salary of active employees
        # TODO: Write to 'output.txt'
        pass


if __name__ == "__main__":
    run_pipeline()
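
Once the TODOs are filled in, it helps to know what number the pipeline should produce. The following is a minimal plain-Python sketch, not a Beam solution, that computes the expected average using only the standard library. The function name `expected_average_salary` and the inlined `DATASET` constant are illustrative assumptions rather than part of the exercise. It also assumes that "active" means the `active` column is exactly the string `true`.

```python
import csv
import io
from statistics import mean

# Same rows as dataset.csv, inlined so the check runs without the file.
# (DATASET is a hypothetical helper constant, not part of starter.py.)
DATASET = """\
id,name,department,salary,active
1,Alice,Engineering,120000,true
2,Bob,Marketing,90000,true
3,Charlie,Engineering,110000,false
4,David,Sales,85000,true
5,Eve,Engineering,130000,true
"""


def expected_average_salary(csv_text):
    """Return the mean salary of rows whose `active` column is 'true'."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Keep only active employees, then convert their salaries to integers.
    salaries = [int(row["salary"]) for row in rows if row["active"] == "true"]
    return mean(salaries)


if __name__ == "__main__":
    print(expected_average_salary(DATASET))
```

For the five sample rows, the active employees are Alice, Bob, David, and Eve, so the expected average is (120000 + 90000 + 85000 + 130000) / 4 = 106250. How that value is formatted in your pipeline's output depends on how you compute and write it in Beam.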