Batch Lab (Medium)

Lab: Customer Orders

Estimated time: 30 minutes

Who This Lab Is For

Intermediate developers who want practice processing transactions, filtering records by status, and aggregating values by customer key.

What You Will Learn

  • How to implement multi-step filtering based on string status criteria.
  • How to group transaction data by customer keys.
  • How to calculate total revenue per customer and save CSV outputs.

1. Business Scenario

Analyze retail transactions and compute the total amount each customer spent on completed orders.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

```text
order_id,customer_id,amount,status
o1,c1,120.50,COMPLETED
o2,c2,80.00,PENDING
o3,c1,45.00,COMPLETED
o4,c3,300.00,COMPLETED
o5,c2,150.00,FAILED
```
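
Before wiring up a pipeline, it can help to confirm the schema of these rows. The sketch below uses only the standard library, not Beam; `SAMPLE` and `parse_orders` are hypothetical names chosen for illustration. It parses the rows above into dictionaries and converts `amount` to a float, which is the type the later summing step needs.

```python
import csv
import io

# The sample rows from dataset.csv, inlined so the sketch runs on its own.
SAMPLE = """order_id,customer_id,amount,status
o1,c1,120.50,COMPLETED
o2,c2,80.00,PENDING
o3,c1,45.00,COMPLETED
o4,c3,300.00,COMPLETED
o5,c2,150.00,FAILED
"""


def parse_orders(text):
    """Parse CSV text into a list of dicts, converting amount to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "amount": float(row["amount"])} for row in reader]


orders = parse_orders(SAMPLE)
```

Note that the first line is a header, so a line-based reader has to skip it before parsing data rows.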

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - Customer Orders
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read customer orders
        # TODO: Filter out FAILED and PENDING orders
        # TODO: Aggregate spending per customer
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

  • Filter out any orders that do not have a status of 'COMPLETED'.
  • Map each remaining order to a (customer_id, amount) key-value pair.
  • Sum the amounts per customer and write the totals to a CSV output file.
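
The three requirements above can be checked against a plain-Python reference before any Beam code is written. This is a minimal sketch, not part of the lab's starter code; `total_completed_spend` is a hypothetical helper, and the rows are the sample dataset typed in as tuples. It gives you the totals your pipeline output should match.

```python
from collections import defaultdict

# Sample rows as (order_id, customer_id, amount, status) tuples.
orders = [
    ("o1", "c1", 120.50, "COMPLETED"),
    ("o2", "c2", 80.00, "PENDING"),
    ("o3", "c1", 45.00, "COMPLETED"),
    ("o4", "c3", 300.00, "COMPLETED"),
    ("o5", "c2", 150.00, "FAILED"),
]


def total_completed_spend(rows):
    """Keep COMPLETED orders, key them by customer, and sum the amounts."""
    totals = defaultdict(float)
    for _order_id, customer_id, amount, status in rows:
        if status == "COMPLETED":
            totals[customer_id] += amount
    return dict(totals)


expected = total_completed_spend(orders)
print(expected)  # {'c1': 165.5, 'c3': 300.0}
```

Customer c2 does not appear in the result because both of its orders are filtered out before aggregation.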

5. Step-by-Step Guide & Solution

Solution for Customer Orders

