Streaming Lab (Hard)

Lab: Fraud Detection

Estimated time: 50 mins

Who This Lab Is For

Advanced developers looking to design complex event detection, sliding window triggers, and threshold alarms.

What You Will Learn

  • How to track event frequency over sliding time frames.
  • How to group transaction streams and count card triggers.
  • How to flag cards that exceed threshold limits and route alerts.

1. Business Scenario

Detect rapid successive banking transactions to flag potential fraud.

2. Input Dataset (`dataset.csv`)

Save the following raw rows locally as `dataset.csv` to test your pipeline:

```text
timestamp,card_id,amount,location
1719830400,card1,50.00,NY
1719830402,card1,200.00,NY
1719830410,card2,10.00,CA
1719830415,card1,800.00,London
1719830440,card2,500.00,CA
```
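
Before wiring the rows into a pipeline, it can help to confirm how each column parses. The following is a minimal sketch using only the standard library; `parse_transactions` and `SAMPLE` are hypothetical names introduced here, and the sketch assumes the `timestamp` column holds Unix epoch seconds.

```python
import csv
import io

# The sample rows from dataset.csv, inlined so the sketch runs on its own.
SAMPLE = """timestamp,card_id,amount,location
1719830400,card1,50.00,NY
1719830402,card1,200.00,NY
1719830410,card2,10.00,CA
1719830415,card1,800.00,London
1719830440,card2,500.00,CA
"""


def parse_transactions(text):
    """Parse CSV text into a list of dicts with typed fields.

    Assumption: `timestamp` is Unix epoch seconds, so it is cast to int.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "timestamp": int(row["timestamp"]),
            "card_id": row["card_id"],
            "amount": float(row["amount"]),
            "location": row["location"],
        }
        for row in reader
    ]


if __name__ == "__main__":
    for tx in parse_transactions(SAMPLE):
        print(tx)
```

To read the saved file instead of the inlined string, pass `open("dataset.csv").read()` to the same function.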

3. Starter Code Skeleton

Create a local file named `starter.py` and copy the following skeleton. Complete the missing transformations:

```python
# starter.py - Fraud Detection
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Apply sliding windows (10s duration, 5s slide)
        # TODO: Count transactions per card ID
        # TODO: Filter counts > 2
        pass

if __name__ == "__main__":
    run_pipeline()
```

4. Lab Requirements

  • Group card transactions into 10-second sliding windows that advance every 5 seconds.
  • Count transaction occurrences per credit card.
  • Filter and flag cards that exceed 2 transactions within a single window.
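
The three requirements above can be traced by hand before any pipeline code is written. The sketch below is plain Python, not the Beam API: it assumes windows start at multiples of the 5-second slide, so each event lands in two overlapping 10-second windows. The names `sliding_window_starts`, `count_per_card_window`, `flag`, and `EVENTS` are hypothetical helpers for illustration.

```python
from collections import Counter

# (timestamp, card_id) pairs taken from the sample dataset.
EVENTS = [
    (1719830400, "card1"),
    (1719830402, "card1"),
    (1719830410, "card2"),
    (1719830415, "card1"),
    (1719830440, "card2"),
]


def sliding_window_starts(timestamp, size=10, period=5):
    """Return the start of every [start, start + size) window holding timestamp.

    Assumption: window starts are aligned to multiples of `period`.
    """
    latest = timestamp - (timestamp % period)
    # Walk back one period at a time while the window still covers timestamp.
    return list(range(latest, timestamp - size, -period))


def count_per_card_window(events, size=10, period=5):
    """Count events per (card_id, window_start) pair."""
    counts = Counter()
    for timestamp, card_id in events:
        for start in sliding_window_starts(timestamp, size, period):
            counts[(card_id, start)] += 1
    return counts


def flag(counts, threshold=2):
    """Return the (card_id, window_start) pairs whose count exceeds threshold."""
    return sorted(key for key, n in counts.items() if n > threshold)


if __name__ == "__main__":
    counts = count_per_card_window(EVENTS)
    print(flag(counts))               # strict "more than 2" rule
    print(flag(counts, threshold=1))  # looser rule, for comparison
```

With these five rows, card1 reaches at most two transactions in any single window (the windows starting at 1719830395 and 1719830400), so the "more than 2" rule flags nothing. Lowering the threshold to 1, or adding a third card1 row within the same 10 seconds, is one way to see an alert fire while testing.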

5. Step-by-Step Guide & Solution

Solution for Fraud Detection

