intermediate

Pipeline Design Patterns

5 min read | Last updated: 2026-07-02

1. Introduction

Pipeline design patterns describe the structural topology of an Apache Beam pipeline. The three basic topologies are the linear chain, the split (fan-out), and the merge (fan-in).

2. Why This Concept Exists

Complex pipelines rarely follow a single linear sequence from read to write. These patterns describe how a PCollection is branched, how each branch is transformed independently, and how the branches are rejoined into one output, for example to feed an analytical dashboard.

3. Code Example

Implementing a split (Fan-out) and merge (Fan-in) pipeline topology:

python
import apache_beam as beam

with beam.Pipeline() as p:
    # 1. Source (Fan-out branch point)
    raw_nums = p | beam.Create([1, 2, 3, 4, 5])

    # 2. Branch A: Squares
    squares = raw_nums | "Square" >> beam.Map(lambda x: x * x)

    # 3. Branch B: Cubes
    cubes = raw_nums | "Cube" >> beam.Map(lambda x: x * x * x)

    # 4. Fan-in (Merge branches)
    merged = (squares, cubes) | "Merge" >> beam.Flatten()
    merged | beam.Map(print)

4. Key Takeaways

  • Fan-out: Apply multiple independent PTransforms to the same parent PCollection.
  • Fan-in: Use beam.Flatten to merge multiple PCollections of the same type back into a single stream.