intermediate
Pipeline Design Patterns
5 min read | Last updated: 2026-07-02
1. Introduction
Pipeline Design Patterns define the structural topologies of Apache Beam pipelines. This includes split topologies (Fan-out), merge topologies (Fan-in), and linear chain topologies.
2. Why This Concept Exists
Real pipelines rarely follow a single linear sequence from read to write. These patterns describe how data is branched, transformed independently on each branch, and merged back into one collection for downstream output.
3. Code Example
Implementing a split (Fan-out) and merge (Fan-in) pipeline topology:
```python
import apache_beam as beam

with beam.Pipeline() as p:
    # 1. Source (Fan-out branch point)
    raw_nums = p | beam.Create([1, 2, 3, 4, 5])

    # 2. Branch A: Squares
    squares = raw_nums | "Square" >> beam.Map(lambda x: x * x)

    # 3. Branch B: Cubes
    cubes = raw_nums | "Cube" >> beam.Map(lambda x: x * x * x)

    # 4. Fan-in (Merge branches)
    merged = (squares, cubes) | "Merge" >> beam.Flatten()
    merged | beam.Map(print)
```
4. Key Takeaways
- Fan-out: Apply multiple independent PTransforms to the same parent PCollection.
- Fan-in: Use `beam.Flatten` to merge multiple PCollections of the same type back into a single PCollection.