intermediate
Reusable Transforms
5 min readLast updated: 2026-07-02
1. Introduction
A Reusable Transform in Apache Beam is a custom composite transform that inherits from beam.PTransform. It allows developers to bundle a sequence of transforms into a single, parameterizable, and shareable package.
2. Why This Concept Exists
As pipelines grow, duplicate patterns appear (e.g. reading logs, extracting fields, and normalizing timestamps). Wrapping this logic inside custom PTransform classes ensures DRY (Don't Repeat Yourself) code, simplifies unit testing, and promotes modularity across engineering teams.
3. Code Example
Creating a parameterized reusable text-cleansing transform:
python
import apache_beam as beam
class CleanAndExtractFields(beam.PTransform):
def __init__(self, delimiter=",", field_index=0):
super().__init__()
self.delimiter = delimiter
self.field_index = field_index
def expand(self, pcoll):
return (pcoll
| "SplitLine" >> beam.Map(lambda x: x.split(self.delimiter))
| "GetField" >> beam.Map(lambda parts: parts[self.field_index].strip().lower()))
# Usage in a pipeline
with beam.Pipeline() as p:
(p
| beam.Create(["User1, Active", " User2 , Inactive"])
| "CleanUserIds" >> CleanAndExtractFields(delimiter=",", field_index=0)
| beam.Map(print))
4. Key Takeaways
- Custom transforms override the
expand(self, input_pcollection)method. - Reusable transforms can receive parameters in their constructor to generalize behaviors.
Advertisement
AdSense Slot #000001Leaderboard Banner (728x90)