Last Minute RevisionEvergreen

Cheatsheet: Pub/Sub IO

Revision time: 3 mins

Topic Overview

Integrate with serverless Google Cloud Pub/Sub for messaging.

Syntax Snapshot

python
import apache_beam as beam

# 1. Read from Pub/Sub topic/subscription
messages = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
    subscription="projects/my-project/subscriptions/my-sub"
)

# 2. Write to Pub/Sub topic
messages | "WritePubSub" >> beam.io.WriteToPubSub(
    topic="projects/my-project/topics/my-topic"
)

Key Points

  • Serverless messaging source/sink for streaming pipelines.
  • Automatically extracts message publish time as the element event-time.
  • Can configure custom timestamp attributes using timestamp_attribute.
  • Supports reading attributes and payload byte blobs.

Production Recommendations

Developer Checklist
Use subscription-based reads instead of topic-based reads in production to support checkpointing and prevent data loss.
Advertisement
AdSense Slot #556677Leaderboard Banner (728x90)