Advanced LabHard

Lab: Pub/Sub to BigQuery

Estimated time: 60 mins

Who This Lab Is For

Advanced developers designing serverless data ingestion architectures in Google Cloud Platform.

What You Will Learn

  • How to ingest messages from Google Cloud Pub/Sub subscriptions.
  • How to implement validation code and route failures to a Dead-Letter Queue (DLQ).
  • How to write to BigQuery using the high-performance Storage Write API.

1. Business Scenario

Design a streaming ingestion pipeline from Google Cloud Pub/Sub to Google BigQuery.

2. Input Dataset (\`dataset.csv\`)

Save the following raw rows locally as \`dataset.csv\` to test your pipeline:

text
subscription: "projects/corp/subscriptions/clicks-sub"
payload: {"click_id": "c982", "user_id": "u43", "url": "/pricing", "ts": 1719830400}

3. Starter Code Skeleton

Create a local file named \`starter.py\` and copy the following skeleton. Complete the missing transformations:

python
# starter.py - Pub/Sub to BigQuery
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipeline():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        # TODO: Read from Pub/Sub
        # TODO: Parse schema and handle errors
        # TODO: Write to BigQuery (Storage Write API)
        pass

if __name__ == "__main__":
    run_pipeline()

4. Lab Requirements

  • Stream message payloads from a Cloud Pub/Sub subscription.
  • Parse and validate JSON schemes.
  • Output valid records to BigQuery using Storage Write API, routing failed rows to a Dead-Letter Queue (DLQ).

5. Step-by-Step Guide & Solution

Solution for Pub/Sub to BigQuery

Click below to reveal the complete, runnable Python SDK implementation solution and the step-by-step walkthrough to complete the lab.

Advertisement
AdSense Slot #847392Leaderboard Banner (728x90)