Last Minute Revision · Evergreen

Cheatsheet: BigQuery IO

Revision time: 4 mins

Topic Overview

Read from and write to Google BigQuery at scale, with high throughput, from an Apache Beam pipeline.

Syntax Snapshot

python
import apache_beam as beam
from apache_beam.io.gcp.bigquery import WriteToBigQuery

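# Write a PCollection of dicts using the Storage Write API method.
# `records` is assumed to exist already; the schema below is illustrative only.
records | "WriteBQ" >> WriteToBigQuery(
    "my-project:my_dataset.my_table",
    schema="name:STRING,score:INTEGER",  # needed when the table may be created
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    method=WriteToBigQuery.Method.STORAGE_WRITE_API,
)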

Key Points

  • Read: the results of a SQL query, or a whole table (via an export job or a direct read).
  • Write: Streaming Inserts (immediate, expensive) vs Storage Write API (high throughput, recommended).
  • Create disposition: CREATE_IF_NEEDED vs CREATE_NEVER.
  • Write disposition: WRITE_APPEND, WRITE_TRUNCATE (batch only), WRITE_EMPTY.

Production Recommendations

Developer Checklist

  • Prefer the STORAGE_WRITE_API method for production streaming writes.
  • Use side outputs to capture failed rows in a dead-letter queue (DLQ).