advanced

Accumulation Modes

7 min read | Last updated: 2026-07-01

1. Introduction

When a windowing strategy is combined with custom triggers or allowed lateness, a single window can fire multiple times. Each emission is called a pane.

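Accumulation modes define how Apache Beam treats the data from earlier firings when a window emits later panes. This page contrasts two modes: Accumulating (each new pane includes the results of previous panes) and Retracting (each new pane is accompanied by a retraction of the previous emission). Beam also has a Discarding mode, in which each pane carries only the elements that arrived since the last firing. Support for retractions varies by SDK and runner, so check that your SDK version exposes a retracting mode before relying on it.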

2. Why This Concept Exists

If a window triggers multiple times, downstream systems need a clear strategy to interpret the updates.

Choosing the correct accumulation mode is essential for:

  • Preventing Double-Counting: If you sum transactions in a window, and a late event arrives, you need to ensure the downstream database updates the total rather than adding the new total to the old total.
  • Correct Downstream Aggregations: Ensuring that subsequent aggregations in the pipeline (like a secondary GroupByKey) calculate correct results even when their inputs change over time.
  • Idempotency: Aligning pipeline outputs with the capabilities of your storage layer (e.g., upserting into Elasticsearch vs. appending to a CSV file).

3. Key Terminology

  • Accumulating Mode: Subsequent panes contain all the elements that have arrived in the window so far, including elements from previous panes.
  • Retracting Mode: Subsequent panes emit both a "retraction" (a subtraction of the previous pane's value) and the new aggregated value.
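  • Pane: A single output materialization from a window. Panes are indexed from 0 in firing order, and each pane is tagged with a timing (early, on-time, or late), so pane 0 is the first firing and panes 1, 2, ... are later firings.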

4. How It Works

Let's trace a window that receives elements [2, 3] in its first firing, and later receives late element [4]:

  • First Firing: The sum of [2, 3] is 5.
  • Late Event Arrives: [4] is added to the window.
  • Second Firing:
    • Accumulating Mode: The runner emits the new cumulative sum: 9. The downstream system must replace the previous value 5 with 9 (e.g., using a database overwrite).
    • Retracting Mode: The runner emits two signals: a retraction of the previous value (-5), followed by the new value (9). A downstream aggregator that adds up everything it receives ends at 5 + (-5) + 9 = 9.
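The trace above can be reproduced with a small pure-Python simulation. This is only a sketch of the semantics, not Beam code: the `Emission` class and the `fire_panes` function are hypothetical names introduced for illustration, and the window is assumed to compute a simple sum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Emission:
    """One value sent downstream; is_retraction marks a subtraction."""
    value: int
    is_retraction: bool = False


def fire_panes(batches: List[List[int]], retracting: bool) -> List[List[Emission]]:
    """Return the emissions produced by each firing of one summing window."""
    panes: List[List[Emission]] = []
    running_sum = 0
    previous: Optional[int] = None  # last emitted sum; None before the first firing
    for batch in batches:
        running_sum += sum(batch)
        pane: List[Emission] = []
        if retracting and previous is not None:
            # Cancel the previously emitted sum before sending the new one.
            pane.append(Emission(-previous, is_retraction=True))
        pane.append(Emission(running_sum))
        panes.append(pane)
        previous = running_sum
    return panes


batches = [[2, 3], [4]]  # first firing, then the late element
accumulating = fire_panes(batches, retracting=False)
retracting = fire_panes(batches, retracting=True)

# Accumulating: downstream overwrites, so only the latest value matters.
overwrite_result = accumulating[-1][-1].value
# Retracting: downstream simply adds every signal it receives.
additive_result = sum(e.value for pane in retracting for e in pane)

print(overwrite_result, additive_result)
```

Both consumers end at 9, but they get there differently: the accumulating consumer must know to replace 5 with 9, while the retracting consumer only needs addition.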

5. Visual Diagram

Accumulating Mode (keeps previous window state):
  Pane 0 (On-Time): Sum = 5
  Pane 1 (Late):    Sum = 9 (5 + 4)

Retracting Mode (retracts the old pane before emitting the update):
  Pane 0 (On-Time): Sum = 5
  Pane 1 (Late):    Retract = -5, Sum = 9

6. Code Example

The following code compares how to configure Accumulating and Retracting modes within WindowInto:

python
import apache_beam as beam
from apache_beam.transforms.trigger import AccumulationMode, Repeatedly, AfterCount
from apache_beam.transforms.window import FixedWindows

# 1. Accumulating Mode Pipeline
with beam.Pipeline() as p1:
    (p1
     | "ReadAccumulating" >> beam.Create([("user1", 10), ("user1", 20)])
     | "WindowAccumulate" >> beam.WindowInto(
         FixedWindows(60),
         trigger=Repeatedly(AfterCount(1)),
         accumulation_mode=AccumulationMode.ACCUMULATING
     )
     | "SumAccumulate" >> beam.CombinePerKey(sum)
     | "LogAccumulating" >> beam.Map(lambda x: print(f"Accumulating Pane Output: {x}"))
    )

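# 2. Retracting Mode Pipeline
# Assumption to verify against your SDK version: the Python SDK has historically
# exposed only DISCARDING and ACCUMULATING on AccumulationMode, so RETRACTING is
# looked up defensively instead of being referenced directly.
retracting_mode = getattr(AccumulationMode, "RETRACTING", None)
if retracting_mode is None:
    print("AccumulationMode.RETRACTING is not available in this SDK version")
else:
    with beam.Pipeline() as p2:
        (p2
         | "ReadRetracting" >> beam.Create([("user1", 10), ("user1", 20)])
         | "WindowRetract" >> beam.WindowInto(
             FixedWindows(60),
             trigger=Repeatedly(AfterCount(1)),
             accumulation_mode=retracting_mode
         )
         | "SumRetract" >> beam.CombinePerKey(sum)
         | "LogRetracting" >> beam.Map(lambda x: print(f"Retracting Pane Output: {x}"))
        )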

7. Code Explanation

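  • AccumulationMode.ACCUMULATING tells the runner to retain window state across firings. When the trigger fires after each element, as it would on a streaming runner, the first element 10 produces 10 and the second element 20 produces the accumulated total 30. A batch runner may process both elements in a single firing and emit only 30.
  • AccumulationMode.RETRACTING, where the SDK and runner support it, tells the runner to emit a retraction for the previous aggregate. For 10, it emits 10. When 20 arrives, it emits a retraction of the earlier 10 (that is, -10) followed by the new total 30. Downstream combiners use these retractions to update their internal states.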

8. Real Production Example

In a monthly utility billing system, a pipeline aggregates electricity consumption logs per household. The billing database is append-only for audit integrity. When late logs arrive (e.g., due to smart-meter connectivity outages), using Retracting mode is mandatory. If the initial bill was calculated as $150 and late logs add $30, the pipeline emits a retraction of -$150 and an addition of $180. The billing system appends both transactions, yielding a correct net account balance of $180 without losing historical audit states.
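The arithmetic in this billing scenario can be sketched with an in-memory append-only ledger. This is an illustration of the bookkeeping only: `append_bill_update`, `balance`, and the household ID are hypothetical names, and amounts are whole dollars for simplicity.

```python
from typing import List, Optional, Tuple

Ledger = List[Tuple[str, int]]  # (household_id, signed amount in dollars)


def append_bill_update(ledger: Ledger, household: str,
                       previous_total: Optional[int], new_total: int) -> None:
    """Append a retraction of the old bill (if any), then the new bill."""
    if previous_total is not None:
        ledger.append((household, -previous_total))
    ledger.append((household, new_total))


def balance(ledger: Ledger, household: str) -> int:
    """Net balance is the sum of every row ever appended for the household."""
    return sum(amount for owner, amount in ledger if owner == household)


ledger: Ledger = []
append_bill_update(ledger, "household-42", None, 150)  # initial bill
append_bill_update(ledger, "household-42", 150, 180)   # late logs add $30

print(ledger)
print(balance(ledger, "household-42"))
```

No row is ever modified or deleted, so the audit trail keeps the original $150 bill, its reversal, and the corrected $180 bill.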

9. Common Mistakes

  • Double-Counting in Databases: Using Accumulating Mode and writing output panes to a database table using standard SQL INSERT statements. If a window sum updates from 10 to 15, the table will hold both rows, leading to an incorrect sum of 25. Use UPSERT or key-based overwrites for Accumulating Mode.
  • Forgetting to Set Accumulation Mode: Setting a non-default trigger in WindowInto but omitting accumulation_mode. The pipeline is rejected at construction time, before any data is processed (the Python SDK raises a ValueError).
  • High State Overhead in Retracting Mode: To generate a retraction, the runner must retain the previously emitted result for each key and window, in addition to the accumulated state itself, until the window expires. This increases memory and storage utilization.
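The double-counting mistake from the first item can be demonstrated with the standard-library sqlite3 module. This is a sketch under assumptions: the table names and the key format are made up for illustration, and `INSERT OR REPLACE` is SQLite's syntax for a key-based overwrite (other databases spell upserts differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table without a key: every pane becomes a new row.
conn.execute("CREATE TABLE append_only (window_key TEXT, total INTEGER)")
# Table keyed by window: a later pane overwrites the earlier one.
conn.execute("CREATE TABLE keyed (window_key TEXT PRIMARY KEY, total INTEGER)")

# Two accumulating panes for the same key and window: the sum moves from 10 to 15.
panes = [("user1|00:00", 10), ("user1|00:00", 15)]
for key, total in panes:
    conn.execute("INSERT INTO append_only VALUES (?, ?)", (key, total))
    conn.execute("INSERT OR REPLACE INTO keyed VALUES (?, ?)", (key, total))

append_sum = conn.execute("SELECT SUM(total) FROM append_only").fetchone()[0]
keyed_sum = conn.execute("SELECT SUM(total) FROM keyed").fetchone()[0]

print(append_sum, keyed_sum)  # 25 15
```

The plain INSERT table reports 25 because it holds both panes, while the keyed table holds only the latest pane and reports 15.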

10. Interview Perspective

  • Question: Why is Retracting Mode necessary if the pipeline contains multiple grouping operations?
  • Answer: If you group by user, sum their score, and then group by country, the country-level aggregator needs to subtract the user's old score before adding the user's updated score. Retracting Mode enables this correction.
  • Question: Which database types are best suited for Accumulating Mode?
  • Answer: Key-value stores and document databases that support overwrites based on a key (e.g., Redis, Cassandra, Bigtable, Elasticsearch).
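The first answer above (group by user, then by country) can be made concrete with a two-stage simulation in plain Python. This is a sketch of the semantics rather than Beam code: `user_stage`, `country_stage`, and the sample users are hypothetical, and each tuple in `updates` stands for one pane of a per-user running total.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def user_stage(updates: List[Tuple[str, str, int]],
               retracting: bool) -> List[Tuple[str, int]]:
    """Emit (country, value) records for each per-user pane.

    In retracting mode, a user's previous total is emitted as a negative
    value before the new total.
    """
    last_emitted: Dict[str, int] = {}
    records: List[Tuple[str, int]] = []
    for user, country, new_total in updates:
        if retracting and user in last_emitted:
            records.append((country, -last_emitted[user]))
        records.append((country, new_total))
        last_emitted[user] = new_total
    return records


def country_stage(records: List[Tuple[str, int]]) -> Dict[str, int]:
    """Second grouping: add up every record received for each country."""
    totals: Dict[str, int] = defaultdict(int)
    for country, value in records:
        totals[country] += value
    return dict(totals)


# alice's score is updated from 10 to 15 by a later pane.
updates = [("alice", "NZ", 10), ("bob", "NZ", 7), ("alice", "NZ", 15)]

naive = country_stage(user_stage(updates, retracting=False))
corrected = country_stage(user_stage(updates, retracting=True))

print(naive, corrected)  # {'NZ': 32} {'NZ': 22}
```

Without retractions the country total counts alice's old score and her new score, giving 32. With retractions the old 10 is subtracted first, giving the correct 22.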

11. Best Practices

  • Use Accumulating Mode when writing directly to dashboards or key-value stores where updates are handled as overwrites.
  • Use Retracting Mode when the downstream pipeline performs secondary groupings or aggregations that must remain mathematically sound.
  • Verify that your downstream database connector supports retractions or updates before choosing your accumulation mode.

12. Summary

  • Accumulation modes govern how multi-pane window outputs are represented.
  • Accumulating Mode outputs the entire running total with each pane.
  • Retracting Mode outputs a retraction of the old value plus the new value.
  • Accumulating Mode requires key-based upserts in downstream storage.
  • Retracting Mode prevents double-counting in downstream pipeline aggregations.

13. Interactive Challenges

Challenge 1: Configure Accumulating Window (Beginner)

Configure a 10-minute fixed window that fires every 10 elements, using accumulating mode so that each pane contains all elements processed so far.

Challenge 2: Configure Retracting Window (Intermediate)

Configure a 30-minute fixed window that fires updates every 60 seconds of processing time, using retracting mode to ensure corrections are sent.

Challenge 3: Downstream Pane Inspector (Advanced)

Write a Map lambda or DoFn that inspects the pane metadata of an incoming element and prints "LATE UPDATE" if the element is part of a late pane firing.
