
Saga pattern

What is the Saga pattern?

A saga is a pattern for managing a sequence of distributed operations that must all succeed or be rolled back (logically compensated) — without relying on a single distributed ACID transaction.

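Informally, it recovers the all-or-nothing outcome of an ACID transaction through orchestration and compensation. It does not recover isolation: other readers can see intermediate states, such as a transfer that is reserved but not yet credited.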

A saga lets us coordinate multi-step operations, such as cross-region transfers, without a distributed transaction. Each step is atomic locally, and compensating actions restore consistency when a later step fails. For example, the EU ledger reserves funds, the US ledger credits them, and the EU ledger then commits or reverses the reservation depending on the outcome. We implement this as choreographed events on Kafka, with every handler idempotent and traceable. A failure therefore triggers compensations rather than double-spends.

Example: Cross-region money transfer

Say Tenant A's home region is the EU and Tenant B's is the US. A user in the EU sends €100 to B's account in the US. We can't run one atomic DB transaction across two Postgres primaries, so we use a saga.

Steps (simplified)

  1. Reserve (debit hold) on source region
    • EU ledger posts a "reserved −€100" entry (not yet finalized).
    • Status: RESERVED.
  2. Publish event TransferReserved (or TransferInitiated).
  3. Execute credit step on target region
    • US ledger receives the event, posts "credit +€100", emits TransferCredited.
  4. Finalize / confirm
    • EU ledger sees TransferCredited → marks reservation as COMMITTED.
    • Emits TransferPosted (completed).
  5. If any step fails, trigger a compensating transaction:
    • US credit fails → EU ledger reverses the hold (REVERSED entry, +€100).
    • Emit TransferReversed.
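The five steps above can be sketched as a single-process simulation with two in-memory ledgers. This is a minimal sketch under stated assumptions. `Ledger` and `run_transfer` are hypothetical names, and amounts are integer cents. Kafka, currency conversion and persistence are left out. Each `post` call stands in for one local transaction on a regional primary.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical in-memory ledger for one region; amounts are integer cents."""
    region: str
    balances: dict[str, int] = field(default_factory=dict)
    entries: list[tuple[str, str, str, int]] = field(default_factory=list)

    def post(self, transfer_id: str, kind: str, account: str, amount: int) -> None:
        # Stands in for one local atomic transaction: journal entry plus balance update.
        self.entries.append((transfer_id, kind, account, amount))
        self.balances[account] = self.balances.get(account, 0) + amount


def run_transfer(eu: Ledger, us: Ledger, transfer_id: str, src: str, dst: str,
                 amount: int, credit_fails: bool = False) -> list[str]:
    """Runs the saga steps in order and returns the names of the emitted events."""
    events = []
    # Step 1: reserve (debit hold) on the source region; the hold reduces available balance.
    eu.post(transfer_id, "RESERVED", src, -amount)
    # Step 2: publish the reservation event.
    events.append("TransferReserved")
    if not credit_fails:
        # Step 3: credit on the target region.
        us.post(transfer_id, "CREDITED", dst, amount)
        events.append("TransferCredited")
        # Step 4: finalize the hold; a status entry with no further balance change.
        eu.post(transfer_id, "COMMITTED", src, 0)
        events.append("TransferPosted")
    else:
        # Step 5: compensation mirrors step 1 and releases the hold.
        eu.post(transfer_id, "REVERSED", src, amount)
        events.append("TransferReversed")
    return events
```

With `eu = Ledger("EU", {"A": 50_000})`, a successful €100 transfer leaves A at 40,000 cents and B at 10,000. A failed credit leaves A at 50,000, with a RESERVED entry followed by a REVERSED entry in the journal.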

Each step is atomic locally, and the system eventually reaches global consistency through asynchronous events and compensations.

Coordination models

  1. Orchestration Saga

    • A central "Transfer Orchestrator" service drives the flow:
        • calls or publishes to EU → waits → calls/publishes to US → monitors success/failure.
    • Simpler to reason about.
    • The orchestrator holds the state machine (INITIATED → RESERVED → CREDITED → COMMITTED, or REVERSED on failure).
  2. Choreography Saga

    • No central controller; each service reacts to events:
        • EU ledger emits TransferReserved.
        • US ledger listens, performs the credit, and emits TransferCredited.
        • EU ledger listens, commits the reservation, and emits TransferPosted.
    • Lower coupling, but the flow is spread across services. It is harder to trace, and every handler must cope with races such as duplicate or out-of-order events.
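A minimal sketch of the orchestration model uses a hypothetical `TransferOrchestrator`. Its four steps are plain callables that stand in for calls or published commands to each region. The state names follow the saga table's states. The sketch keeps state in memory only; a real orchestrator would persist each transition before moving on.

```python
from typing import Callable

# Allowed transitions of the orchestrator's state machine (illustrative).
TRANSITIONS = {
    "INITIATED": {"RESERVED", "FAILED"},
    "RESERVED": {"CREDITED", "REVERSED"},
    "CREDITED": {"COMMITTED"},
}


class TransferOrchestrator:
    """Hypothetical central orchestrator; step callables return True on success."""

    def __init__(self, reserve: Callable[[], bool], credit: Callable[[], bool],
                 commit: Callable[[], None], reverse: Callable[[], None]) -> None:
        self.reserve, self.credit = reserve, credit
        self.commit, self.reverse = commit, reverse
        self.state = "INITIATED"
        self.history = [self.state]

    def _move(self, new_state: str) -> None:
        # Reject transitions the state machine does not allow (e.g. from a terminal state).
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    def run(self) -> str:
        if not self.reserve():      # EU: place the hold
            self._move("FAILED")
            return self.state
        self._move("RESERVED")
        if not self.credit():       # US: credit the target account
            self.reverse()          # EU: compensate by releasing the hold
            self._move("REVERSED")
            return self.state
        self._move("CREDITED")
        self.commit()               # EU: finalize the hold
        self._move("COMMITTED")
        return self.state
```

Keeping the transition table explicit makes the "simpler to reason about" claim concrete. Every legal path is listed in one place, and a replayed `run()` on a finished saga raises instead of acting twice.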

In fintech systems that already run Kafka or similar event streams, choreography is often a natural fit, because the event backbone is already in place.

Implementation mechanics

  • Each step has:
    • a command handler (performs local write, emits next event),
    • and a compensation handler (undoes local effect if needed).
  • Keep a saga state table per region:
    CREATE TABLE cross_region_sagas (
        saga_id       UUID PRIMARY KEY,
        source_region TEXT,
        target_region TEXT,
        transfer_id   UUID,
        state         TEXT,  -- INITIATED|RESERVED|CREDITED|COMMITTED|REVERSED|FAILED
        updated_at    TIMESTAMPTZ
    );
  • Transitions are driven by events and are idempotent (same event twice = no change).
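One way to make transitions idempotent is a compare-and-set update. A transition applies only if the row is still in the expected prior state, so a redelivered event matches zero rows and changes nothing. The sketch below assumes Python's built-in `sqlite3` as an in-memory stand-in for Postgres, with the UUID and timestamp columns stored as TEXT. `EVENT_TRANSITIONS`, `create_saga_table` and `apply_transition` are hypothetical names.

```python
import sqlite3
from datetime import datetime, timezone

# Event type -> (required prior state, new state). Hypothetical mapping.
EVENT_TRANSITIONS = {
    "TransferReserved": ("INITIATED", "RESERVED"),
    "TransferCredited": ("RESERVED", "CREDITED"),
    "TransferPosted": ("CREDITED", "COMMITTED"),
    "TransferReversed": ("RESERVED", "REVERSED"),
}


def create_saga_table(conn: sqlite3.Connection) -> None:
    # SQLite stand-in for the Postgres table; UUID/TIMESTAMPTZ columns stored as TEXT.
    conn.execute("""
        CREATE TABLE cross_region_sagas (
            saga_id TEXT PRIMARY KEY,
            source_region TEXT,
            target_region TEXT,
            transfer_id TEXT,
            state TEXT,
            updated_at TEXT
        )""")


def apply_transition(conn: sqlite3.Connection, saga_id: str, event_type: str) -> bool:
    """Applies the event's transition at most once; returns False if nothing changed."""
    prior, new = EVENT_TRANSITIONS[event_type]
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE cross_region_sagas SET state = ?, updated_at = ? "
            "WHERE saga_id = ? AND state = ?",
            (new, datetime.now(timezone.utc).isoformat(), saga_id, prior),
        )
    return cur.rowcount == 1
```

The same guard also rejects events that arrive out of order, such as TransferPosted while the saga is still RESERVED. A production handler would likely log or park such events rather than silently drop them.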

Safety properties

| Property | Achieved how |
|---|---|
| No double-spend | Each ledger enforces uniqueness per transfer_id. |
| No loss | Events are durable in Kafka; compensations are retried. |
| Eventually consistent | Balances converge once all compensations settle. |
| Auditable | Both ledgers log full journal + saga transitions. |
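The no-double-spend row can be illustrated with a unique key on (transfer_id, entry_type) in a ledger journal, which turns a replayed credit into a no-op. The minimal sketch below assumes an in-memory SQLite database and uses SQLite's `INSERT OR IGNORE`; in Postgres the equivalent is `INSERT ... ON CONFLICT DO NOTHING`. The `journal` table and the `make_ledger`, `post_credit` and `balance` helpers are hypothetical.

```python
import sqlite3


def make_ledger() -> sqlite3.Connection:
    """Creates a hypothetical in-memory journal with one entry per (transfer, type)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE journal (
            transfer_id  TEXT NOT NULL,
            entry_type   TEXT NOT NULL,   -- e.g. CREDIT, RESERVE, REVERSE
            account      TEXT NOT NULL,
            amount_cents INTEGER NOT NULL,
            UNIQUE (transfer_id, entry_type)
        )""")
    return conn


def post_credit(conn: sqlite3.Connection, transfer_id: str, account: str,
                amount_cents: int) -> bool:
    """Posts the credit at most once per transfer; True only on the first post."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO journal VALUES (?, ?, ?, ?)",
            (transfer_id, "CREDIT", account, amount_cents))
    return cur.rowcount == 1  # 0 when the unique key already existed


def balance(conn: sqlite3.Connection, account: str) -> int:
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM journal WHERE account = ?",
        (account,)).fetchone()
    return row[0]
```

Because the check lives in the database constraint, it holds even when two consumers process the same TransferReserved event concurrently.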

Failure handling

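  • Crash mid-flow: the orchestrator (or, under choreography, the next event handler) resumes from the persisted saga state, e.g. RESERVED.
  • Message redelivered (consumer crashed before committing its offset): the consumer reprocesses it, and idempotent handlers make the duplicate harmless.
  • Target region down: the credit is retried, and after a timeout the EU ledger triggers the reversal.
  • Partial success: compensation restores the balance (mirror of step 1).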
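The "target region down" case can be sketched as bounded retries against a deadline, falling back to compensation once the deadline passes. Several details are assumptions for illustration: `credit_with_timeout`, its backoff values, and the injectable `clock`/`sleep` parameters (which let the loop be tested without real waiting). The credit contract is also assumed: it returns True on success, returns False on a definitive rejection, and raises ConnectionError when unreachable.

```python
import time
from typing import Callable, Optional


def credit_with_timeout(credit: Callable[[], bool], reverse: Callable[[], None],
                        timeout_s: float, backoff_s: float = 0.5,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Retries an unreachable target until the deadline, then compensates.

    Hypothetical contract: credit() returns True on success, False on a definitive
    rejection, and raises ConnectionError when the target region is unreachable.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            ok: Optional[bool] = credit()
        except ConnectionError:
            ok = None  # unreachable: retryable
        if ok is True:
            return "CREDITED"
        if ok is False:
            reverse()  # rejected by the target: compensate immediately
            return "REVERSED"
        if clock() + backoff_s > deadline:
            reverse()  # timed out: release the hold on the source region
            return "REVERSED"
        sleep(backoff_s)
        backoff_s = min(backoff_s * 2, 8.0)  # capped exponential backoff
```

A timed-out credit is ambiguous, because the target may have applied it before becoming unreachable. A reversal after timeout is only safe if the target can also refuse or undo that late credit, for example via the per-transfer_id uniqueness check.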

Observability

  • Trace each saga with a global saga_id / trace_id.
  • Metrics: sagas_in_progress, compensations_executed, avg_completion_time, saga_timeout_rate.
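These four metrics can be derived from saga lifecycle events. The minimal in-process sketch below uses a hypothetical `SagaMetrics` class; in production these values would more likely be exported to a metrics system than computed in memory. The class assumes that saga_timeout_rate means timed-out sagas divided by finished sagas, and that times are in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class SagaMetrics:
    """Hypothetical in-memory tracker for the saga metrics listed above."""
    started: dict[str, float] = field(default_factory=dict)  # saga_id -> start time (s)
    durations: list[float] = field(default_factory=list)     # finished sagas only
    compensations_executed: int = 0
    timeouts: int = 0

    def on_start(self, saga_id: str, now: float) -> None:
        self.started[saga_id] = now

    def on_compensation(self, saga_id: str) -> None:
        self.compensations_executed += 1

    def on_finish(self, saga_id: str, now: float, timed_out: bool = False) -> None:
        # Removing the saga from `started` is what takes it out of sagas_in_progress.
        self.durations.append(now - self.started.pop(saga_id))
        if timed_out:
            self.timeouts += 1

    @property
    def sagas_in_progress(self) -> int:
        return len(self.started)

    @property
    def avg_completion_time(self) -> float:
        return sum(self.durations) / len(self.durations) if self.durations else 0.0

    @property
    def saga_timeout_rate(self) -> float:
        return self.timeouts / len(self.durations) if self.durations else 0.0
```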