Appearance
BackfillManager
Detects price data gaps and fills them via cross-location RPC and external data sources.
High-Level Design
BackfillManager solves a fundamental problem in a distributed price collection system: any single collector will inevitably miss data. Network hiccups, DO evictions, and API failures all create gaps in the second-level price timeline. The manager continuously scans for these gaps and recovers data from redundant sources.
Two-Phase Architecture
The manager operates in two distinct phases, both driven by the PriceCollector alarm loop (every 1 second):
Phase 1: Gap Detection (synchronous, every alarm tick)
cursor ──────────────────────────────── now - 1s
│ │
│ scan each second: is price data │
│ present in PriceStorageManager? │
│ │
│ missing? → INSERT into backfill_gaps │
│ present? → skip │
│ │
└── advance cursor ─────────────────────→│
Phase 2: Gap Recovery (async, non-blocking)
Oldest 10 pending gaps (batch)
│
├─ Try each geo location in priority order ──────┐
│ Single batched RPC: getPriceData([ts1..tsN]) │
│ Remove filled timestamps from batch │
│ │
├─ Remaining gaps → External source (Pyth) ─────┤
│ One request per timestamp │
│ │
└─ Still missing → increment attempt count │
After 5 attempts → mark permanently failed │Phase 2 is fire-and-forget from the alarm loop (void processBackfillQueue()) and protected by a mutex to prevent overlapping runs.
Cross-Location Fallback Chain
Each PriceCollector instance runs in a specific Cloudflare region. When it has gaps, it asks other regions for data in geographic priority order (nearest first):
wnam → [sam, enam, afr, weur, apac, me, eeur, oc]
apac → [oc, weur, eeur, me, afr, wnam, enam, sam]
...etc (see geo-fallbacks.ts)The key optimization is batched RPC: all pending timestamps are sent in a single getPriceData() call per location. As each location fills some timestamps, only the remaining ones are forwarded to the next location. This minimizes cross-region round trips.
Finalization Concept
BackfillManager defines when a timestamp is considered finalized — meaning all recovery attempts have been made and the data (or lack thereof) is authoritative. This is consumed by PriceStorageManager to decide whether interpolation is safe.
A timestamp is finalized when:
- Gap checking has been initialized (at least one price update received)
- The timestamp is older than the gap-checking cursor
- There are no pending gaps older than the timestamp
finalized not yet scanned
◄──────────────────────────────►◄──────────────────────────────►
─────────┬────────────────┬─────┬──────────────────────────────→ time
│ │ │
oldest pending gap cursor now
gap (if any)PriceStorageManager uses getFinalizedUpTo() (which combines cursor position and oldest pending gap) to determine if missing data at a timestamp should be interpolated from neighbors or left as unavailable.
Data Validation
Backfilled data must pass validation before being accepted:
- At least 13 symbols present (50% of 26 tracked assets)
- All prices are positive numbers
Invalid data is silently skipped — the timestamp stays pending for retry.
External Source Fallback (Pyth Network)
When all cross-location sources fail, the manager falls back to ExternalDataSourceManager which fetches from Pyth Network. Two special behaviors:
- Minimum age gate: Timestamps less than 10 seconds old are skipped without counting as an attempt (Pyth needs time to finalize prices)
- Individual requests: Unlike batched cross-location RPC, external source fetches are per-timestamp
Persistence & Eviction Resilience
When backfill recovers data for a timestamp, it writes in a specific order:
- Update in-memory cache (immediate availability)
- Persist to SQLite (survives DO eviction)
- Notify CandleAggregationManager (may complete a candle period)
- Mark gap as filled
If the DO evicts mid-sequence, the worst case is a redundant backfill on restart — the gap table survives in SQLite and the process resumes. Data is never lost once step 2 completes.
Development Mode Behavior
- Gap detection window limited to last 10 seconds (avoids noise from restarts)
- Cross-location backfill disabled (empty fallback chain)
- External sources still available
State & Storage
In-Memory State
| Property | Type | Purpose |
|---|---|---|
minTimestampToCheckForGaps | number | Gap-checking cursor, advances each cycle |
hasStartedGapChecking | boolean | Guards against recording gaps before first data arrives |
isBackfillRunning | boolean | Mutex for processBackfillQueue |
SQLite Table: backfill_gaps
| Column | Type | Purpose |
|---|---|---|
timestamp | INTEGER (PK) | The missing second-level timestamp |
first_detected_at | INTEGER | When the gap was first detected |
status | TEXT | pending → filled or failed |
attempt_count | INTEGER | Recovery attempts made (max 5) |
last_attempt_at | INTEGER | Most recent attempt timestamp |
source_location_hint | TEXT | Location that provided the fill (or EXTERNAL) |
error_message | TEXT | Last error on failure |
Gap recording is idempotent (INSERT OR IGNORE).
See Also
- ExternalDataSourceManager — Pyth Network fallback details
- PriceStorageManager — Primary storage and finalization logic
- CandleAggregationManager — Consumes backfilled bucket data