# ADR-004: Market Data Backfill Strategy
## Metadata
- Date: 2025-10-20
- Status: Proposed
- Tags: backfill, gap-detection, self-healing, data-quality
- Related: ADR-002 (Storage), ADR-003 (Ingestion), `packages/market-data`
## Context
EdgeHog's ingestion pipeline (ADR-003) handles real-time data via WebSocket and REST APIs, but production deployments face inevitable gaps:
- Exchange downtime: Scheduled maintenance, API outages, rate limiting
- Network failures: Connection drops, DNS issues, transient errors
- Application restarts: Deployments, crashes, configuration changes
- Historical initialization: New trading pairs, new exchanges, backtest data needs
Without automated gap detection and backfill, these gaps corrupt backtesting results and create blind spots in live trading decisions.
## Requirements Mapping
This ADR addresses:
- Requirement 1 (ADR-003): Self-healing ingestion via automatic gap recovery
- Requirement 2 (ADR-002): Gap-free completeness for accurate backtesting
- Requirement 3 (ADR-003): Deterministic data quality through source precedence (backfill=3, lowest priority)
## Decision
Implement automated backfill with periodic gap detection and on-demand historical fetching.
### Core Principles
- Detection-first: Monitor for gaps continuously; trigger backfill automatically
- Lowest precedence: Backfill data (precedence=3) never overwrites real-time feeds (precedence=1-2)
- Composable API: Expose both daemon-triggered (automatic) and CLI-triggered (manual) backfill
- Rate-limit aware: Respect exchange API limits via exponential backoff and batch size tuning
- Audit trail: Log all backfill operations with ranges, instruments, and success/failure status
### Gap Detection Strategy
Periodic monitoring (daemon mode):
- Query the `market.ingestion_gap_metrics` view every 5 minutes
- Trigger backfill for any instrument with `gap_rate_pct > 5%` within the last 24 hours
- Skip gaps older than the mutable window (72h) unless manually requested
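The daemon-side trigger decision can be sketched as a pure function over rows of the gap-metrics view. The row shape and function name below are illustrative, not the actual EdgeHog API:

```typescript
// Illustrative row from the market.ingestion_gap_metrics view
// (field names are assumptions, not the real schema).
interface GapMetricsRow {
  instrumentId: number;
  gapRatePct: number;        // % of candles missing in the 24h window
  newestGapAgeHours: number; // age of the most recent gap
}

const GAP_THRESHOLD_PCT = 5;     // default threshold from this ADR
const MUTABLE_WINDOW_HOURS = 72; // inherited from ADR-003

// Instruments to backfill automatically: gap rate above threshold
// AND the gap still falls inside the mutable window.
function selectBackfillCandidates(rows: GapMetricsRow[]): number[] {
  return rows
    .filter((r) => r.gapRatePct > GAP_THRESHOLD_PCT)
    .filter((r) => r.newestGapAgeHours <= MUTABLE_WINDOW_HOURS)
    .map((r) => r.instrumentId);
}
```

Gaps older than the mutable window are deliberately excluded here; they require a manual request per the policy above.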
On-demand detection (CLI mode):
- User-initiated via the `/download` command in the TUI or the `edgehog backfill` CLI
- Specify instrument, date range, and optional batch size override
### Backfill Workflow
```
1. Gap detection
      ↓
2. Generate missing timestamp ranges (per instrument)
      ↓
3. Fetch via CCXT fetchOHLCV() in batches (1000 candles/request)
      ↓
4. Normalize timestamps via normalizeToOpenTime() (ADR-003)
      ↓
5. Ingest via IngestionPipeline.ingestBatch() with DataSource.Backfill (precedence=3)
      ↓
6. Log operation: instrument, start/end time, candles fetched, errors
```

### API Design Principles
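Steps 2-3 above hinge on splitting each missing range into fetch batches of at most 1000 candles. A minimal sketch of that range planning, with hypothetical names (the real implementation lives in `packages/market-data/src/ingestion/backfill.ts`):

```typescript
// One fetch batch: a half-open time range plus the candle count to request.
interface BatchRange {
  startMs: number;
  endMs: number;
  limit: number; // candles to request for this batch
}

// Split [startMs, endMs) into batches of at most `batchSize` candles.
// intervalMs is the candle interval (e.g. 60_000 for 1-minute candles).
function planBatches(
  startMs: number,
  endMs: number,
  intervalMs: number,
  batchSize = 1000, // default from this ADR
): BatchRange[] {
  const batches: BatchRange[] = [];
  const batchSpanMs = batchSize * intervalMs;
  for (let cursor = startMs; cursor < endMs; cursor += batchSpanMs) {
    const batchEnd = Math.min(cursor + batchSpanMs, endMs);
    batches.push({
      startMs: cursor,
      endMs: batchEnd,
      limit: Math.ceil((batchEnd - cursor) / intervalMs),
    });
  }
  return batches;
}
```

Each planned batch then maps onto one `fetchOHLCV(symbol, timeframe, since, limit)` call, so the batch count directly bounds API quota usage.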
Predictable: Same interface for daemon and manual backfill
pipeline.backfill(instrumentId: number, start: Date, end: Date, options?: BackfillOptions)Composable: Reuses existing IngestionPipeline (ADR-003) for merge logic
await pipeline.ingestBatch(candles, DataSource.Backfill); // precedence=3 derived from sourceTestable: Mock CCXT client for unit tests; integration tests use test database
const mockExchange = { fetchOHLCV: vi.fn(() => [...]) };
await pipeline.backfill(123, start, end, { exchange: mockExchange });Configuration Defaults โ
Align with ADR-003 operational defaults:
| Parameter | Default | Override | Rationale |
|---|---|---|---|
| Batch size | 1000 candles/request | `options.batchSize` | Balances API limits vs. throughput |
| Mutable window | 72 hours | N/A | Inherited from ADR-003; backfill applies to older data |
| Gap threshold | 5% missing candles in 24h | `options.gapThreshold` | Avoids spurious backfills for minor gaps |
| Detection interval | 5 minutes (daemon) | N/A | Balances responsiveness vs. database load |
| Retry policy | Exponential backoff: 1s → 60s max | `options.retryConfig` | Mitigates rate limit errors |
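One possible shape for `BackfillOptions` that mirrors the defaults table above. Field names here are assumptions for illustration; the authoritative contract lives in `packages/market-data/SPEC.md`:

```typescript
// Hypothetical retry configuration matching the 1s → 60s backoff default.
interface RetryConfig {
  initialDelayMs: number; // first backoff delay
  maxDelayMs: number;     // cap for exponential growth
  maxRetries: number;     // attempts before giving up
}

// Hypothetical options bag for pipeline.backfill(); all fields optional,
// falling back to the ADR defaults below.
interface BackfillOptions {
  batchSize?: number;        // default 1000 candles/request
  gapThreshold?: number;     // default 5 (% missing candles in 24h)
  retryConfig?: RetryConfig; // default exponential backoff, 1s → 60s
}

const BACKFILL_DEFAULTS: Required<BackfillOptions> = {
  batchSize: 1000,
  gapThreshold: 5,
  retryConfig: { initialDelayMs: 1_000, maxDelayMs: 60_000, maxRetries: 3 },
};
```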
## Alternatives Considered
### 1. No Automated Backfill (Manual Only)
Rejected: Operators cannot monitor all instruments 24/7; gaps accumulate unnoticed until backtests fail.
### 2. Backfill on Every Gap (Zero Tolerance)
Rejected: Over-aggressive; triggers backfill for transient 1-minute gaps that self-heal. Wastes API quota.
### 3. Higher Precedence for Backfill (precedence=2)
Rejected: Violates ADR-003 principle that real-time data (WebSocket=1, REST=2) is more accurate than historical fetches. Backfill should only fill gaps, not overwrite existing data.
### 4. Separate Backfill Table
Rejected: Adds query complexity (JOINs required); duplicates merge logic; conflicts with ADR-003 single-table design.
## Consequences
### Benefits
- ✅ Self-healing data: Gaps automatically repaired without operator intervention
- ✅ Backtest integrity: Historical data completeness guaranteed for strategy validation
- ✅ Audit trail: Full provenance via `data_source_id=backfill` and `received_at` timestamps
- ✅ Composability: Reuses the existing `IngestionPipeline` (no duplicate merge logic)
### Risks
- ⚠️ API rate limits: Aggressive backfill may exhaust exchange quotas (e.g., 1200 requests/min on Bybit)
- ⚠️ False positives: Short-lived gaps (e.g., 1-2 minutes) may trigger unnecessary backfills
- ⚠️ Compressed chunks: Backfilling data >7 days old requires decompression (slow; see ADR-003)
### Mitigations
Rate limit handling:
- Exponential backoff on HTTP 429 errors
- Configurable `batchSize` and `delayBetweenBatches` for conservative deployments
- Log rate limit errors with retryable/non-retryable classification
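The backoff schedule implied by the retry policy (doubling from 1s, capped at 60s) can be computed without side effects; real code would sleep between attempts, but the schedule itself is a pure function. A minimal sketch, with hypothetical names:

```typescript
// Compute the delay (in ms) before each retry attempt: exponential
// doubling from initialMs, capped at capMs. Matches the 1s → 60s
// retry policy defaults in this ADR.
function backoffDelaysMs(
  attempts: number,
  initialMs = 1_000,
  capMs = 60_000,
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(initialMs * 2 ** i, capMs));
  }
  return delays;
}
```

With the defaults, the schedule is 1s, 2s, 4s, 8s, 16s, 32s, then 60s for every later attempt, so a burst of HTTP 429s quickly settles at the cap rather than growing unboundedly.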
Gap threshold tuning:
- Default 5% threshold filters noise (1-2 minute gaps ignored)
- Configurable per instrument for high-frequency pairs
Compressed chunk handling:
- Detect compressed chunks before backfill (query `timescaledb_information.chunks`)
- Warn operator; require explicit `--force-decompress` flag for old data
- Log decompression events with performance metrics (time, rows affected)
Monitoring:
- Track backfill success rate (target: >95%)
- Alert on repeated failures (same instrument, same time range >3 attempts)
- Dashboard widget: Gaps filled in last 24h, API quota usage
## Acceptance Criteria
### Functional Requirements
- [ ] Gap detection: `market.ingestion_gap_metrics` identifies gaps with >5% missing candles in a 24h window
- [ ] Automatic trigger: Daemon queries gap metrics every 5 minutes; initiates backfill for flagged instruments
- [ ] Manual trigger: CLI command `edgehog backfill --instrument=123 --start=2024-01-01 --end=2024-01-31` works
- [ ] Precedence enforcement: Backfill data (precedence=3) does NOT overwrite WebSocket (precedence=1) or REST (precedence=2)
- [ ] Timestamp normalization: All backfilled candles use `normalizeToOpenTime()` before ingestion
- [ ] Merge semantics: `high=max`, `low=min`, `volume=max`, `close=from-highest-precedence` (per ADR-003)
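The merge semantics in the last criterion can be sketched as a pure function over two colliding candles. This is an illustration of the ADR-003 rules, not the production code; the `open` field is taken from the winning source here as an assumption, since the criterion above does not specify it:

```typescript
// Candle with its source precedence: 1=WebSocket, 2=REST, 3=Backfill.
interface Candle {
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
  precedence: number;
}

// Merge a backfilled (or any incoming) candle into an existing row:
// high=max, low=min, volume=max, close from the higher-precedence
// source (lower precedence number wins, so backfill never overwrites
// real-time data).
function mergeCandles(existing: Candle, incoming: Candle): Candle {
  const winner =
    incoming.precedence < existing.precedence ? incoming : existing;
  return {
    open: winner.open, // assumption: open follows the winning source
    high: Math.max(existing.high, incoming.high),
    low: Math.min(existing.low, incoming.low),
    close: winner.close,
    volume: Math.max(existing.volume, incoming.volume),
    precedence: winner.precedence,
  };
}
```

Because ties and lower-precedence incoming rows always resolve to the existing data, re-running a backfill over an already-complete range is a no-op, which is exactly the idempotency requirement below.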
### Non-Functional Requirements
- [ ] Batch efficiency: Backfill 1000 candles in <5s (network permitting)
- [ ] Rate limit resilience: Exponential backoff on HTTP 429; max 3 retries before giving up
- [ ] Idempotency: Re-running backfill for same range produces identical database state
- [ ] Audit trail: Every backfill logged with `instrument_id`, `start_time`, `end_time`, `candles_fetched`, `errors`, `duration_ms`
### Monitoring & Observability
- [ ] Success rate metric: Percentage of backfill operations completing without errors (target: >95%)
- [ ] Gap coverage metric: Percentage of detected gaps filled within 1 hour (target: >90%)
- [ ] Alert on failure: Trigger alert if same instrument fails backfill >3 times in 1 hour
- [ ] Dashboard widget: Display recent backfill activity (last 24h), gaps remaining, API quota usage
## Implementation Scope
This ADR defines policy and API contract. Implementation details (TypeScript signatures, test cases, SQL queries) are documented in:
- Specification: `packages/market-data/SPEC.md` § Backfill
- Code: `packages/market-data/src/ingestion/backfill.ts`
- Tests: `packages/market-data/src/ingestion/backfill.test.ts`
- Monitoring: `packages/market-data/schema/11_ingestion_monitoring.sql` (already includes the `ingestion_gap_metrics` view)
## References
### Internal
- ADR-002: Market Data Storage (compression, retention policies)
- ADR-003: Market Data Ingestion (precedence, merge semantics, mutable window)
- `packages/market-data/SPEC.md` § Backfill (implementation details)
- `packages/market-data/schema/11_ingestion_monitoring.sql` (gap detection views)
### External
- CCXT fetchOHLCV - API reference for historical data fetching
- TimescaleDB Compression - Handling updates to compressed chunks