CrawlingManager

Fetches symbol data from CoinMarketCap, CoinGecko, and Hyperliquid APIs, then syncs discoveries and rankings into the registry.

Purpose

CrawlingManager discovers new tradeable symbols and tracks market cap rankings from external data sources. It is stateless — all persistence is delegated through the parent DO to RankingsManager.

New symbols are never auto-enabled. The crawler sets PROMOTE/DEMOTE recommendations that an admin must act on. This separation ensures human review before any symbol goes live for trading.

High-Level Design

Three-Source Aggregation

CrawlingManager fetches from three providers in parallel, each serving a different role:

┌──────────────────────────────────────────────────────────────────┐
│                    crawlAndSync(topN)                            │
│                                                                  │
│   ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐       │
│   │ CoinMarketCap│  │  CoinGecko  │  │   Hyperliquid    │       │
│   │  (API key)   │  │ (HTML scrape)│  │  (REST POST)     │       │
│   │              │  │              │  │                   │       │
│   │ Top N by     │  │ Top N by     │  │ ALL perpetual    │       │
│   │ market cap   │  │ market cap   │  │ contracts        │       │
│   │ + cmc_rank   │  │ + page order │  │ + szDecimals     │       │
│   └──────┬───────┘  └──────┬───────┘  │ + maxLeverage    │       │
│          │                 │          └────────┬─────────┘       │
│          ▼                 ▼                   │                  │
│   ┌────────────────────────────┐               │                  │
│   │  Common Set (intersection) │               │                  │
│   │  = CMC ∩ CoinGecko top N   │               │                  │
│   └────────────┬───────────────┘               │                  │
│                │                               │                  │
│                ▼                               ▼                  │
│   ┌────────────────────────────────────────────────────────┐     │
│   │              Per-Symbol Decision Loop                   │     │
│   │                                                        │     │
│   │  For each Hyperliquid symbol:                          │     │
│   │    NEW → insert as DISABLED + auto-categorize          │     │
│   │           + PROMOTE if in common set                   │     │
│   │    EXISTING + disabled + in common set → PROMOTE       │     │
│   │    EXISTING + enabled + NOT in common set → DEMOTE     │     │
│   │    Otherwise → NONE                                    │     │
│   └────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
  • Hyperliquid is the source of truth for what symbols are tradeable. Only symbols that exist as Hyperliquid perpetuals can be inserted.
  • CMC + CoinGecko serve as a cross-validation layer for ranking. A symbol must appear in both top-N lists to be considered for promotion, reducing false positives from a single source.
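
The aggregation and decision rules above can be sketched roughly as follows. This is a minimal illustration, not the actual CrawlingManager code: the names commonSet, decide, ExistingSymbol, and Recommendation are hypothetical, and tickers are assumed to be compared after upper-casing.

```typescript
type Recommendation = "PROMOTE" | "DEMOTE" | "NONE";

// Hypothetical shape of a registry row; the real schema may differ.
interface ExistingSymbol {
  ticker: string;
  enabled: boolean;
}

// Intersection of the two top-N ticker lists (CMC ∩ CoinGecko).
// Upper-casing before comparison is an assumption of this sketch.
function commonSet(cmc: string[], gecko: string[]): Set<string> {
  const geckoSet = new Set(gecko.map((t) => t.toUpperCase()));
  return new Set(cmc.map((t) => t.toUpperCase()).filter((t) => geckoSet.has(t)));
}

// Decide the recommendation for one Hyperliquid ticker.
// `existing` is undefined when the ticker is not yet in the registry.
function decide(
  ticker: string,
  existing: ExistingSymbol | undefined,
  common: Set<string>,
): Recommendation {
  const inCommon = common.has(ticker.toUpperCase());
  // NEW symbol: it would be inserted as DISABLED; PROMOTE only if in the common set.
  if (!existing) return inCommon ? "PROMOTE" : "NONE";
  if (!existing.enabled && inCommon) return "PROMOTE";
  if (existing.enabled && !inCommon) return "DEMOTE";
  return "NONE";
}

const common = commonSet(["BTC", "ETH", "SOL"], ["btc", "eth", "DOGE"]);
console.log(decide("BTC", undefined, common)); // PROMOTE
console.log(decide("SOL", { ticker: "SOL", enabled: true }, common)); // DEMOTE
```

Note that decide only returns a recommendation; per the Purpose section, enabling or disabling a symbol remains an admin action.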

Dual-Source Cross-Validation

The "common set" intersection is the core design decision. If either ranking source fails or returns empty, the intersection is empty, which means:

  • No PROMOTE recommendations are set (conservative: avoids promoting based on a single source)
  • Every enabled symbol falls outside the common set, so the decision loop marks each one DEMOTE
  • Symbol discovery from Hyperliquid still proceeds normally

This makes promotion resistant to single-source anomalies, but ranking failures do not suppress DEMOTE. After a crawl in which either source returned empty, the resulting DEMOTE recommendations reflect the outage rather than a real ranking change. They remain recommendations only: no symbol is disabled until an admin acts.

Auto-Categorization

When a new symbol is discovered, it's assigned a category based on hardcoded rules (checked in priority order):

  1. MEME — hardcoded list: DOGE, SHIB, PEPE, FLOKI, BONK, WIF, MEME
  2. MAJOR — BTC or ETH
  3. LARGE_CAP — CMC rank ≤ 10
  4. ALTCOIN — CMC rank ≤ 50
  5. EMERGING — everything else (unranked or rank > 50)

Categories are set at insertion time and don't auto-update on subsequent crawls. An admin can override the category via SymbolAdminManager.
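
The priority order above could be expressed as a simple chain of checks. This is a sketch under assumptions: the function name categorize, the Category type, and the convention that an unranked symbol has an undefined cmcRank are all hypothetical.

```typescript
type Category = "MEME" | "MAJOR" | "LARGE_CAP" | "ALTCOIN" | "EMERGING";

const MEME_TICKERS = new Set(["DOGE", "SHIB", "PEPE", "FLOKI", "BONK", "WIF", "MEME"]);
const MAJOR_TICKERS = new Set(["BTC", "ETH"]);

// `cmcRank` is undefined when CMC did not return the ticker (assumption).
function categorize(ticker: string, cmcRank?: number): Category {
  const t = ticker.toUpperCase();
  if (MEME_TICKERS.has(t)) return "MEME"; // checked first, regardless of rank
  if (MAJOR_TICKERS.has(t)) return "MAJOR";
  if (cmcRank !== undefined && cmcRank <= 10) return "LARGE_CAP";
  if (cmcRank !== undefined && cmcRank <= 50) return "ALTCOIN";
  return "EMERGING"; // unranked or rank > 50
}

console.log(categorize("DOGE", 8)); // MEME
console.log(categorize("SOL", 5)); // LARGE_CAP
console.log(categorize("XYZ")); // EMERGING
```

Because the MEME check runs first, a meme ticker with a top-10 rank is still categorized as MEME rather than LARGE_CAP.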

CoinGecko HTML Scraping

CoinGecko is scraped from the homepage HTML (no API key required). The parser looks for the gecko-homepage-coin-table class, then extracts ticker symbols via regex from <td>/<div> patterns. This is inherently fragile: if CoinGecko changes its HTML structure, the parser returns [] and the common set becomes empty, which blocks all PROMOTE recommendations for that crawl. Extracted symbols are filtered to remove fiat currencies (USD, EUR, GBP) and implausible tickers (fewer than 2 or more than 10 characters).
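
The post-extraction filter might look like the sketch below. It covers only the filtering step, not the HTML parsing, since the actual regex is not shown here. The name filterTickers is hypothetical, and trimming and upper-casing the raw matches are assumptions of this example.

```typescript
const FIAT = new Set(["USD", "EUR", "GBP"]);

// Keep plausible crypto tickers, preserving the page order of the input.
function filterTickers(raw: string[]): string[] {
  const out: string[] = [];
  for (const r of raw) {
    const t = r.trim().toUpperCase(); // normalization is an assumption
    if (t.length < 2 || t.length > 10) continue; // implausible ticker length
    if (FIAT.has(t)) continue; // fiat currency, not a coin
    out.push(t);
  }
  return out;
}

console.log(filterTickers(["BTC", "USD", "X", "eth", "ABCDEFGHIJK"])); // [ 'BTC', 'ETH' ]
```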

Hyperliquid Symbol Metadata

Each Hyperliquid symbol provides szDecimals (size decimals). Price decimals are derived as 5 - szDecimals, and priceScale = 10^(price decimals). Symbol UIDs are constructed as ${ticker}USD (e.g., "BTC" → "BTCUSD"), matching the internal SymbolIdentifiers enum format.
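
As a worked example of the derivation, the following sketch computes the metadata for two tickers. The SymbolMeta interface and deriveMeta function are hypothetical names, and the szDecimals values passed in are illustrative inputs rather than values quoted from Hyperliquid.

```typescript
// Hypothetical container for the derived fields.
interface SymbolMeta {
  uid: string;
  priceDecimals: number;
  priceScale: number;
}

function deriveMeta(ticker: string, szDecimals: number): SymbolMeta {
  const priceDecimals = 5 - szDecimals; // rule stated in the text
  return {
    uid: `${ticker}USD`,
    priceDecimals,
    priceScale: Math.pow(10, priceDecimals),
  };
}

console.log(deriveMeta("BTC", 5)); // { uid: 'BTCUSD', priceDecimals: 0, priceScale: 1 }
console.log(deriveMeta("DOGE", 0)); // { uid: 'DOGEUSD', priceDecimals: 5, priceScale: 100000 }
```

The sketch does not guard against szDecimals greater than 5, which would yield negative price decimals; whether the real implementation clamps that case is not stated here.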

Edge Cases & Error Handling

  • Missing CMC_API_KEY: logs a warning and returns an empty array. The common set becomes empty, so no PROMOTE recommendations are set, and enabled symbols fall outside the common set and are marked DEMOTE.
  • Any single API failure returns [] for that source; other sources still proceed.
  • INSERT OR IGNORE prevents duplicate symbol insertion on repeated crawls.
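
The per-source failure isolation described above could be implemented along these lines. This is a sketch only: safeFetch and fetchAll are hypothetical names, and the three fetcher parameters stand in for the real CMC, CoinGecko, and Hyperliquid calls, whose signatures are not shown here.

```typescript
// Wrap a source fetcher so that any failure degrades to an empty list.
async function safeFetch<T>(name: string, fetcher: () => Promise<T[]>): Promise<T[]> {
  try {
    return await fetcher();
  } catch (err) {
    console.warn(`${name} fetch failed:`, err);
    return [];
  }
}

// Run all three sources in parallel; one failure does not affect the others.
async function fetchAll(
  fetchCmc: () => Promise<string[]>,
  fetchGecko: () => Promise<string[]>,
  fetchHyperliquid: () => Promise<string[]>,
) {
  const [cmc, gecko, hyperliquid] = await Promise.all([
    safeFetch("CMC", fetchCmc),
    safeFetch("CoinGecko", fetchGecko),
    safeFetch("Hyperliquid", fetchHyperliquid),
  ]);
  return { cmc, gecko, hyperliquid };
}
```

Because each fetcher is wrapped before being passed to Promise.all, a rejected request never rejects the combined promise; the failing source simply contributes an empty array.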

See Also