NHL API Strategy

This document explains how we integrate with the NHL API and why we made the choices we did.

The NHL API

The NHL provides a public JSON API at api-web.nhle.com, with endpoints for:

  • Current standings

  • Team rosters

  • Player statistics

  • Game schedules

  • And more
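
To give a feel for how little is needed to call such an endpoint, here is a minimal sketch using only the standard library (the project itself uses a requests session). The `/v1` prefix, the example paths, and the names `build_url` and `fetch_json` are assumptions made for illustration; verify the paths against the live API before relying on them.

```python
import json
from urllib.request import urlopen

# Assumed base URL; the "/v1" version prefix is not confirmed by this document.
BASE_URL = "https://api-web.nhle.com/v1"


def build_url(path: str) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def fetch_json(path: str, opener=urlopen, timeout: float = 10.0) -> dict:
    """Fetch an endpoint and decode its JSON body.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(build_url(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_json("standings/now")` would then return the decoded response, assuming that path exists.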

Why This API?

Alternatives Considered

Official NHL Stats API (stats.nhl.com):

  • More comprehensive data

  • Better documented

  • ❌ More complex

  • ❌ Requires API key (authentication)

Third-party APIs (ESPN, The Sports DB):

  • Easier to use

  • ❌ Less reliable

  • ❌ May have usage limits

  • ❌ Not official source

Web Scraping:

  • Could get any data

  • ❌ Fragile (breaks when HTML changes)

  • ❌ May violate terms of service

  • ❌ Ethically questionable when an official API is available

We chose api-web.nhle.com:

  • ✅ Official NHL source

  • ✅ No authentication required

  • ✅ Simple JSON responses

  • ✅ Reliable uptime

  • ✅ Current roster data

Integration Approach

1. Retry Logic

Network requests can fail. We retry automatically:

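from requests.exceptions import RequestException, Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),  # give up after 3 attempts
    wait=wait_exponential(multiplier=1, min=2, max=10),  # each wait is clamped to 2-10s
    # Timeout is a subclass of RequestException; it is listed for readability
    retry=retry_if_exception_type((RequestException, Timeout)),
)
def _fetch_with_retry(self, url: str) -> dict:
    response = self.session.get(url, timeout=self.timeout)
    response.raise_for_status()  # HTTPError is a RequestException, so HTTP errors are retried too
    return response.json()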

Why:

  • Temporary network blips

  • Momentary server overload

  • Connection resets

Strategy: Exponential backoff. Each request gets up to 3 attempts, and the wait before each retry grows exponentially, clamped between 2 and 10 seconds.
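
The same idea can be sketched without any dependencies. This is an illustration of clamped exponential backoff, not tenacity's implementation; the names `backoff_wait` and `call_with_retry` and the built-in exception types used here are assumptions for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_wait(attempt: int, multiplier: float = 1, min_wait: float = 2, max_wait: float = 10) -> float:
    """Wait before the next try: multiplier * 2**(attempt - 1), clamped to [min_wait, max_wait]."""
    return max(min_wait, min(multiplier * 2 ** (attempt - 1), max_wait))


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or the attempt budget is used up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff_wait(attempt))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real delays. With the defaults shown, the clamp means the first two waits are both 2 seconds, and longer waits only appear if the attempt budget is raised.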

2. Rate Limiting

We delay between requests to be polite:

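def fetch_roster(self, team_abbrev: str) -> list[Player]:
    time.sleep(self.rate_limit_delay)  # Default 0.3s
    url = self._roster_url(team_abbrev)  # hypothetical helper; URL construction is omitted here
    data = self._fetch_with_retry(url)  # returns the decoded JSON as a dict
    return self._parse_roster(data)  # hypothetical helper: dict -> list[Player]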

Why:

  • Respect the API provider

  • Avoid being blocked

  • Distribute load over time

Trade-off: Slower, but polite and reliable. A 0.3s pause before each of 32 team requests adds roughly 10s to a full run.
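
A fixed sleep pays the full delay even when enough time has already passed since the last request. One possible refinement, sketched below, is to sleep only for the time still owed. The class name `MinIntervalLimiter` and its parameters are hypothetical and not part of the current client.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 0.3,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Sleep only for the time still owed; return the seconds slept."""
        now = self._clock()
        owed = 0.0
        if self._last_request is not None:
            owed = max(0.0, self.min_interval - (now - self._last_request))
        if owed > 0:
            self._sleep(owed)
        # Record when the request is allowed to start, including any sleep.
        self._last_request = now + owed
        return owed
```

Calling `limiter.wait()` before each request would replace the unconditional `time.sleep`, so time spent parsing a response counts toward the delay.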

3. Caching

We cache API responses in memory:

from functools import lru_cache

@lru_cache(maxsize=128)  # entries never expire; the cache key includes self
def fetch_standings(self) -> list[Team]:
    return self._fetch_standings_uncached()

Why:

  • Reduce redundant requests

  • Faster repeated calls within a run

  • Less load on NHL servers

Trade-off: lru_cache has no expiry, so cached data can go stale in a long-running process
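
If staleness ever becomes a problem, a time-to-live (TTL) cache is one way to bound it. The sketch below is a minimal illustration, not the project's code; `TTLCache`, `get_or_fetch`, and the 300-second default are assumptions.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() if it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: reuse it
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A caller could then write `cache.get_or_fetch("standings", client._fetch_standings_uncached)` so that data is refetched at most once per TTL window.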

4. Error Handling

We handle errors gracefully:

failed_teams = []
for team_abbrev in team_abbrevs:
    try:
        roster = api_client.fetch_roster(team_abbrev)
    except NHLApiError as e:
        logger.error("Failed to fetch %s: %s", team_abbrev, e)
        failed_teams.append(team_abbrev)
        continue  # Process other teams
    ...  # analyze roster

Why:

  • One team’s failure shouldn’t break the entire analysis

  • User gets partial results

  • Failures are logged for debugging
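
The pattern above can be packaged so callers receive both the partial results and the list of failures. In this sketch, `fetch_all` is a hypothetical helper and `NHLApiError` is a stand-in for the project's own exception class.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)


class NHLApiError(Exception):
    """Stand-in for the project's API error type."""


def fetch_all(
    team_abbrevs: List[str],
    fetch_roster: Callable[[str], list],
) -> Tuple[Dict[str, list], List[str]]:
    """Fetch every roster, returning (rosters by team, teams that failed)."""
    rosters: Dict[str, list] = {}
    failed: List[str] = []
    for team in team_abbrevs:
        try:
            rosters[team] = fetch_roster(team)
        except NHLApiError as exc:
            # Log and move on so one bad team does not abort the whole run.
            logger.error("Failed to fetch %s: %s", team, exc)
            failed.append(team)
    return rosters, failed
```

Returning the failed teams explicitly lets the CLI report them next to the results instead of leaving them only in the log.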

Data Validation

We validate API responses using Pydantic:

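from pydantic import BaseModel, Field

class Player(BaseModel):
    firstName: str = Field(..., min_length=1)  # required, non-empty
    lastName: str = Field(..., min_length=1)  # required, non-empty
    sweaterNumber: int | None = None  # optional
    positionCode: str  # required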

Why:

  • Catch API changes early

  • Type-safe data

  • Self-documenting code

  • Clear error messages
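
To illustrate what catching API changes early could look like in practice, the sketch below validates a batch of raw player dicts and reports which fields failed. It assumes pydantic is installed; `parse_players` is a hypothetical helper, and the model uses `Optional[int]` only so the example also runs on older Python versions.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Player(BaseModel):
    firstName: str = Field(..., min_length=1)
    lastName: str = Field(..., min_length=1)
    sweaterNumber: Optional[int] = None
    positionCode: str


def parse_players(raw_players: list) -> tuple:
    """Split raw dicts into (valid Player objects, error descriptions)."""
    valid, errors = [], []
    for index, raw in enumerate(raw_players):
        try:
            valid.append(Player(**raw))
        except ValidationError as exc:
            # Record which fields failed so an API change is easy to spot in logs.
            fields = sorted({str(err["loc"][0]) for err in exc.errors()})
            errors.append(f"player {index}: invalid fields {fields}")
    return valid, errors
```

A record with an empty `firstName` or a missing `positionCode` ends up in the error list with those field names, while well-formed records are returned as typed `Player` objects.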

Trade-offs

Synchronous vs Async

Chose synchronous:

  • ✅ Simpler code

  • ✅ Easier to understand

  • ✅ Sufficient performance

  • ❌ Could be faster with async

Could optimize: Fetch all rosters in parallel with asyncio (~3-5s instead of ~10-15s)

Caching Strategy

Chose in-memory LRU cache:

  • ✅ Simple implementation

  • ✅ Works for CLI use

  • ❌ Cache lost between runs

  • ❌ Not shared across instances

Could optimize: Redis cache for persistent, shared caching

Error Handling Philosophy

Chose graceful degradation:

  • ✅ Partial results better than none

  • ✅ User can see what succeeded

  • ❌ Users may not expect partial results

Alternative: Fail fast on any error (more predictable but less useful)

Future Improvements

1. Async/Parallel Fetching

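import asyncio

async def fetch_all_rosters(self, teams):
    # fetch_roster is synchronous, so run each call in a worker thread
    tasks = [asyncio.to_thread(self.fetch_roster, t) for t in teams]
    return await asyncio.gather(*tasks)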

This could reduce runtime from ~10-15s to ~3-5s.
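
Unbounded parallel requests would undo the politeness that rate limiting provides, so any parallel version probably needs a concurrency cap. The following is one possible sketch, assuming Python 3.9+ for `asyncio.to_thread` and a thread-safe blocking `fetch_roster`; the function signature and the default cap of 4 are illustrative assumptions.

```python
import asyncio
from typing import Callable, Dict, List


async def fetch_all_rosters(
    teams: List[str],
    fetch_roster: Callable[[str], list],
    max_concurrency: int = 4,
) -> Dict[str, list]:
    """Run a blocking fetch_roster for each team, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(team: str) -> list:
        async with semaphore:
            # to_thread keeps the event loop free while the blocking call runs
            return await asyncio.to_thread(fetch_roster, team)

    rosters = await asyncio.gather(*(fetch_one(team) for team in teams))
    return dict(zip(teams, rosters))
```

It would be driven from synchronous code with `asyncio.run(fetch_all_rosters(teams, client.fetch_roster))`. Note that `gather` raises on the first failure by default, so combining this with graceful degradation would need `return_exceptions=True` or per-team error handling.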

2. Persistent Caching

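import redis

cache = redis.Redis()


# cache_with_redis is a hypothetical decorator we would need to write
@cache_with_redis(expiry=3600)  # expire entries after one hour
def fetch_standings(): ...

This would share the cache across runs and users.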
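
Since `cache_with_redis` does not exist yet, here is a minimal sketch of how such a decorator might look. It takes the client explicitly and relies only on `get` and `setex`, both of which `redis.Redis` provides; the key format and JSON serialization are assumptions for illustration.

```python
import functools
import json
from typing import Any, Callable


def cache_with_redis(client: Any, expiry: int = 3600) -> Callable:
    """Cache a function's JSON-serializable result in a Redis-like client.

    `client` only needs get(name) and setex(name, time, value), so any
    object with those methods works, including a test double.
    """

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any) -> Any:
            # Positional arguments must be JSON-serializable to form the key.
            key = f"{func.__name__}:{json.dumps(args)}"
            cached = client.get(key)
            if cached is not None:
                return json.loads(cached)  # cache hit: skip the API call
            result = func(*args)
            client.setex(key, expiry, json.dumps(result))
            return result

        return wrapper

    return decorator
```

Usage would look like `@cache_with_redis(redis.Redis(), expiry=3600)` above a fetch function. Results come back as plain dicts and lists, so Pydantic models would need to be rebuilt from the cached JSON.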

3. Webhook/Streaming

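If the NHL ever offers webhooks or a streaming feed, we could subscribe to roster changes instead of polling (the on_roster_change decorator below is hypothetical):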

@on_roster_change
def handle_roster_update(team, changes):
    # Update scores automatically
    ...

Real-time updates without constant polling.

4. GraphQL Alternative

If the NHL ever provides a GraphQL endpoint:

query {
  teams {
    abbreviation
    roster {
      firstName
      lastName
    }
  }
}

More efficient: we would fetch exactly the fields we need.