Data Sources

Overview

Trackser Live aggregates data from two TfL (Transport for London) APIs to provide real-time train tracking: TrackerNet (legacy, detailed Underground data) and the Unified API (modern, cross-network coverage). Each source has its own characteristics, strengths, and use cases, and Trackser Live merges the two to broaden coverage.

ℹ️ Note: Both data sources are publicly available through TfL's Open Data platform.

For more information on TfL's Open Data platform, visit: https://tfl.gov.uk/info-for/open-data-users/

Thank you to Transport for London for providing access to these valuable data sources that power Trackser Live.

Below is a detailed breakdown of each data source: its characteristics, strengths, and limitations, and how Trackser Live uses it.

🚇 TrackerNet API

Primary Data Source

TrackerNet is TfL's legacy real-time tracking system that has been in operation for many years. It provides detailed train position data through XML feeds for each line.

Key Characteristics:

Coverage: Underground lines only (not Elizabeth line or Overground)
Update Frequency: Updates on each request
Data Format: XML-based station-specific feeds
Location Detail: Provides track codes and platform-level granularity
Historical Reliability: Proven, stable system with consistent data structure

Strengths:

Rich metadata including track codes for precise location inference
Detailed platform and station-level information
Leading Car Number information
Consistent train identification across updates
Lower latency for Underground trains

Limitations:

Requires significant processing to normalize inconsistent station names
Contains duplicate and stale records that need filtering
Legacy XML format requires more parsing overhead
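
As a rough illustration of the parsing and filtering work described above, the following Python sketch reads a small TrackerNet-style XML snippet, normalizes station names, and drops duplicate records. The XML shape, the attribute names (N, SetNo, TrackCode, Location), and the helper names are assumptions chosen for illustration; real feeds carry more elements and attributes, and Trackser Live's actual normalization rules are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; not a verbatim TrackerNet response.
SAMPLE_XML = """<ROOT>
  <S N="Oxford  Circus.">
    <P N="Northbound - Platform 4">
      <T SetNo="201" TrackCode="TV1234" Location="At Platform" />
      <T SetNo="201" TrackCode="TV1234" Location="At Platform" />
      <T SetNo="205" TrackCode="TV1190" Location="Approaching Oxford Circus" />
    </P>
  </S>
</ROOT>"""


def normalize_station(name: str) -> str:
    """Trim whitespace and trailing dots, then collapse repeated spaces."""
    return " ".join(name.strip().rstrip(".").split())


def parse_feed(xml_text: str) -> list[dict]:
    """Return one record per (set number, track code), skipping duplicates."""
    root = ET.fromstring(xml_text)
    seen: set[tuple[str, str]] = set()
    trains: list[dict] = []
    for station in root.iter("S"):
        station_name = normalize_station(station.get("N", ""))
        for platform in station.iter("P"):
            for train in platform.iter("T"):
                key = (train.get("SetNo", ""), train.get("TrackCode", ""))
                if key in seen:
                    continue  # duplicate record for the same train and track code
                seen.add(key)
                trains.append({
                    "station": station_name,
                    "platform": platform.get("N", ""),
                    "set_no": key[0],
                    "track_code": key[1],
                    "location": train.get("Location", ""),
                })
    return trains


trains = parse_feed(SAMPLE_XML)
print(len(trains), trains[0]["station"])
```

Filtering stale records would need a timestamp per record, which this sample leaves out.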

🔷 Unified API

Modern Alternative Source

The Unified API is TfL's newer, modernized API that provides a consistent interface across all TfL services including Underground, Elizabeth line, DLR, and Overground networks.

Key Characteristics:

Coverage: All TfL services (Underground, Elizabeth line, DLR, Overground)
Update Frequency: Variable, typically 30-60 seconds
Data Format: JSON-based RESTful API
Location Detail: Station-level, less granular than TrackerNet
Modern Architecture: Standardized across all TfL services

Strengths:

Broader network coverage including newer lines
More consistent data structure across all services
Modern JSON format easier to parse and process
Better standardization of station names

Limitations:

Less detailed location information (no track codes)
Adds an extra layer of data processing, which can increase latency
Uses a different train identification scheme, which can make tracking harder
Occasionally omits data that TrackerNet provides
Has a caching layer, which can delay real-time updates
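
To show what station-level data looks like in practice, here is a minimal Python sketch that reduces a list of JSON arrival predictions to one record per train by keeping the prediction with the shortest time to station. The field names (vehicleId, lineId, stationName, timeToStation) are modeled on Unified API arrival predictions but should be treated as assumptions and checked against TfL's documentation; the sample data and the function name are invented for illustration, and this is not necessarily how Trackser Live handles the feed.

```python
import json

# Hypothetical sample data; values are made up for illustration.
SAMPLE_JSON = """[
  {"vehicleId": "201", "lineId": "victoria",
   "stationName": "Green Park", "timeToStation": 165},
  {"vehicleId": "201", "lineId": "victoria",
   "stationName": "Oxford Circus", "timeToStation": 45},
  {"vehicleId": "305", "lineId": "victoria",
   "stationName": "Brixton", "timeToStation": 20}
]"""


def nearest_prediction_per_train(json_text: str) -> dict[str, dict]:
    """Keep, for each vehicle ID, the prediction with the smallest timeToStation."""
    nearest: dict[str, dict] = {}
    for prediction in json.loads(json_text):
        vehicle_id = prediction.get("vehicleId")
        if not vehicle_id:
            continue  # skip predictions that cannot be tied to a train
        current = nearest.get(vehicle_id)
        if current is None or prediction["timeToStation"] < current["timeToStation"]:
            nearest[vehicle_id] = prediction
    return nearest


for vehicle_id, prediction in nearest_prediction_per_train(SAMPLE_JSON).items():
    print(vehicle_id, prediction["stationName"], prediction["timeToStation"])
```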

Data Source Comparison

Feature	TrackerNet	Unified API
Network Coverage	Underground only	All TfL networks
Data Format	XML	JSON
Update Frequency	~30 seconds	~30-60 seconds
Location Granularity	Track code level	Station level
Platform Information	Detailed	Basic
Train Identification	Vehicle ID + Set Number	Set Number
Data Consistency	Requires heavy normalization	More standardized
API Age	Legacy system	Modern system

How Trackser Live Uses Both Sources

Intelligent Merging Strategy

Trackser Live intelligently combines data from both sources to provide the most complete and accurate picture of train movements. The merging process follows these principles:

Primary Source Selection: TrackerNet is typically preferred for Underground lines due to its detail and reliability
Fallback Mechanism: If TrackerNet data is unavailable or stale, the system falls back to Unified API
Data Enrichment: Information from both sources is combined when available to fill gaps
Deduplication: Trains appearing in both sources are identified and merged based on location and vehicle IDs
Source Tracking: Each train record includes a source field indicating its origin
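
The principles above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Trackser Live's implementation: the Train fields, the source values ("trackernet", "unified", "merged"), and the merge_trains function are hypothetical, and trains are matched on line and vehicle ID only, whereas the real deduplication also takes location into account.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Train:
    vehicle_id: str
    line: str
    location: str
    source: str  # hypothetical values: "trackernet", "unified", "merged"
    track_code: Optional[str] = None
    destination: Optional[str] = None


def merge_trains(trackernet: list[Train], unified: list[Train]) -> list[Train]:
    """Prefer TrackerNet records, fill gaps from Unified, and tag the origin."""
    merged: dict[tuple[str, str], Train] = {}
    for train in trackernet:
        merged[(train.line, train.vehicle_id)] = train  # primary source
    for train in unified:
        key = (train.line, train.vehicle_id)
        primary = merged.get(key)
        if primary is None:
            merged[key] = train  # fallback: only the Unified API saw this train
            continue
        # Seen in both sources: keep TrackerNet detail, fill missing fields.
        merged[key] = replace(
            primary,
            destination=primary.destination or train.destination,
            source="merged",
        )
    return list(merged.values())


trackernet = [Train("201", "victoria", "At Oxford Circus", "trackernet", track_code="TV1234")]
unified = [
    Train("201", "victoria", "Oxford Circus", "unified", destination="Brixton"),
    Train("900", "elizabeth", "Paddington", "unified"),
]
for train in merge_trains(trackernet, unified):
    print(train.vehicle_id, train.source, train.track_code, train.destination)
```

Keying the dictionary on line and vehicle ID keeps the example short; a matcher that also compares locations would be needed where the two APIs identify trains differently.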

Configuration

The Unified API can be enabled or disabled per line through configuration. By default, it's disabled for most Underground lines where TrackerNet provides superior data, but it can be enabled for:

Lines where TrackerNet has known issues
Extended network coverage (Elizabeth line, Overground)
Backup/redundancy during TrackerNet outages
Comparative data analysis
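
A per-line toggle of this kind might look like the Python sketch below. The configuration structure, the line identifiers, and the unified_enabled helper are all hypothetical; the actual configuration format used by Trackser Live is not documented here.

```python
# Hypothetical per-line settings: True enables the Unified API for that line.
UNIFIED_API_ENABLED: dict[str, bool] = {
    "victoria": False,          # TrackerNet data preferred
    "elizabeth": True,          # not covered by TrackerNet
    "london-overground": True,  # not covered by TrackerNet
}


def unified_enabled(line_id: str) -> bool:
    """Return the configured setting, treating unlisted lines as disabled."""
    return UNIFIED_API_ENABLED.get(line_id, False)


print(unified_enabled("victoria"), unified_enabled("elizabeth"))
```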

💡 Pro Tip: You can see which data sources are active in the API response under the sources field, and track merge statistics in the stats.merged section to understand how data from both sources was combined.

Processing Pipeline

Regardless of source, all data goes through Trackser Live's comprehensive processing pipeline:

Data Acquisition: Fetch data from TrackerNet XML feeds and/or Unified API JSON endpoints, twice per minute
Parsing & Validation: Parse raw data and validate required fields
Location Normalization: Standardize station names and infer locations from track codes
Deduplication: Remove duplicate records within and across sources
Enrichment: Add destination codes, stopping patterns, and historical context
Merging: Intelligently combine trains from multiple sources
Stall Detection: Analyze location duration against historical averages
Output Generation: Format and compress final JSON response
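
To make the stall detection step more concrete, here is a minimal Python sketch that compares how long a train has been at a location with the historical average for that location. The function name, the multiplier, and the minimum threshold are assumptions for illustration; the source does not state the actual rule or thresholds.

```python
def is_stalled(
    seconds_at_location: float,
    historical_avg_seconds: float,
    factor: float = 2.0,         # hypothetical multiplier over the average
    min_seconds: float = 120.0,  # hypothetical floor to avoid flagging short dwells
) -> bool:
    """Flag a train whose dwell time is well above the historical average."""
    if historical_avg_seconds <= 0:
        return False  # no usable history for this location
    threshold = max(min_seconds, factor * historical_avg_seconds)
    return seconds_at_location >= threshold


print(is_stalled(300, 60))  # dwell far above the average
print(is_stalled(90, 60))   # dwell below the minimum threshold
```

The floor keeps locations with very short average dwell times from producing false positives.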

Statistics & Monitoring

The API provides detailed statistics about data source performance in every response:

stats.trackernet - TrackerNet processing metrics (pre/post filtering, dropped IDs)
stats.unified - Unified API processing metrics (pre/post filtering, synthetic trains)
stats.merged - Merge operation results and train counts by source
sources - Current status of each data source ("ok", "disabled", "error")
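
A client could inspect these fields along the lines of the Python sketch below. The top-level keys (sources, stats.trackernet, stats.unified, stats.merged) and the status values come from the list above; the shape of sources as a name-to-status mapping, the inner metric names, and the sample values are assumptions for illustration only.

```python
import json

# Hypothetical response fragment; inner field names and numbers are made up.
SAMPLE_RESPONSE = json.loads("""{
  "sources": {"trackernet": "ok", "unified": "error"},
  "stats": {
    "trackernet": {"pre_filter": 48, "post_filter": 41},
    "unified": {},
    "merged": {"total": 41}
  }
}""")


def failing_sources(response: dict) -> list[str]:
    """Return the names of data sources whose status is "error"."""
    statuses = response.get("sources", {})
    return sorted(name for name, status in statuses.items() if status == "error")


print(failing_sources(SAMPLE_RESPONSE))
print(SAMPLE_RESPONSE["stats"]["merged"])
```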
