How We Score Formula 1 Races

Our scoring system rates how worth watching each race is on a 0-10 scale by analyzing eleven distinct dimensions of racing quality, including an AI assessment. Each dimension contributes to the final score according to its assigned weight, which reflects how much that dimension typically improves the spectator experience.

The scoring model processes race data from external sources, normalizes each metric to a 0-10 scale, applies dimension weights, and produces a composite score that represents overall race quality. Final scores are capped at 10.0.

Weather (11% of total score)

Rain Factor

What it measures:

Weather conditions during the race, indicating whether it remained dry or featured rain and mixed wet-dry periods.

How it's calculated:

Binary measurement: races with rainfall score 10.0; dry races score 0.0.

Why it matters:

Rain introduces unpredictability by reducing grip, increasing braking distances, and forcing strategic decisions about tyre selection and timing. Wet conditions typically produce position changes and highlight driver skill in car control. The high weight reflects rain's consistent impact on race quality across the sport's history.

Interruptions (17% of total score)

Race Interruptions

What it measures:

Combined total of safety car deployments, virtual safety car periods, and red flag stoppages that interrupted the race.

How it's calculated:

Linear scale from 0 to the maximum observed value; higher values produce higher scores.

Why it matters:

Interruptions bunch the field, eliminate time gaps, and create restart scenarios that enable position changes. They signal incidents or mechanical failures that reshape race strategy and create uncertainty about the final outcome.

DNF Factor

What it measures:

Combined count of drivers who did not finish (DNF), did not start (DNS), or were disqualified (DSQ) from the race.

How it's calculated:

Linear scale from 0 to the maximum observed value; higher values produce higher scores.

Why it matters:

Retirements indicate mechanical unreliability, driver errors, or contact incidents. High DNF counts suggest challenging conditions or an error-inducing circuit layout. Multiple retirements can promote lower-placed drivers into the points, with knock-on championship implications.

Racing Quality (30% of total score)

AI Assessment

What it measures:

Independent AI rating based on web search of race reports, fan reactions, and expert analysis.

How it's calculated:

Direct 0-10 score from the AI web-search assessment, averaged across three independent calls.

Why it matters:

The AI assessment captures narrative context that pure statistics miss—like a processional front despite high midfield overtakes, or historic significance that elevates a race's worth-watching value. It provides a qualitative check on the quantitative data dimensions.

Overtakes Top10

What it measures:

Number of on-track position changes between drivers who finished in the top ten classified positions.

How it's calculated:

Linear scale from 0 to the maximum overtake count observed across all races; more overtakes score higher.

Why it matters:

Overtakes within the top ten directly affect race outcome and podium positions. These passes typically involve championship contenders and indicate competitive racing where position isn't determined solely by qualifying performance or car advantage.

Overtakes Total

What it measures:

Total number of on-track position changes recorded throughout the entire race across all positions.

How it's calculated:

Linear scale from 0 to the maximum overtake count observed across all races; more overtakes score higher.

Why it matters:

High overtake numbers indicate racing where position changes are achievable, suggesting good circuit design for wheel-to-wheel combat or performance parity between cars. Low overtake counts often correlate with processional races where grid order determines finishing order.

Strategy (12% of total score)

Unique Tyre Compounds

What it measures:

Number of different tyre compound types (dry, intermediate, wet) used by five or more drivers during the race.

How it's calculated:

Linear scale from 0 to the maximum observed count; more compounds in play score higher.

Why it matters:

Multiple compound types in play suggest evolving conditions or wide strategic choices. More compounds create performance differentials as cars on different tyres have different grip levels and degradation rates.

Tyre Strategy Variety

What it measures:

Number of distinct tyre compound sequences used by the top 6 finishers, where different pit stop orders count as unique strategies.

How it's calculated:

Linear scale from 0 to the maximum observed count; more distinct strategies score higher.

Why it matters:

Strategy diversity indicates teams took different approaches to the race, creating varied pit stop timing and different performance windows throughout the race distance. This produces natural pace differentials and battles between cars on different strategic paths.

Competition & Finishing Order (30% of total score)

Grid Chaos

What it measures:

Weighted score measuring dramatic position changes from qualifying grid to race finish, with higher scores for top-five starters who drop back or backmarkers who finish on the podium.

How it's calculated:

Linear scale of the observed value relative to the maximum across all races; more dramatic shuffles score higher.

Why it matters:

High volatility indicates the race shuffled the competitive order, meaning qualifying didn't determine the outcome. This suggests overtaking was possible, strategy played a role, or race incidents affected the natural pecking order.

Top3 Gap

What it measures:

Time gap in seconds between the race winner and the second-place finisher at the checkered flag.

How it's calculated:

Inverse scale where smaller gaps score higher: a gap of 0 seconds scores 10.0; gaps of 30 seconds or more score 0.0.

Why it matters:

Close finishes indicate genuine competition for the win rather than dominant performance by a single car or driver. Small gaps suggest the leader was under pressure throughout the race distance.

Team Variety

What it measures:

Number of different constructor teams represented among the top five finishing positions.

How it's calculated:

Linear scale of the observed value relative to the maximum; more distinct teams in the top five score higher.

Why it matters:

Greater team variety in top positions indicates competitive parity and reduces dominance by a single team. Five different teams in the top five suggests open competition; one or two teams suggests performance concentration.

Normalization Rules

Boolean normalization

Applied to binary conditions like rainfall. True conditions score 10.0, false conditions score 0.0.

Linear normalization

Applied to counting metrics like overtakes or interruptions. Scales the observed value against the maximum value in the dataset: (value ÷ maximum) × 10, capped at 10.0.
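Under this rule, the linear scaling might be sketched as follows (the function name and the guard for an empty dataset are illustrative assumptions, not the actual implementation):

```python
def linear_normalize(value: float, maximum: float) -> float:
    """Scale a counting metric (overtakes, interruptions, DNFs)
    against the maximum value observed in the dataset."""
    if maximum <= 0:
        return 0.0  # assumed guard: no spread in the data, nothing to score
    return min(value / maximum * 10, 10.0)
```

For example, 12 overtakes against a dataset maximum of 24 would normalize to 5.0.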

Inverse gap normalization

Applied to time gaps where smaller is better. Maps gaps from 0 seconds (perfect 10.0) to 30+ seconds (0.0) using the formula: 10 - (gap ÷ 30) × 10.
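A minimal sketch of this mapping (the 30-second cutoff and the clamp at 0.0 come from the rule above; the function name is an assumption):

```python
def inverse_gap_normalize(gap_seconds: float) -> float:
    """Map a winning gap to a 0-10 score: 0 s scores 10.0, 30 s or more scores 0.0."""
    return max(10 - gap_seconds / 30 * 10, 0.0)
```

A 15-second gap lands exactly halfway, at 5.0.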

Direct normalization

Applied to values already on a 0-10 scale, which enter the weighted sum with no further transformation. Used for the AI Assessment dimension, whose raw scores are first rescaled as described under AI Assessment below.

Final Score Calculation

Each dimension's normalized score (0-10) is multiplied by its weight, and all weighted contributions are summed to produce the final race score. Dimension weights sum to 100%, so the maximum possible score is 10.0.
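The weighted sum can be sketched generically (the dimension names and weights below are made up for illustration; this document does not spell out the individual weight of every dimension within each category):

```python
def final_score(normalized: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension 0-10 scores using weights that sum to 1.0."""
    # Missing dimensions contribute zero, matching how races without
    # an AI score are handled.
    score = sum(normalized.get(dim, 0.0) * w for dim, w in weights.items())
    return min(score, 10.0)  # cap at 10.0, as stated above
```

With two hypothetical dimensions weighted 50/50, scores of 10.0 and 5.0 combine to 7.5.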

The eleven dimensions span multiple categories—weather, interruptions, racing quality, strategy, and competition—allowing races to earn high scores through different characteristics. A chaotic rain-affected race, a strategic multi-stop battle, or pure wheel-to-wheel racing can all achieve excellent scores through their own merits.

AI Assessment

AI Assessment is a weighted scoring dimension (12% weight), just like the other ten dimensions. An LLM with web search researches each race independently—reading race reports, fan reactions, and expert analysis. It's prompted to identify both the boring and exciting aspects of the race before scoring, and calibrated against reference races (e.g., 2019 France GP = 1.5, 2019 German GP = 10). Three separate calls are made and averaged to reduce variance.

Raw AI scores are then normalized to the full 0-10 range based on the lowest and highest AI scores across all races. This compensates for LLMs' tendency to cluster scores in a narrow band, ensuring the AI dimension provides meaningful differentiation between races.
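That min-max rescaling might look like this (keying races by name and the fallback when all scores are identical are assumptions):

```python
def rescale_ai_scores(raw: dict[str, float]) -> dict[str, float]:
    """Stretch raw AI scores to the full 0-10 band using the dataset min and max."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # assumed fallback: every race scored identically, no differentiation possible
        return {race: 5.0 for race in raw}
    return {race: (score - lo) / (hi - lo) * 10 for race, score in raw.items()}
```

So if the AI clusters its raw scores between 4.0 and 8.0, the rescaling spreads them across the full 0.0-10.0 band.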

Races without an AI score simply receive zero contribution from this dimension.