Research · Calibration
Model performance
& validation.
All metrics on this page are read directly from the daily pipeline output. Spatial backtests use county-level GroupKFold holdout; the persistence baseline gives a lower bound on skill. The current public forecast covers 148 of 328 California beaches (45%).
Production model
xgb-undersample-ensemble-curated-v0
Public release eligible
Discrimination & calibration scores.
Production AUCPR
0.3894
holdout test set
Production Brier
0.0733
lower is better
Temporal val AUCPR
0.7386
time-blocked CV
Temporal val Brier
0.0925
time-blocked CV
AUCPR baseline— the persistence model (yesterday's result as today's prediction) scores about 0.172 at the county holdout level and 0.241 at the beach holdout level in the current pipeline snapshot. The production model must clear that margin before promotion.
Performance on unseen geography.
County GroupKFold holdout: each fold withholds all beaches from one county. Tests generalisation to locations not seen during training.
ModelAUCPRBrierFoldsRowsBase rate
spatial beach persistence0.57870.248850626637.3%
spatial county persistence0.37040.1846122006517.5%
spatial beach xgb undersample ensemble0.87220.113150626637.3%
spatial county xgb undersample ensemble0.50300.1150122006517.5%
spatial beach hist gbm0.85320.121050626637.3%
spatial county hist gbm0.47380.1179122006517.5%
spatial beach hist gbm positive persistence guard0.62380.226050626637.3%
spatial county hist gbm positive persistence guard0.43820.1762122006517.5%
spatial beach hist gbm persistence blend0.83460.147050626637.3%
spatial county hist gbm persistence blend0.48920.1172122006517.5%
spatial beach hist gbm no bacteria weather delta0.78350.170550626637.3%
spatial county hist gbm no bacteria weather delta0.20060.1500122006517.5%
Models evaluated each CI run.
All candidates must clear the block-bootstrap AUCPR gate to be eligible for production. The temporal validation winner becomes production if it also clears the spatial backtest promotion policy.
logistic-curated-v0logistic-coastal-cells-curated-v0logistic-hierarchical-curated-v0hist-gbm-curated-v0hist-gbm-positive-persistence-guard-curated-v0xgb-undersample-ensemble-curated-v0 ★stacked-ensemble-curated-v0
Full evaluation protocol
Technical Methodology
Spatial CV protocol, promotion gates, and feature rationale.