Guide: Cross-Validation and Predictive Evaluation
This guide explains how to evaluate AMMM V2 predictive performance using LOO diagnostics and hold-out checks.
What This Guide Covers
- Running and interpreting PSIS-LOO
- Comparing alternative model specifications
- Hold-out validation workflow
- Stage artefacts to inspect
1. LOO in AMMM V2
AMMM computes LOO-related artefacts during workflow execution when `log_likelihood` is available on `InferenceData`.
Check:
- `50_diagnostics/ELPD.txt`
- `50_diagnostics/ELPD_summary.csv`
- `50_diagnostics/pareto_k_summary.json`
- `50_diagnostics/pareto_k.png`
Interpretation:
- Higher `elpd_loo` is better (for models fit on the same dataset).
- `pareto_k > 0.7` indicates potentially unreliable local approximations.
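The 0.7 threshold above can be applied to per-observation Pareto k values as in the following minimal sketch. It assumes the values have already been extracted as plain floats, for example from the `pareto_k` attribute of `az.loo(idata, pointwise=True)`. The function name `summarise_pareto_k` and the returned keys are illustrative only and do not reflect the schema of `pareto_k_summary.json`.

```python
def summarise_pareto_k(k_values, threshold=0.7):
    """Count observations whose Pareto k exceeds the threshold.

    Hypothetical helper for illustration; not part of AMMM or ArviZ.
    """
    k_values = list(k_values)
    n_obs = len(k_values)
    # Indices of observations where the PSIS approximation may be unreliable.
    flagged = [i for i, k in enumerate(k_values) if k > threshold]
    return {
        "n_obs": n_obs,
        "n_flagged": len(flagged),
        "share_flagged": len(flagged) / n_obs if n_obs else 0.0,
        "flagged_indices": flagged,
        "max_k": max(k_values) if n_obs else None,
    }


# Made-up k values, purely to show the call.
summary = summarise_pareto_k([0.12, 0.75, 0.31, 1.20])
print(summary)
```

A small number of flagged observations usually points at specific influential time periods worth inspecting, while a large share suggests the LOO estimate as a whole should be treated with caution.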
2. Manual LOO Calculation (Notebook/Script)
```python
import arviz as az
from pathlib import Path

from src.core.mmm_model_v2 import DelayedSaturatedMMMv2

# Locate the fitted model written by the model-fit stage.
model_path = Path("results/20_model_fit/model.nc")
if not model_path.exists():
    raise FileNotFoundError("Expected results/20_model_fit/model.nc")

# Load the model and make sure inference data is attached.
model = DelayedSaturatedMMMv2.load(str(model_path))
idata = getattr(model, "idata", None)
if idata is None:
    raise RuntimeError("Loaded model has no idata.")

# PSIS-LOO requires a log_likelihood group on the InferenceData.
loo_results = az.loo(idata)
print(loo_results)
```
3. Comparing Two Models
```python
import arviz as az

from src.core.mmm_model_v2 import DelayedSaturatedMMMv2

model_a = DelayedSaturatedMMMv2.load("results_model_a/20_model_fit/model.nc")
model_b = DelayedSaturatedMMMv2.load("results_model_b/20_model_fit/model.nc")

cmp = az.compare({"model_a": model_a.idata, "model_b": model_b.idata})
print(cmp)
az.plot_compare(cmp)
```

Use this only when both models were fit on the same target and time window.
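To make the comparison table easier to read, the sketch below shows how an ELPD difference and its standard error (reported by ArviZ as `elpd_diff` and `dse`) can be derived from pointwise ELPD values. It assumes both models were scored on the same observations in the same order, and it uses the population-variance form of the standard error; ArviZ's exact computation may differ in detail. The function name and the numbers are hypothetical.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B.

    Hypothetical helper for illustration; inputs are per-observation
    ELPD contributions for the same observations, in the same order.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("Both models must be scored on the same observations.")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n_obs = len(diffs)
    elpd_diff = sum(diffs)
    # Standard error of a sum of n paired differences.
    se_diff = math.sqrt(n_obs * statistics.pvariance(diffs))
    return elpd_diff, se_diff


# Made-up pointwise values, purely to show the call.
diff, se = elpd_difference([-1.0, -2.0, -1.5, -0.5], [-1.5, -2.0, -2.5, -1.0])
print(f"elpd_diff={diff:.3f}, se_diff={se:.3f}")
```

A difference that is small relative to its standard error gives little reason to prefer one specification over the other on predictive grounds.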
4. Hold-Out Validation
Set `train_test_ratio < 1.0` in config, then run the pipeline.
Inspect:
- `30_model_assessment/model_fit_predictions.png`
- `30_model_assessment/model_fit_metrics.csv`
These summarise in-sample and (when available) out-of-sample fit behaviour.
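As a rough illustration of what a hold-out check measures, the sketch below splits a time series chronologically and scores predictions on the held-out tail. It assumes `train_test_ratio` is the fraction of the earliest rows used for training; how the pipeline actually splits the data, and which metrics appear in `model_fit_metrics.csv`, are not confirmed here. All function names and numbers are hypothetical.

```python
import math


def chronological_split(rows, train_test_ratio):
    """Split rows in time order: earliest fraction for training, rest held out."""
    if not 0.0 < train_test_ratio <= 1.0:
        raise ValueError("train_test_ratio must be in (0, 1].")
    n_train = int(round(len(rows) * train_test_ratio))
    return rows[:n_train], rows[n_train:]


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (actuals must be non-zero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


# Made-up weekly target values, purely to show the calls.
weekly_target = [100.0, 120.0, 110.0, 130.0, 125.0, 140.0, 135.0, 150.0, 145.0, 160.0]
train, test = chronological_split(weekly_target, train_test_ratio=0.8)

# Stand-in predictions for the held-out weeks; a real check would use model output.
test_predictions = [150.0, 152.0]
print(f"hold-out RMSE={rmse(test, test_predictions):.2f}")
print(f"hold-out MAPE={mape(test, test_predictions):.3f}")
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for models with carry-over effects.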
5. Validation Checks Before Business Use
- `50_diagnostics/convergence_report.json` -> `converged`
- `50_diagnostics/calibration_report.json` -> `well_calibrated`
- `50_diagnostics/pareto_k_summary.json` -> `ok`
If these checks fail, treat decomposition/optimisation outputs as provisional.
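The checks above can be gathered in one place with a small script. This is a sketch under two assumptions: each flag is a top-level boolean key in its JSON file, and the run's output directory is `results`. The helper names are hypothetical and not part of AMMM.

```python
import json
from pathlib import Path

# File and flag names come from the checklist above; treating each flag as a
# top-level boolean key is an assumption about the report layout.
CHECKS = {
    "50_diagnostics/convergence_report.json": "converged",
    "50_diagnostics/calibration_report.json": "well_calibrated",
    "50_diagnostics/pareto_k_summary.json": "ok",
}


def read_validation_flags(results_dir):
    """Return {flag_name: True/False/None}; None means missing or not a boolean."""
    flags = {}
    for rel_path, key in CHECKS.items():
        path = Path(results_dir) / rel_path
        if not path.exists():
            flags[key] = None
            continue
        with path.open() as fh:
            report = json.load(fh)
        value = report.get(key) if isinstance(report, dict) else None
        flags[key] = value if isinstance(value, bool) else None
    return flags


def all_checks_pass(flags):
    """True only if every flag was found and is explicitly True."""
    return all(value is True for value in flags.values())


flags = read_validation_flags("results")
print(flags)
print("all checks pass" if all_checks_pass(flags) else "treat outputs as provisional")
```

Missing or malformed reports are treated the same as a failed check, so a partially completed run does not pass by accident.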
Note: good predictive diagnostics do not imply causal validity.