Guide: Cross-Validation and Predictive Evaluation

This guide explains how to evaluate AMMM V2 predictive performance using LOO diagnostics and hold-out checks.

  • Running and interpreting PSIS-LOO
  • Comparing alternative model specifications
  • Hold-out validation workflow
  • Stage artefacts to inspect

AMMM computes LOO-related artefacts during workflow execution, provided the model's InferenceData contains a log_likelihood group.
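Whether that precondition holds can be checked before relying on the LOO artefacts. The sketch below is a minimal illustration: `has_log_likelihood` is a hypothetical helper, not part of AMMM, and it relies only on ArviZ exposing InferenceData groups as attributes. Stand-in objects replace real InferenceData so the snippet runs without a fitted model.

```python
from types import SimpleNamespace


def has_log_likelihood(idata) -> bool:
    """Return True if the object exposes a log_likelihood group."""
    return getattr(idata, "log_likelihood", None) is not None


# Stand-ins for InferenceData objects, used only to demonstrate the check.
with_ll = SimpleNamespace(posterior={}, log_likelihood={"y": [0.0]})
without_ll = SimpleNamespace(posterior={})

print(has_log_likelihood(with_ll))     # True
print(has_log_likelihood(without_ll))  # False
```

With a real model, the same call would be made on the loaded `idata` object.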

Check:

  • 50_diagnostics/ELPD.txt
  • 50_diagnostics/ELPD_summary.csv
  • 50_diagnostics/pareto_k_summary.json
  • 50_diagnostics/pareto_k.png

Interpretation:

  • Higher elpd_loo is better (for models fit on the same dataset).
  • pareto_k > 0.7 for an observation indicates that the PSIS approximation of its leave-one-out predictive density may be unreliable.
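The 0.7 threshold can be applied directly to pointwise Pareto k values. The following is a sketch under stated assumptions: `summarise_pareto_k` and its output keys are hypothetical and do not necessarily match the contents of pareto_k_summary.json, and the example values are invented for illustration.

```python
def summarise_pareto_k(k_values, threshold=0.7):
    """Count observations whose Pareto k exceeds the threshold."""
    k_list = [float(k) for k in k_values]
    flagged = [i for i, k in enumerate(k_list) if k > threshold]
    return {
        "n_obs": len(k_list),
        "n_flagged": len(flagged),
        "flagged_indices": flagged,
        "all_below_threshold": not flagged,
    }


# Illustrative values only. With ArviZ, pointwise values could come from
# az.loo(idata, pointwise=True).pareto_k.values
example_k = [0.12, 0.45, 0.71, 0.30, 1.05]
summary = summarise_pareto_k(example_k)
print(summary)
```

The flagged indices identify which observations to inspect, for example unusual weeks that the model fits poorly.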

2. Manual LOO Calculation (Notebook/Script)

# Load a fitted model and compute PSIS-LOO on its InferenceData.
import arviz as az
from pathlib import Path

from src.core.mmm_model_v2 import DelayedSaturatedMMMv2

model_path = Path("results/20_model_fit/model.nc")
if not model_path.exists():
    raise FileNotFoundError("Expected results/20_model_fit/model.nc")

model = DelayedSaturatedMMMv2.load(str(model_path))
idata = getattr(model, "idata", None)
if idata is None:
    raise RuntimeError("Loaded model has no idata.")

# az.loo requires a log_likelihood group on idata.
loo_results = az.loo(idata)
print(loo_results)

# Compare two fitted model specifications by estimated ELPD.
import arviz as az

from src.core.mmm_model_v2 import DelayedSaturatedMMMv2

model_a = DelayedSaturatedMMMv2.load("results_model_a/20_model_fit/model.nc")
model_b = DelayedSaturatedMMMv2.load("results_model_b/20_model_fit/model.nc")

# Rank the models by elpd_loo and plot the comparison.
cmp = az.compare({"model_a": model_a.idata, "model_b": model_b.idata})
print(cmp)
az.plot_compare(cmp)

Use this only when both models were fit on the same target and time window.
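One way to enforce the time-window condition is to compare the observation dates of both models before comparing them. This is a minimal sketch: `same_evaluation_window` is a hypothetical helper, the dates are invented, and how dates are extracted from each model (for example from a coordinate on observed_data) depends on the model and is not shown. It does not check that the target variable is the same.

```python
def same_evaluation_window(dates_a, dates_b):
    """Return (ok, reason) for whether two models cover identical observations."""
    a, b = list(dates_a), list(dates_b)
    if len(a) != len(b):
        return False, f"different number of observations: {len(a)} vs {len(b)}"
    if a != b:
        return False, "same length but different dates or ordering"
    return True, "identical time window"


# Illustrative weekly dates; real values would come from each fitted model.
dates_model_a = ["2024-01-01", "2024-01-08", "2024-01-15"]
dates_model_b = ["2024-01-01", "2024-01-08", "2024-01-15"]
dates_model_c = ["2024-01-08", "2024-01-15", "2024-01-22"]

print(same_evaluation_window(dates_model_a, dates_model_b))
print(same_evaluation_window(dates_model_a, dates_model_c))
```

If the check fails, the elpd_loo values are sums over different observations and should not be ranked against each other.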

Set train_test_ratio < 1.0 in config, then run the pipeline.

Inspect:

  • 30_model_assessment/model_fit_predictions.png
  • 30_model_assessment/model_fit_metrics.csv

These summarise in-sample and (when available) out-of-sample fit behaviour.
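To make the in-sample versus out-of-sample distinction concrete, the sketch below splits a time-ordered series by a ratio and computes error metrics on the held-out part. It rests on assumptions: the split is taken to be chronological with the first fraction used for training, and RMSE and MAPE are chosen for illustration. AMMM's actual splitting rule and the metrics written to model_fit_metrics.csv may differ. All numbers are invented.

```python
import math


def chronological_split(values, train_test_ratio):
    """Split a time-ordered sequence into (train, test) by ratio."""
    if not 0.0 < train_test_ratio <= 1.0:
        raise ValueError("train_test_ratio must be in (0, 1]")
    n_train = int(len(values) * train_test_ratio)
    return values[:n_train], values[n_train:]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (actual values must be non-zero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Ten illustrative periods; the last two form the hold-out set at ratio 0.8.
actual = [120.0, 130.0, 125.0, 140.0, 150.0, 145.0, 160.0, 155.0, 100.0, 200.0]
predicted = [118.0, 133.0, 127.0, 138.0, 149.0, 148.0, 158.0, 157.0, 110.0, 180.0]

train_actual, test_actual = chronological_split(actual, 0.8)
train_pred, test_pred = chronological_split(predicted, 0.8)

print("hold-out RMSE:", rmse(test_actual, test_pred))
print("hold-out MAPE:", mape(test_actual, test_pred))
```

A hold-out error much larger than the in-sample error is a sign of overfitting or of a regime change in the test period.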

  1. 50_diagnostics/convergence_report.json -> converged
  2. 50_diagnostics/calibration_report.json -> well_calibrated
  3. 50_diagnostics/pareto_k_summary.json -> ok

If these checks fail, treat decomposition/optimisation outputs as provisional.
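The three checks can be combined into a single gate before downstream outputs are used. This is a sketch only: it assumes each report is a JSON object with the named flag stored as a top-level boolean, which the source does not confirm, and `evaluate_flags` and `load_reports` are hypothetical helpers.

```python
import json
from pathlib import Path

# Report file -> flag name, as listed above. That each flag is a
# top-level boolean in its JSON file is an assumption.
REQUIRED_FLAGS = {
    "convergence_report.json": "converged",
    "calibration_report.json": "well_calibrated",
    "pareto_k_summary.json": "ok",
}


def evaluate_flags(reports):
    """Map each report name to True only if its flag is exactly True."""
    return {
        name: reports.get(name, {}).get(flag) is True
        for name, flag in REQUIRED_FLAGS.items()
    }


def load_reports(diagnostics_dir):
    """Read whichever of the required reports exist in the directory."""
    reports = {}
    for name in REQUIRED_FLAGS:
        path = Path(diagnostics_dir) / name
        if path.exists():
            reports[name] = json.loads(path.read_text())
    return reports


# In-memory example; a real run might use load_reports("results/50_diagnostics").
example_reports = {
    "convergence_report.json": {"converged": True},
    "calibration_report.json": {"well_calibrated": False},
}
status = evaluate_flags(example_reports)
print(status)
print("all checks passed:", all(status.values()))
```

A missing report is treated as a failed check, so an incomplete run is never reported as passing.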

Note: good predictive diagnostics do not imply causal validity.