
Explanation: Leave-One-Out Cross-Validation (LOO-CV)

This page explains how AMMM uses PSIS-LOO diagnostics to assess predictive adequacy and the reliability of model-comparison quantities.

LOO-CV answers a predictive question: how well would this model predict unseen observations from the same data-generating regime?

In AMMM, LOO outputs are part of post-fit diagnostics and should be interpreted alongside convergence and calibration diagnostics, not in isolation.

  • ELPD (elpd_loo): expected log predictive density; higher is better for predictive performance.
  • Complexity (p_loo): effective number of parameters, a measure of the model complexity implied by the fit.
  • Standard error: uncertainty around ELPD estimates and model differences.
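
As a rough illustration of how these quantities relate, the sketch below sums pointwise LOO log densities into an ELPD total and estimates its standard error as $\sqrt{n \cdot \mathrm{Var}(\mathrm{elpd}_i)}$, then applies the same calculation to pointwise differences between two models. This is not AMMM's implementation: the function names and the pointwise values are hypothetical, and the use of the sample variance is an assumption (libraries differ on the variance convention).

```python
import math


def elpd_summary(pointwise):
    """Return (sum, standard error) for a list of pointwise log densities."""
    n = len(pointwise)
    total = sum(pointwise)
    mean = total / n
    # Sample variance (n - 1 denominator); an assumption of this sketch.
    var = sum((v - mean) ** 2 for v in pointwise) / (n - 1)
    return total, math.sqrt(n * var)


def elpd_difference(pointwise_a, pointwise_b):
    """ELPD difference (A minus B) and its standard error.

    The standard error is computed on paired pointwise differences,
    because both models are evaluated on the same observations.
    """
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    return elpd_summary(diffs)


# Hypothetical pointwise values, for illustration only.
model_a = [-1.0, -1.2, -0.8, -1.1]
model_b = [-1.3, -1.2, -1.0, -1.5]

elpd_a, se_a = elpd_summary(model_a)
diff, se_diff = elpd_difference(model_a, model_b)
print(f"elpd_loo(A) = {elpd_a:.2f} (SE {se_a:.2f})")
print(f"A - B = {diff:.2f} (SE {se_diff:.2f})")
```

A difference that is small relative to its standard error gives little basis for preferring one model over the other.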

LOO-related artefacts are written to 50_diagnostics/:

| Artefact | Meaning |
| --- | --- |
| 50_diagnostics/ELPD.txt | Human-readable LOO summary text. |
| 50_diagnostics/ELPD_summary.csv | Tabular summary used by reporting and downstream readers. |
| 50_diagnostics/pareto_k.png | Observation-level reliability of PSIS approximation. |
| 50_diagnostics/pareto_k_summary.json | Machine-readable Pareto k aggregate (k_max, counts, elpd_loo, p_loo, ok). |
| 50_diagnostics/pareto_k_flagged.csv | Flagged observations with $k > 0.5$ (when present). |

AMMM follows standard PSIS guidance:

| Range | Interpretation |
| --- | --- |
| $k \le 0.5$ | Reliable importance-sampling approximation. |
| $0.5 < k \le 0.7$ | Marginal; inspect observations and model structure. |
| $k > 0.7$ | High influence / unreliable LOO approximation for those points. |
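
The bands above can be applied to a list of per-observation Pareto k values with a few lines of code. The following is a minimal sketch; the function names and band labels are hypothetical and are not taken from AMMM.

```python
def classify_pareto_k(k):
    """Map a single Pareto k value to one of the three bands."""
    if k <= 0.5:
        return "reliable"
    if k <= 0.7:
        return "marginal"
    return "unreliable"


def count_bands(k_values):
    """Count how many observations fall into each band."""
    counts = {"reliable": 0, "marginal": 0, "unreliable": 0}
    for k in k_values:
        counts[classify_pareto_k(k)] += 1
    return counts


# Hypothetical k values, for illustration only.
print(count_bands([0.1, 0.5, 0.6, 0.7, 0.9]))
```

Note that the boundaries are inclusive on the upper side, so $k = 0.5$ counts as reliable and $k = 0.7$ as marginal.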

Gate integration:

  • pareto_k_summary.json includes ok for machine-readable consumption.
  • If many observations have high k (above 0.7), model comparison and predictive claims become fragile.
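
A downstream consumer might read the gate roughly as follows. This sketch assumes only that the summary is a JSON object with top-level `ok` and `k_max` fields, as listed in the artefact table; the function name, the message wording, and the example values are hypothetical, and the rule AMMM uses to set `ok` is not reproduced here.

```python
import json


def read_loo_gate(summary_text):
    """Parse a pareto_k_summary.json payload and return (ok, message)."""
    summary = json.loads(summary_text)
    ok = bool(summary["ok"])
    k_max = summary.get("k_max")  # may be absent in this sketch
    if ok:
        return True, f"LOO gate passed (k_max={k_max})"
    return False, f"LOO gate failed (k_max={k_max}); inspect flagged observations"


# Hypothetical payload, for illustration only; values are invented.
example = '{"k_max": 0.82, "elpd_loo": -412.3, "p_loo": 17.9, "ok": false}'
passed, message = read_loo_gate(example)
print(message)
```

In practice the payload would be read from 50_diagnostics/pareto_k_summary.json rather than from an inline string.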

Calibration diagnostics prefer LOO-PIT when log_likelihood is available. LOO-PIT evaluates each observation against a predictive distribution that leaves that observation out, so calibration is not assessed on data the posterior has already been fitted to.

If log_likelihood is unavailable, AMMM falls back to PPC-PIT and records that fallback in calibration outputs.
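
To make the difference between the two PIT variants concrete, the sketch below computes a PIT value as the (optionally weighted) fraction of predictive draws at or below the observed value. With no weights it corresponds to a PPC-PIT; with importance weights proportional to $1 / p(y_i \mid \theta_s)$ it approximates a LOO-PIT. This is a simplified illustration, not AMMM's code: the function name and data are hypothetical, and it uses raw importance weights, whereas PSIS additionally smooths the largest weights.

```python
import math


def pit_value(y_obs, draws, log_weights=None):
    """P(y_rep <= y_obs) under equally or importance-weighted draws."""
    if log_weights is None:
        weights = [1.0] * len(draws)  # PPC-PIT: every draw counts equally
    else:
        m = max(log_weights)  # subtract the max for numerical stability
        weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    below = sum(w for d, w in zip(draws, weights) if d <= y_obs)
    return below / total


# Hypothetical predictive draws and pointwise log-likelihoods for one observation.
draws = [0.0, 1.0, 2.0, 3.0]
log_lik = [-1.0, -1.0, -3.0, -1.0]

ppc_pit = pit_value(1.5, draws)
# Raw LOO importance weights are proportional to 1 / likelihood.
loo_pit = pit_value(1.5, draws, log_weights=[-ll for ll in log_lik])
print(ppc_pit, loo_pit)
```

The two values differ because the LOO weights up-weight draws under which the held-out observation was unlikely.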

See Calibration Diagnostics for PIT interpretation.

Use LOO outputs as part of a bundle:

  1. Confirm convergence first (convergence_report.json).
  2. Check calibration (calibration_report.json).
  3. Then use ELPD and Pareto diagnostics for model adequacy and comparison.

This ordering reduces the risk of over-interpreting unstable or poorly calibrated fits.
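
The ordering can be expressed as a short check that stops at the first failing stage. This sketch assumes each report has already been reduced to a single boolean; the source confirms an `ok` field only for pareto_k_summary.json, so the way the convergence and calibration booleans are derived is left to the caller, and the function name is hypothetical.

```python
def first_failing_stage(convergence_ok, calibration_ok, loo_ok):
    """Return the name of the first failing stage, or None if all pass.

    Stages are checked in the recommended order, so a LOO result is
    never reported as the problem when an earlier stage has failed.
    """
    stages = [
        ("convergence", convergence_ok),
        ("calibration", calibration_ok),
        ("loo", loo_ok),
    ]
    for name, ok in stages:
        if not ok:
            return name
    return None


# Hypothetical outcome: convergence passes, calibration fails.
print(first_failing_stage(True, False, True))
```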

A strong ELPD result supports predictive adequacy, not causal identification. LOO-CV does not test unconfoundedness or rule out omitted variable bias.