Explanation: Leave-One-Out Cross-Validation (LOO-CV)
This page explains how AMMM uses PSIS-LOO diagnostics to assess predictive adequacy and the reliability of model-comparison quantities.
Why LOO-CV Matters in AMMM
LOO-CV answers a predictive question: how well would this model predict unseen observations from the same data-generating regime?
In AMMM, LOO outputs are part of post-fit diagnostics and should be interpreted alongside convergence and calibration diagnostics, not in isolation.
Core LOO Quantities
- ELPD (`elpd_loo`): expected log predictive density; higher is better for predictive performance.
- Complexity (`p_loo`): effective complexity implied by the fit.
- Standard error: uncertainty around ELPD estimates and model differences.
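
To make these quantities concrete, the sketch below aggregates pointwise values into the three summaries. It assumes the per-observation LOO log predictive densities (`elpd_i`) and in-sample log predictive densities (`lpd_i`) are already available; the function name `summarize_loo` and the input numbers are hypothetical and are not part of AMMM. The standard error here uses the population variance, and some implementations use the sample variance instead.

```python
import math
import statistics


def summarize_loo(elpd_i, lpd_i):
    """Aggregate pointwise values into elpd_loo, p_loo, and a standard error.

    elpd_i: per-observation LOO log predictive densities (assumed given).
    lpd_i:  per-observation in-sample log predictive densities (assumed given).
    """
    n = len(elpd_i)
    elpd_loo = sum(elpd_i)
    # Effective complexity: in-sample fit minus leave-one-out fit.
    p_loo = sum(lpd_i) - elpd_loo
    # SE of the sum: sqrt(n * variance of the pointwise contributions).
    se = math.sqrt(n * statistics.pvariance(elpd_i))
    return {"elpd_loo": elpd_loo, "p_loo": p_loo, "se": se}


# Illustrative numbers only.
summary = summarize_loo(elpd_i=[-1.0, -2.0, -3.0], lpd_i=[-0.5, -1.5, -2.5])
```

A larger gap between the in-sample and leave-one-out sums shows up directly as a larger `p_loo`, which is why it is read as effective complexity.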
AMMM V2 Artefacts
LOO-related artefacts are written to `50_diagnostics/`:
| Artefact | Meaning |
|---|---|
| `50_diagnostics/ELPD.txt` | Human-readable LOO summary text. |
| `50_diagnostics/ELPD_summary.csv` | Tabular summary used by reporting and downstream readers. |
| `50_diagnostics/pareto_k.png` | Observation-level reliability of the PSIS approximation. |
| `50_diagnostics/pareto_k_summary.json` | Machine-readable Pareto k aggregate (`k_max`, `counts`, `elpd_loo`, `p_loo`, `ok`). |
| `50_diagnostics/pareto_k_flagged.csv` | Flagged observations with $k > 0.5$ (when present). |
Pareto k Interpretation (Gate g6)
AMMM follows standard PSIS guidance:
| Range | Interpretation |
|---|---|
| $k \le 0.5$ | Reliable importance-sampling approximation. |
| $(0.5, 0.7]$ | Marginal; inspect observations and model structure. |
| $> 0.7$ | High influence / unreliable LOO approximation for those points. |
Gate integration:
- `pareto_k_summary.json` includes `ok` for machine-readable consumption.
- If many observations are high-k, model comparison and predictive claims become fragile.
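
The bands in the table above can be sketched as a small classifier. This is an illustration only: the function names, the layout of `counts`, and the rule used for `ok` (pass when no observation exceeds 0.7) are assumptions, since this page does not state how AMMM derives them.

```python
def classify_pareto_k(k_values, marginal=0.5, unreliable=0.7):
    """Count observations per Pareto k band: <= 0.5, (0.5, 0.7], > 0.7."""
    counts = {"reliable": 0, "marginal": 0, "unreliable": 0}
    for k in k_values:
        if k <= marginal:
            counts["reliable"] += 1
        elif k <= unreliable:
            counts["marginal"] += 1
        else:
            counts["unreliable"] += 1
    return counts


def summarize_pareto_k(k_values):
    """Build a summary dict loosely modelled on pareto_k_summary.json."""
    counts = classify_pareto_k(k_values)
    return {
        "k_max": max(k_values),
        "counts": counts,
        # Hypothetical gate rule: pass only when no observation exceeds 0.7.
        "ok": counts["unreliable"] == 0,
    }


# Made-up k values for illustration.
report = summarize_pareto_k([0.1, 0.45, 0.62, 0.81])
```

In this example one observation falls above 0.7, so the hypothetical gate reports `ok` as false even though most points are reliable.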
LOO-PIT Connection
Calibration diagnostics prefer LOO-PIT when `log_likelihood` is available. LOO-PIT evaluates each observation against a predictive distribution that excludes that observation, so calibration is not judged on data the posterior has already seen.
If `log_likelihood` is unavailable, AMMM falls back to PPC-PIT and records that fallback in calibration outputs.
See Calibration Diagnostics for PIT interpretation.
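
The difference between the two PIT variants can be sketched for a single observation. The function `pit_value` and all numbers are hypothetical; in practice the weights would come from PSIS-smoothed importance ratios, and this is not AMMM's implementation.

```python
def pit_value(y_obs, predictive_draws, weights=None):
    """Probability integral transform for one observation.

    With weights=None every draw counts equally (PPC-PIT). Passing
    leave-one-out importance weights gives a LOO-PIT style value.
    """
    if weights is None:
        weights = [1.0] * len(predictive_draws)
    total = sum(weights)
    # Weighted share of predictive draws at or below the observed value.
    below = sum(w for d, w in zip(predictive_draws, weights) if d <= y_obs)
    return below / total


# Made-up draws and weights for illustration.
draws = [0.0, 1.0, 2.0, 3.0]
ppc_pit = pit_value(0.5, draws)
loo_pit = pit_value(0.5, draws, weights=[0.7, 0.1, 0.1, 0.1])
```

The same draws give different PIT values once the weights shift mass toward draws that are plausible without the held-out observation.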
Practical Use in Stage-Gated Workflow
Use LOO outputs as part of a bundle:
- Confirm convergence first (`convergence_report.json`).
- Check calibration (`calibration_report.json`).
- Then use ELPD and Pareto diagnostics for model adequacy and comparison.
This ordering reduces the risk of over-interpreting unstable or poorly calibrated fits.
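
The ordering above can be sketched as a check that stops at the first failing stage. Only `pareto_k_summary.json` is documented on this page as carrying an `ok` field; treating the convergence and calibration reports as having the same field is an assumption made for illustration, and `first_failing_stage` is a hypothetical helper.

```python
def first_failing_stage(convergence, calibration, pareto_k):
    """Return the name of the first stage whose report is not ok, else None.

    Each argument is a dict parsed from the corresponding report; the
    `ok` key is assumed here for all three.
    """
    stages = [
        ("convergence", convergence),
        ("calibration", calibration),
        ("loo", pareto_k),
    ]
    for name, report in stages:
        # A missing `ok` key is treated as a failure, to stay conservative.
        if not report.get("ok", False):
            return name
    return None


# Illustrative inputs: calibration fails, so LOO results are not consulted.
blocked_at = first_failing_stage({"ok": True}, {"ok": False}, {"ok": True})
```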
Causal Caveat
A strong ELPD result supports predictive adequacy, not causal identification. LOO-CV does not test unconfoundedness or rule out omitted variable bias.