Explanation: Leave-One-Out Cross-Validation (LOO-CV)

This page explains how AMMM uses PSIS-LOO (Pareto-smoothed importance sampling leave-one-out) diagnostics to assess predictive adequacy and the reliability of model-comparison quantities.

Why LOO-CV Matters in AMMM

LOO-CV answers a predictive question: how well would this model predict unseen observations from the same data-generating regime?

In AMMM, LOO outputs are part of post-fit diagnostics and should be interpreted alongside convergence and calibration diagnostics, not in isolation.

Core LOO Quantities

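ELPD (`elpd_loo`): expected log pointwise predictive density, estimated by leave-one-out; higher values indicate better predictive performance. The absolute value depends on the data, so it is most informative as a difference between models fit to the same observations.
Complexity (`p_loo`): effective number of parameters, estimated as the difference between the in-sample log predictive density and `elpd_loo`.
Standard error: sampling uncertainty of the ELPD estimate and of ELPD differences between models; read a difference relative to its standard error.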
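
To make the relationship between these quantities concrete, the following is a minimal sketch of how they can be aggregated from pointwise terms. It is not AMMM's implementation: the function names `summarize_loo` and `compare_elpd` are hypothetical, the inputs are assumed to be per-observation LOO and in-sample log predictive densities that have already been computed, and the use of the sample variance (divisor n - 1) is an assumption, since libraries differ on that detail.

```python
import math
import statistics


def summarize_loo(elpd_pointwise, lpd_pointwise):
    """Aggregate pointwise terms into elpd_loo, p_loo and a standard error.

    elpd_pointwise: per-observation LOO log predictive densities.
    lpd_pointwise:  per-observation in-sample log predictive densities.
    """
    n = len(elpd_pointwise)
    elpd_loo = sum(elpd_pointwise)
    # p_loo is the gap between in-sample fit and the LOO estimate.
    p_loo = sum(lpd_pointwise) - elpd_loo
    # Standard error of a sum of n terms (sample variance is an assumption).
    se = math.sqrt(n * statistics.variance(elpd_pointwise))
    return {"elpd_loo": elpd_loo, "p_loo": p_loo, "se": se}


def compare_elpd(elpd_a, elpd_b):
    """Return (ELPD difference A minus B, standard error of the difference).

    Both inputs must be pointwise terms for the same observations, in the
    same order, so that the differences are paired.
    """
    diffs = [a - b for a, b in zip(elpd_a, elpd_b)]
    diff = sum(diffs)
    se_diff = math.sqrt(len(diffs) * statistics.variance(diffs))
    return diff, se_diff
```

Working with paired pointwise differences, rather than subtracting two totals and combining their standard errors, matters because the two models are scored on the same observations and their pointwise terms are usually strongly correlated.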

AMMM V2 Artefacts

LOO-related artefacts are written to `50_diagnostics/`:

Artefact	Meaning
`50_diagnostics/ELPD.txt`	Human-readable LOO summary text.
`50_diagnostics/ELPD_summary.csv`	Tabular summary used by reporting and downstream readers.
`50_diagnostics/pareto_k.png`	Plot of per-observation Pareto k values, showing where the PSIS approximation is reliable.
`50_diagnostics/pareto_k_summary.json`	Machine-readable Pareto k aggregate (`k_max`, counts, `elpd_loo`, `p_loo`, `ok`).
`50_diagnostics/pareto_k_flagged.csv`	Flagged observations with $k > 0.5$ (when present).

Pareto k Interpretation (Gate g6)

AMMM follows standard PSIS guidance:

Range	Interpretation
$k \le 0.5$	Reliable importance-sampling approximation.
$0.5 < k \le 0.7$	Marginal; inspect the affected observations and the model structure.
$k > 0.7$	Highly influential observations; the LOO approximation is unreliable for those points.
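
The bands in the table above can be expressed as a small classifier. This is an illustrative sketch only: the function names and the band labels (`reliable`, `marginal`, `unreliable`) are hypothetical, and the layout of the returned dictionary is not claimed to match `pareto_k_summary.json`.

```python
from collections import Counter


def classify_pareto_k(k):
    """Map one Pareto k value to the band used in the table above."""
    if k <= 0.5:
        return "reliable"
    if k <= 0.7:
        return "marginal"
    return "unreliable"


def summarize_pareto_k(k_values):
    """Count observations per band and report the largest k."""
    counts = Counter(classify_pareto_k(k) for k in k_values)
    return {
        "k_max": max(k_values),
        "reliable": counts["reliable"],
        "marginal": counts["marginal"],
        "unreliable": counts["unreliable"],
    }
```

For example, `summarize_pareto_k([0.1, 0.5, 0.6, 0.7, 0.9])` places two observations in each of the first two bands and one in the last, because both boundaries are inclusive on the lower band.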

Gate integration:

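`pareto_k_summary.json` includes an `ok` field so that tooling can read the gate status without parsing the text summary.
If many observations have high k ($k > 0.7$), the ELPD estimate itself is unreliable, so model comparisons and predictive claims built on it become fragile.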
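
A downstream script could consume the summary along these lines. This is a sketch under stated assumptions: `pareto_gate_passes` and its `run_dir` argument are hypothetical names, `ok` is assumed to be a JSON boolean, and only the `ok` and `k_max` fields named on this page are read.

```python
import json
from pathlib import Path


def pareto_gate_passes(run_dir):
    """Read pareto_k_summary.json and return (passed, k_max).

    run_dir is assumed to be the directory containing 50_diagnostics/.
    """
    path = Path(run_dir) / "50_diagnostics" / "pareto_k_summary.json"
    with path.open(encoding="utf-8") as fh:
        summary = json.load(fh)
    # A missing `ok` field is treated as a failure, not a silent pass.
    passed = bool(summary.get("ok", False))
    return passed, summary.get("k_max")
```

Failing closed when `ok` is absent is a deliberate choice in this sketch: an incomplete summary should stop a pipeline rather than let a comparison proceed unchecked.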

LOO-PIT Connection

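Calibration diagnostics prefer LOO-PIT when `log_likelihood` is available. LOO-PIT evaluates each observation against a predictive distribution that excludes that observation, which avoids the optimism of checking calibration against a posterior that was fit to the same data point.

If `log_likelihood` is unavailable, AMMM falls back to PPC-PIT and records that fallback in the calibration outputs.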
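
The difference between the two PIT variants can be illustrated with a single function. This is a simplified sketch, not AMMM's code: `pit_value` is a hypothetical name, the predictive draws and log importance weights for one observation are assumed to be given, and the Pareto smoothing of the weights that PSIS performs is omitted.

```python
import math


def pit_value(y_obs, pred_draws, log_weights=None):
    """Weighted probability that a predictive draw is at or below y_obs.

    With log_weights=None all draws count equally, which corresponds to a
    PPC-PIT value. Passing leave-one-out importance weights for the
    observation gives the LOO-PIT analogue.
    """
    if log_weights is None:
        log_weights = [0.0] * len(pred_draws)
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    below = sum(w for w, d in zip(weights, pred_draws) if d <= y_obs)
    return below / total
```

Under a well-calibrated model, these values computed across observations should look approximately uniform on the unit interval.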

See Calibration Diagnostics for PIT interpretation.

Practical Use in Stage-Gated Workflow

Use LOO outputs as part of a bundle:

Confirm convergence first (convergence_report.json).
Check calibration (calibration_report.json).
Then use ELPD and Pareto diagnostics for model adequacy and comparison.

This ordering reduces the risk of over-interpreting unstable or poorly calibrated fits.
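
The ordering above can be enforced with a short-circuiting check. This sketch rests on an assumption that is not confirmed by this page: that each parsed report exposes a boolean `ok` field. That field is documented here only for `pareto_k_summary.json`, so the same key on the convergence and calibration reports is hypothetical, as is the function name `first_failing_gate`.

```python
def first_failing_gate(reports):
    """Return the name of the first stage that does not pass, or None.

    reports: ordered list of (stage_name, parsed_report_dict) pairs,
    e.g. convergence first, then calibration, then LOO.
    The `ok` key is an assumption for all but the Pareto k summary.
    """
    for name, report in reports:
        if not report.get("ok", False):
            return name
    return None
```

Because the function stops at the first failure, a run that has not converged is reported as a convergence problem even if its LOO summary happens to look acceptable.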

Causal Caveat

A strong ELPD result supports predictive adequacy, not causal identification. LOO-CV does not test unconfoundedness or rule out omitted variable bias.