Explanation: Calibration Diagnostics

Calibration asks whether predictive uncertainty is statistically consistent with realised outcomes.

A model can fit averages well and still be badly calibrated. AMMM therefore runs dedicated calibration diagnostics and writes their outputs to 50_diagnostics/.

PIT Intuition

The Probability Integral Transform (PIT) maps each observed value through its predictive distribution.

If predictive distributions are calibrated, PIT values are approximately Uniform(0,1):

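flat histogram: good calibration,
U-shaped histogram: over-confident (intervals too narrow, so observations land in the predictive tails too often),
inverted-U histogram: under-confident (intervals too wide, so observations cluster near the predictive centre),
histogram piled up towards 0 or 1: biased location (predictions systematically too high or too low).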
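The histogram shapes listed above can be reproduced on synthetic data. The following is a minimal sketch, not AMMM's implementation: `ppc_pit` and `tail_mass` are hypothetical helper names, and the PIT is approximated as the fraction of predictive draws at or below each observation.

```python
import numpy as np

def ppc_pit(y_obs, y_draws):
    """Empirical PIT: share of predictive draws <= the observed value.

    y_obs has shape (n_obs,), y_draws has shape (n_draws, n_obs).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_draws = np.asarray(y_draws, dtype=float)
    return (y_draws <= y_obs[None, :]).mean(axis=0)

def tail_mass(pit, edge=0.1):
    """Share of PIT values in the outer bins; about 2 * edge if uniform."""
    pit = np.asarray(pit, dtype=float)
    return float(np.mean((pit < edge) | (pit > 1.0 - edge)))

rng = np.random.default_rng(0)
n_obs, n_draws = 2000, 1000
y = rng.normal(0.0, 1.0, size=n_obs)  # synthetic "observed" data

# Four synthetic predictive distributions for the same observations.
calibrated = ppc_pit(y, rng.normal(0.0, 1.0, size=(n_draws, n_obs)))
too_narrow = ppc_pit(y, rng.normal(0.0, 0.5, size=(n_draws, n_obs)))  # U-shape
too_wide = ppc_pit(y, rng.normal(0.0, 2.0, size=(n_draws, n_obs)))    # inverted U
shifted = ppc_pit(y, rng.normal(1.0, 1.0, size=(n_draws, n_obs)))     # piles up near 0

for name, pit in [("calibrated", calibrated), ("too_narrow", too_narrow),
                  ("too_wide", too_wide), ("shifted", shifted)]:
    print(f"{name:>10}: tail mass={tail_mass(pit):.3f}, mean PIT={pit.mean():.3f}")
```

Tail mass well above 0.2 corresponds to the U shape, tail mass well below 0.2 to the inverted U, and a mean PIT far from 0.5 to a location shift.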

LOO-PIT vs PPC-PIT

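AMMM prefers LOO-PIT when a log_likelihood group is available on the InferenceData object. LOO-PIT evaluates each observation against its leave-one-out predictive distribution, which avoids using the same observation for both fitting and calibration scoring.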

If log_likelihood is unavailable, AMMM falls back to PPC-PIT and records this in pit_method.
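The fallback logic can be sketched as follows. This is an illustration under stated assumptions rather than AMMM's code: `pit_with_fallback` is a hypothetical name, the returned labels "loo" and "ppc" are placeholders for whatever values `pit_method` actually takes, and the leave-one-out weights here are raw importance weights proportional to 1 / p(y_i | θ_s). Production implementations (for example ArviZ's `loo_pit`) use Pareto-smoothed weights instead.

```python
import numpy as np

def pit_with_fallback(y_obs, y_draws, log_lik=None):
    """Return (pit_values, method_label).

    y_obs:   (n_obs,) observed values
    y_draws: (n_draws, n_obs) posterior predictive draws
    log_lik: optional (n_draws, n_obs) pointwise log-likelihood
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_draws = np.asarray(y_draws, dtype=float)
    below = (y_draws <= y_obs[None, :]).astype(float)

    if log_lik is None:
        # No pointwise log-likelihood: plain posterior predictive PIT.
        return below.mean(axis=0), "ppc"

    # Raw importance weights for leave-one-out (assumption: no PSIS smoothing).
    log_w = -np.asarray(log_lik, dtype=float)
    log_w = log_w - log_w.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(log_w)
    w = w / w.sum(axis=0, keepdims=True)
    return (w * below).sum(axis=0), "loo"

rng = np.random.default_rng(1)
y = rng.normal(size=50)
draws = rng.normal(size=(400, 50))

pit_ppc, method_ppc = pit_with_fallback(y, draws)
# A constant log-likelihood gives equal weights, so LOO-PIT matches PPC-PIT.
pit_loo, method_loo = pit_with_fallback(y, draws, log_lik=np.zeros((400, 50)))
```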

Delta-ECDF and Simultaneous Bands

AMMM also uses a delta-ECDF view:

plot the difference between the empirical PIT CDF and the uniform CDF,
inspect departures relative to simultaneous confidence bands.

Persistent departures outside the bands indicate material miscalibration.
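A minimal sketch of the delta-ECDF idea is shown below. How AMMM constructs its simultaneous bands is not stated here, so this example substitutes the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which gives a simple constant-width simultaneous band. `delta_ecdf` and `dkw_band` are hypothetical helper names.

```python
import numpy as np

def delta_ecdf(pit, grid_size=101):
    """Return (grid, ECDF(grid) - grid), i.e. the ECDF minus the uniform CDF."""
    pit = np.sort(np.asarray(pit, dtype=float))
    grid = np.linspace(0.0, 1.0, grid_size)
    ecdf = np.searchsorted(pit, grid, side="right") / pit.size
    return grid, ecdf - grid

def dkw_band(n, alpha=0.05):
    """Half-width of a (1 - alpha) simultaneous band from the DKW inequality."""
    return float(np.sqrt(np.log(2.0 / alpha) / (2.0 * n)))

rng = np.random.default_rng(2)
n = 1000
pit_good = rng.uniform(size=n)        # calibrated: PIT ~ Uniform(0, 1)
pit_bad = rng.beta(0.5, 0.5, size=n)  # U-shaped: over-confident

eps = dkw_band(n)
_, d_good = delta_ecdf(pit_good)
_, d_bad = delta_ecdf(pit_bad)

print(f"band half-width: {eps:.4f}")
print(f"max |delta| (uniform PIT):  {np.abs(d_good).max():.4f}")
print(f"max |delta| (U-shaped PIT): {np.abs(d_bad).max():.4f}")
```

The DKW band is conservative and has constant width; simulation-based bands are typically tighter near 0 and 1.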

Coverage Calibration

Coverage is checked across nominal levels (0.1 to 0.9):

nominal level = intended interval mass,
empirical coverage = realised proportion inside those intervals.

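Ideally, empirical coverage plotted against nominal level lies close to the 45° line. Points below the line indicate intervals that are too narrow; points above it indicate intervals that are too wide.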
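The nominal-versus-empirical comparison can be sketched as below. This assumes central (equal-tailed) intervals computed from posterior predictive draws; AMMM may define its intervals differently (for example as highest-density intervals), and `coverage_table` is a hypothetical helper name.

```python
import numpy as np

def coverage_table(y_obs, y_draws, levels=None):
    """Return a list of (nominal_level, empirical_coverage) pairs.

    Uses central intervals: for level p, the interval spans the
    (1 - p) / 2 and 1 - (1 - p) / 2 quantiles of the draws.
    """
    if levels is None:
        levels = np.linspace(0.1, 0.9, 9)
    y_obs = np.asarray(y_obs, dtype=float)
    y_draws = np.asarray(y_draws, dtype=float)
    rows = []
    for level in levels:
        tail = (1.0 - level) / 2.0
        lower = np.quantile(y_draws, tail, axis=0)
        upper = np.quantile(y_draws, 1.0 - tail, axis=0)
        inside = (y_obs >= lower) & (y_obs <= upper)
        rows.append((round(float(level), 2), float(inside.mean())))
    return rows

rng = np.random.default_rng(3)
n_obs, n_draws = 2000, 1000
y = rng.normal(0.0, 1.0, size=n_obs)

good = coverage_table(y, rng.normal(0.0, 1.0, size=(n_draws, n_obs)))
narrow = coverage_table(y, rng.normal(0.0, 0.5, size=(n_draws, n_obs)))

print("nominal  calibrated  too-narrow")
for (level, cov_good), (_, cov_narrow) in zip(good, narrow):
    print(f"{level:7.2f}  {cov_good:10.3f}  {cov_narrow:10.3f}")
```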

Diagnosis Categories

AMMM calibration reporting includes:

well-calibrated
over-confident
under-confident
biased
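To make the four categories concrete, here is a rough heuristic that assigns one of them from PIT values alone. The rules AMMM actually applies are not documented in this section, so the function name `classify_pit`, the use of the PIT mean and variance, and both tolerances are assumptions made purely for illustration.

```python
import numpy as np

def classify_pit(pit, mean_tol=0.05, var_tol=0.2):
    """Heuristic diagnosis from PIT values (illustrative thresholds only).

    Uniform(0, 1) has mean 0.5 and variance 1/12. A shifted mean suggests
    bias; excess variance suggests a U shape (over-confident); reduced
    variance suggests an inverted U (under-confident).
    """
    pit = np.asarray(pit, dtype=float)
    if abs(pit.mean() - 0.5) > mean_tol:
        return "biased"
    ratio = pit.var() / (1.0 / 12.0)
    if ratio > 1.0 + var_tol:
        return "over-confident"
    if ratio < 1.0 - var_tol:
        return "under-confident"
    return "well-calibrated"

rng = np.random.default_rng(4)
labels = {
    "uniform": classify_pit(rng.uniform(size=2000)),
    "u_shaped": classify_pit(rng.beta(0.5, 0.5, size=2000)),
    "hump": classify_pit(rng.beta(2.0, 2.0, size=2000)),
    "skewed": classify_pit(rng.beta(2.0, 5.0, size=2000)),
}
print(labels)
```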

The top-level machine-readable field is well_calibrated in calibration_report.json.

Artefacts

| Artefact | Purpose |
| --- | --- |
| `50_diagnostics/calibration_report.json` | Machine-readable diagnosis, `well_calibrated`, `pit_method`, and summary metrics. |
| `50_diagnostics/calibration_coverage.csv` | Nominal vs empirical coverage table. |
| `50_diagnostics/calibration_pit_histogram.png` | PIT histogram with uniform reference. |
| `50_diagnostics/calibration_pit_ecdf.png` | Delta-ECDF with simultaneous confidence bands. |
| `50_diagnostics/calibration_coverage.png` | Coverage curve versus ideal diagonal. |
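Downstream tooling might read the machine-readable report along these lines. Only the file path and the `well_calibrated` and `pit_method` field names come from this page; the helper name `read_calibration_report`, the demo payload, and the example value "ppc" are hypothetical, and the real report contains further fields not shown here.

```python
import json
import tempfile
from pathlib import Path

def read_calibration_report(run_dir):
    """Load calibration_report.json from a run directory and pull out
    the two fields documented above."""
    path = Path(run_dir) / "50_diagnostics" / "calibration_report.json"
    with path.open(encoding="utf-8") as fh:
        report = json.load(fh)
    return {
        "well_calibrated": bool(report["well_calibrated"]),
        "pit_method": report.get("pit_method"),
    }

# Demo against a temporary directory with a made-up report payload.
with tempfile.TemporaryDirectory() as tmp:
    diag_dir = Path(tmp) / "50_diagnostics"
    diag_dir.mkdir()
    (diag_dir / "calibration_report.json").write_text(
        json.dumps({"well_calibrated": False, "pit_method": "ppc"}),
        encoding="utf-8",
    )
    summary = read_calibration_report(tmp)

print(summary)
```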

Gate Role in Workflow

Calibration corresponds to gate g5 in the stage-gated workflow. Poor calibration does not by itself invalidate a run, but it should reduce trust in downstream business interpretation until it is resolved.

See Workflow Stages and Methodology.

Causal Caveat

Good calibration supports probabilistic adequacy, not causal correctness.