Explanation: Calibration Diagnostics
Calibration asks whether predictive uncertainty is statistically consistent with realised outcomes.
A model can fit averages well and still be badly calibrated. AMMM therefore runs dedicated calibration diagnostics and writes their outputs to 50_diagnostics/.
PIT Intuition
The Probability Integral Transform (PIT) evaluates the predictive cumulative distribution function (CDF) at each observed value.
If predictive distributions are calibrated, PIT values are approximately Uniform(0,1):
- flat histogram: good calibration,
- U-shaped histogram: over-confident (intervals too narrow),
- inverted-U histogram: under-confident (intervals too wide),
- systematic shift left/right: biased location.
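The histogram shapes above can be reproduced with a minimal sketch on synthetic data. It assumes the predictive distribution is available as a set of draws per observation, so the PIT is approximated by the share of draws at or below the observed value. The helper names `pit_values` and `histogram` are illustrative and not part of AMMM.

```python
import random

def pit_values(y_obs, draws):
    """PIT per observation: share of predictive draws <= the observed value.

    draws[i] holds the predictive samples for observation i.
    """
    return [sum(d <= y for d in ds) / len(ds) for y, ds in zip(y_obs, draws)]

def histogram(pit, bins=10):
    """Counts of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for u in pit:
        counts[min(int(u * bins), bins - 1)] += 1
    return counts

rng = random.Random(0)
n_obs, n_draws = 500, 400
y_obs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

# Calibrated: predictive sd matches the data sd.
calibrated = [[rng.gauss(0.0, 1.0) for _ in range(n_draws)] for _ in range(n_obs)]
# Over-confident: predictive sd is half the data sd (intervals too narrow).
overconfident = [[rng.gauss(0.0, 0.5) for _ in range(n_draws)] for _ in range(n_obs)]

hist_cal = histogram(pit_values(y_obs, calibrated))
hist_over = histogram(pit_values(y_obs, overconfident))
print("calibrated:    ", hist_cal)
print("over-confident:", hist_over)
```

In the over-confident case the outer bins collect most of the mass, which is the U shape described in the list.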
LOO-PIT vs PPC-PIT
AMMM prefers LOO-PIT when log_likelihood is available on InferenceData. This avoids double-use of each observation in both fitting and calibration scoring.
If log_likelihood is unavailable, AMMM falls back to PPC-PIT and records this in pit_method.
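The difference between the two variants can be sketched for a single observation. This is an illustration under stated assumptions, not AMMM's implementation: it uses raw importance weights proportional to 1 / p(y | theta_s), whereas production implementations generally stabilise such weights, for example with Pareto smoothing. The function names and the method labels "loo" and "ppc" are hypothetical; the values AMMM actually writes to pit_method are not specified here.

```python
import math

def ppc_pit(y, y_rep):
    """PPC-PIT: share of posterior predictive draws at or below y."""
    return sum(r <= y for r in y_rep) / len(y_rep)

def loo_pit(y, y_rep, log_lik):
    """LOO-PIT with raw importance weights w_s proportional to 1 / p(y | theta_s).

    log_lik[s] is the pointwise log-likelihood of y under posterior draw s.
    """
    shift = max(-ll for ll in log_lik)  # keeps the exponentials numerically stable
    weights = [math.exp(-ll - shift) for ll in log_lik]
    below = sum(w for w, r in zip(weights, y_rep) if r <= y)
    return below / sum(weights)

def pit_with_fallback(y, y_rep, log_lik=None):
    """Return (pit, method), falling back to PPC-PIT without log-likelihoods."""
    if log_lik is None:
        return ppc_pit(y, y_rep), "ppc"  # hypothetical label
    return loo_pit(y, y_rep, log_lik), "loo"  # hypothetical label

# One observation, two posterior draws; the second draw explains y three
# times better, so leaving y out down-weights that draw.
y = 2.0
y_rep = [1.0, 3.0]
log_lik = [0.0, math.log(3.0)]

pit_loo, method_loo = pit_with_fallback(y, y_rep, log_lik)
pit_ppc, method_ppc = pit_with_fallback(y, y_rep)
print(method_loo, pit_loo)
print(method_ppc, pit_ppc)
```

Draws that fit the held-out observation unusually well are down-weighted, which is how the leave-one-out variant avoids scoring an observation against a posterior that has already seen it.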
Delta-ECDF and Simultaneous Bands
AMMM also uses a delta-ECDF view:
- compare empirical PIT CDF against the uniform CDF,
- inspect departures relative to simultaneous confidence bands.
Persistent departures outside bands indicate material calibration misspecification.
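A minimal sketch of the delta-ECDF idea follows. The band used here is a constant-width simultaneous band derived from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, chosen because it has a closed form; it is a stand-in, and the band construction AMMM actually uses may differ. The PIT values are synthetic and deterministic: evenly spaced quantiles for the calibrated case, and the same quantiles pushed through an arcsine-distribution quantile function to produce a U-shaped case.

```python
import bisect
import math

def delta_ecdf(pit, grid_size=101):
    """Return (grid, ECDF(grid) - grid) for PIT values on [0, 1]."""
    ordered = sorted(pit)
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    delta = [bisect.bisect_right(ordered, u) / len(ordered) - u for u in grid]
    return grid, delta

def dkw_half_width(n, alpha=0.05):
    """Half-width of a (1 - alpha) simultaneous band from the DKW inequality."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def share_outside(delta, half_width):
    """Share of grid points where the delta-ECDF leaves the band."""
    return sum(abs(d) > half_width for d in delta) / len(delta)

n = 1000
probs = [(i + 0.5) / n for i in range(n)]
uniform_pit = probs  # idealised calibrated case
u_shaped_pit = [math.sin(math.pi * p / 2.0) ** 2 for p in probs]  # over-confident case

band = dkw_half_width(n)
_, delta_uniform = delta_ecdf(uniform_pit)
_, delta_u_shaped = delta_ecdf(u_shaped_pit)
outside_uniform = share_outside(delta_uniform, band)
outside_u_shaped = share_outside(delta_u_shaped, band)
print(band, outside_uniform, outside_u_shaped)
```

Subtracting the uniform CDF flattens the reference line to zero, so departures that would be hard to see on a plain ECDF plot become visible against the band.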
Coverage Calibration
Coverage is checked across nominal levels (0.1 to 0.9):
- nominal level = intended interval mass,
- empirical coverage = realised proportion inside those intervals.
Ideal behaviour lies near the 45° line.
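The nominal-versus-empirical comparison can be sketched directly from PIT values. The sketch assumes central (equal-tailed) intervals, for which an observation lies inside the interval of level p exactly when its PIT falls between (1 - p) / 2 and (1 + p) / 2; whether AMMM uses central or highest-density intervals is not stated here. The function name `coverage_table` and the synthetic PIT values are illustrative.

```python
import math

def coverage_table(pit, levels=None):
    """Return (nominal, empirical) pairs for central intervals, given PIT values."""
    if levels is None:
        levels = [k / 10 for k in range(1, 10)]  # 0.1 to 0.9
    rows = []
    for p in levels:
        lo, hi = (1.0 - p) / 2.0, (1.0 + p) / 2.0
        empirical = sum(lo <= u <= hi for u in pit) / len(pit)
        rows.append((p, empirical))
    return rows

n = 1000
probs = [(i + 0.5) / n for i in range(n)]
calibrated_pit = probs  # idealised uniform PIT values
overconfident_pit = [math.sin(math.pi * p / 2.0) ** 2 for p in probs]  # U-shaped

cov_calibrated = coverage_table(calibrated_pit)
cov_overconfident = coverage_table(overconfident_pit)
for (p, good), (_, bad) in zip(cov_calibrated, cov_overconfident):
    print(f"nominal={p:.1f}  calibrated={good:.3f}  over-confident={bad:.3f}")
```

For the over-confident case the empirical coverage sits below the nominal level at every point, so the curve falls under the 45° line.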
Diagnosis Categories
AMMM calibration reporting includes:
- well-calibrated
- over-confident
- under-confident
- biased
The top-level machine-readable field is well_calibrated in calibration_report.json.
Artefacts
| Artefact | Purpose |
|---|---|
| 50_diagnostics/calibration_report.json | Machine-readable diagnosis, well_calibrated, pit_method, and summary metrics. |
| 50_diagnostics/calibration_coverage.csv | Nominal vs empirical coverage table. |
| 50_diagnostics/calibration_pit_histogram.png | PIT histogram with uniform reference. |
| 50_diagnostics/calibration_pit_ecdf.png | Delta-ECDF with simultaneous confidence bands. |
| 50_diagnostics/calibration_coverage.png | Coverage curve versus ideal diagonal. |
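
The JSON report can be consumed programmatically, as in the sketch below. It assumes that well_calibrated and pit_method are top-level keys and that well_calibrated is a boolean; the rest of the schema is not assumed. The function `check_calibration` is hypothetical, and the demo writes a made-up report to a temporary directory rather than reading real AMMM output. Because poor calibration does not invalidate execution, the sketch warns instead of raising.

```python
import json
import tempfile
import warnings
from pathlib import Path

def check_calibration(report_path):
    """Read the report and warn, rather than raise, on poor calibration."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    ok = bool(report.get("well_calibrated", False))
    method = report.get("pit_method", "unknown")
    if not ok:
        warnings.warn(
            f"Calibration check not passed (pit_method={method}); "
            "treat downstream interpretation with caution."
        )
    return ok, method

# Demo with a made-up report; the value "ppc" is a placeholder, not a
# documented pit_method value.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "50_diagnostics" / "calibration_report.json"
    path.parent.mkdir(parents=True)
    path.write_text(
        json.dumps({"well_calibrated": False, "pit_method": "ppc"}),
        encoding="utf-8",
    )
    ok, method = check_calibration(path)

print(ok, method)
```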
Gate Role in Workflow
Calibration corresponds to gate g5 in the stage-gated workflow. Poor calibration does not automatically invalidate code execution, but it should reduce trust in downstream business interpretation unless resolved.
See Workflow Stages and Methodology.
Causal Caveat
Good calibration supports probabilistic adequacy, not causal correctness.