Calibration Assessment (`src.diagnostics.calibration`)
Overview
`run_calibration_assessment` evaluates predictive calibration from posterior predictive draws and writes both diagnostic plots and a machine-readable calibration report to `50_diagnostics/`. It prefers LOO-PIT when log-likelihood is available.
Function Signature

    from src.diagnostics.calibration import run_calibration_assessment

    run_calibration_assessment(
        model: Any,
        X_train: pd.DataFrame,
        y_train: pd.Series | np.ndarray,
        config: dict[str, Any],
        results_dir: str,
    ) -> dict[str, Any]

Parameters
| Parameter | Description |
|---|---|
| `model` | Fitted model with `idata`; can generate posterior predictive if missing. |
| `X_train` | Training feature matrix (used when generating posterior predictive samples). |
| `y_train` | Training target array/series. |
| `config` | Run configuration dictionary. |
| `results_dir` | Run root directory; outputs are saved in `50_diagnostics/`. |
Artefacts Produced
| Filename | Stage folder | Description |
|---|---|---|
| `calibration_report.json` | `50_diagnostics/` | Machine-readable summary including `well_calibrated`, `diagnosis`, `pit_method`. |
| `calibration_coverage.csv` | `50_diagnostics/` | Nominal vs empirical coverage table with deviation. |
| `calibration_pit_histogram.png` | `50_diagnostics/` | PIT histogram against ideal uniform density. |
| `calibration_pit_ecdf.png` | `50_diagnostics/` | Delta-ECDF with ~99% confidence bands. |
| `calibration_coverage.png` | `50_diagnostics/` | Empirical vs nominal coverage plot (identity line reference). |
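The coverage table compares nominal interval levels with the share of observations that actually fall inside those intervals. The following is a minimal sketch of that calculation, assuming equal-tailed central intervals taken from posterior predictive draws. The function names, the nominal levels, and the row keys are illustrative assumptions and are not taken from the module or from the CSV schema.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def coverage_table(y_obs, draws, levels=(0.5, 0.8, 0.9, 0.95)):
    """Nominal vs empirical coverage; draws[i] holds the draws for y_obs[i].

    Row keys ("nominal", "empirical", "deviation") are hypothetical names.
    """
    sorted_draws = [sorted(d) for d in draws]
    rows = []
    for level in levels:
        tail = (1.0 - level) / 2.0
        hits = 0
        for y, d in zip(y_obs, sorted_draws):
            # Count the observation if it lies inside the central interval.
            if quantile(d, tail) <= y <= quantile(d, 1.0 - tail):
                hits += 1
        empirical = hits / len(y_obs)
        rows.append(
            {"nominal": level, "empirical": empirical, "deviation": empirical - level}
        )
    return rows


# Synthetic check: observations and draws share one distribution,
# so empirical coverage should sit close to the nominal level.
rng = random.Random(0)
y_obs = [rng.gauss(0.0, 1.0) for _ in range(500)]
draws = [[rng.gauss(0.0, 1.0) for _ in range(400)] for _ in y_obs]
table = coverage_table(y_obs, draws)
for row in table:
    print(row)
```

A negative deviation means the intervals capture fewer observations than promised (too narrow); a positive deviation means they capture more (too wide).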
PIT Method Selection
The function selects the PIT strategy in this order:
1. `loo_pit` when `idata.log_likelihood` exists.
2. `ppc_pit` when log-likelihood is absent.
3. `ppc_pit (fallback)` if `az.loo_pit(...)` fails at runtime.
This choice is recorded in calibration_report.json as pit_method.
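The selection order can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `select_pit` and `ppc_pit` are hypothetical helpers, `loo_pit_fn` is a stand-in callable for the real `az.loo_pit(...)` call, and the PPC-based PIT is taken to be the share of posterior predictive draws at or below each observation.

```python
def ppc_pit(y_obs, draws):
    """PIT_i = share of posterior predictive draws at or below y_i."""
    return [sum(1 for v in d if v <= y) / len(d) for y, d in zip(y_obs, draws)]


def select_pit(y_obs, draws, has_log_likelihood, loo_pit_fn):
    """Return (pit_values, pit_method) following the documented order.

    loo_pit_fn is a hypothetical zero-argument callable that stands in
    for the az.loo_pit(...) call made by the real function.
    """
    if not has_log_likelihood:
        return ppc_pit(y_obs, draws), "ppc_pit"
    try:
        return loo_pit_fn(), "loo_pit"
    except Exception:
        # LOO-PIT failed at runtime, so fall back to the PPC-based PIT.
        return ppc_pit(y_obs, draws), "ppc_pit (fallback)"


def failing_loo():
    raise RuntimeError("simulated LOO failure")


y_obs = [0.1, 0.9]
draws = [[0.0, 0.2, 0.4, 0.6], [0.5, 0.7, 0.8, 1.0]]

pit_a, method_a = select_pit(y_obs, draws, False, failing_loo)
pit_b, method_b = select_pit(y_obs, draws, True, lambda: [0.3, 0.7])
pit_c, method_c = select_pit(y_obs, draws, True, failing_loo)
print(method_a, method_b, method_c)
```

Keeping the fallback label distinct from plain `ppc_pit` lets a reader of the report tell "no log-likelihood was stored" apart from "LOO-PIT was attempted and failed".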
Diagnosis Categories
The returned `diagnosis` is derived by fixed rules from PIT moments and coverage deviation:
| Pattern | Reported diagnosis |
|---|---|
| Near-uniform PIT and small coverage error | well-calibrated |
| Mean PIT shift | biased (systematic location shift) |
| Empirical intervals too narrow | over-confident (predictions too certain) or over-confident (intervals too narrow) |
| Empirical intervals too wide | under-confident (predictions too uncertain) or under-confident (intervals too wide) |
| Missing finite PIT values | insufficient data |
Interpretation shortcut:
- U-shaped PIT tends to indicate over-confidence.
- Hump-shaped PIT tends to indicate under-confidence.
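The interpretation shortcut follows from the moments of a uniform distribution on [0, 1], which has mean 0.5 and variance 1/12. A U-shaped PIT piles mass near 0 and 1 and so has larger variance; a hump-shaped PIT concentrates near 0.5 and has smaller variance. The sketch below applies that idea using PIT moments only. The function name, the tolerances, and the order of the checks are assumptions made for illustration; the module's actual rules also use coverage deviation and may differ.

```python
import math
from statistics import fmean, pvariance

UNIFORM_VAR = 1.0 / 12.0  # variance of Uniform(0, 1)


def diagnose_from_pit(pit, mean_tol=0.05, var_tol=0.02):
    """Hypothetical moment-based diagnosis; tolerances are illustrative."""
    finite = [p for p in pit if math.isfinite(p)]
    if not finite:
        return "insufficient data"
    mean = fmean(finite)
    var = pvariance(finite)
    if abs(mean - 0.5) > mean_tol:
        return "biased (systematic location shift)"
    if var > UNIFORM_VAR + var_tol:
        # Too much mass in the tails of the PIT: U-shaped histogram.
        return "over-confident (predictions too certain)"
    if var < UNIFORM_VAR - var_tol:
        # Too much mass near 0.5: hump-shaped histogram.
        return "under-confident (predictions too uncertain)"
    return "well-calibrated"


uniform_like = [(i + 0.5) / 100 for i in range(100)]
u_shaped = [0.02] * 50 + [0.98] * 50
hump_shaped = [0.45 + 0.001 * i for i in range(100)]
shifted = [0.7 + 0.002 * i for i in range(100)]

for name, pit in [
    ("uniform_like", uniform_like),
    ("u_shaped", u_shaped),
    ("hump_shaped", hump_shaped),
    ("shifted", shifted),
    ("all_nan", [float("nan")]),
]:
    print(name, "->", diagnose_from_pit(pit))
```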
Scale Handling
If the model uses in-graph target scaling, the calibration computations attempt an inverse transform through the scaling strategy resolver so that plots and coverage are evaluated on the original target scale.
Usage Example

    from src.diagnostics.calibration import run_calibration_assessment

    cal_report = run_calibration_assessment(
        model=driver.model,
        X_train=driver.X_train,
        y_train=driver.y_train,
        config=driver.config,
        results_dir=driver.results_dir,
    )
    print(cal_report["well_calibrated"], cal_report["diagnosis"], cal_report.get("pit_method"))

Relationship to Workflow Stages and Gates
- Stage: `50_diagnostics/`.
- Feeds calibration gate `g5` via `calibration_report.json`.
- `well_calibrated` is the machine-readable calibration outcome consumed downstream for interpretation risk signalling.
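A downstream consumer might read the report along these lines. This is a hedged sketch: `read_calibration_outcome` is a hypothetical helper, how gate `g5` acts on the value is not specified here, and the report contents written in the demonstration are stand-in values, not output from a real run. Only the three keys named in the artefact table are assumed to exist.

```python
import json
import tempfile
from pathlib import Path


def read_calibration_outcome(results_dir):
    """Load the calibration fields from <results_dir>/50_diagnostics/."""
    report_path = Path(results_dir) / "50_diagnostics" / "calibration_report.json"
    with report_path.open(encoding="utf-8") as fh:
        report = json.load(fh)
    return {
        "well_calibrated": bool(report["well_calibrated"]),
        "diagnosis": report.get("diagnosis"),
        "pit_method": report.get("pit_method"),
    }


# Demonstration against a stand-in report in a temporary run directory.
with tempfile.TemporaryDirectory() as run_dir:
    diag_dir = Path(run_dir) / "50_diagnostics"
    diag_dir.mkdir()
    sample = {
        "well_calibrated": False,
        "diagnosis": "biased (systematic location shift)",
        "pit_method": "ppc_pit",
    }
    report_file = diag_dir / "calibration_report.json"
    report_file.write_text(json.dumps(sample), encoding="utf-8")
    outcome = read_calibration_outcome(run_dir)

print(outcome)
```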