Pareto k Diagnostic (`src.diagnostics.pareto_k`)

Overview

`run_pareto_k_diagnostic` assesses the reliability of the PSIS-LOO (Pareto-smoothed importance sampling leave-one-out) approximation and flags influential observations. Outputs are written to `50_diagnostics/` under `results_dir`.

Function Signature

from src.diagnostics.pareto_k import run_pareto_k_diagnostic

run_pareto_k_diagnostic(
    model: Any,
    config: dict[str, Any],
    results_dir: str,
) -> dict[str, Any]

Parameters

Parameter	Description
`model`	Fitted model exposing `idata`/`fit_result`; a `log_likelihood` must be available, otherwise the diagnostic is skipped.
`config`	Run configuration dictionary.
`results_dir`	Run root directory; outputs are saved in `50_diagnostics/`.

Artefacts Produced

Filename	Stage folder	Description
`pareto_k_summary.json`	`50_diagnostics/`	Aggregate k diagnostics (`k_max`, counts, `elpd_loo`, `p_loo`, `ok`).
`pareto_k.png`	`50_diagnostics/`	ArviZ `plot_khat` visualisation.
`pareto_k_flagged.csv`	`50_diagnostics/`	Observation-level flags for `k > 0.5` (only created when such rows exist).
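
The conditional creation of `pareto_k_flagged.csv` can be illustrated with a minimal sketch. This is not the module's code: the helper names and the column names `obs_index` and `pareto_k` are hypothetical, and the k values are made up. It only shows the rule stated in the table, where rows with `k > 0.5` are flagged and nothing is written when no such rows exist.

```python
import csv
import io

FLAG_THRESHOLD = 0.5  # observations with k above this value are flagged


def flagged_rows(k_values):
    """Return (index, k) pairs for observations with k > 0.5."""
    return [(i, k) for i, k in enumerate(k_values) if k > FLAG_THRESHOLD]


def write_flagged_csv(k_values, handle):
    """Write flagged rows to an open text handle; return False if nothing was written."""
    rows = flagged_rows(k_values)
    if not rows:
        return False  # mirrors "only created when such rows exist"
    writer = csv.writer(handle)
    writer.writerow(["obs_index", "pareto_k"])  # hypothetical column names
    writer.writerows(rows)
    return True


buffer = io.StringIO()  # stands in for the real file
wrote = write_flagged_csv([0.2, 0.65, 0.9], buffer)  # made-up k values
print(wrote)
print(buffer.getvalue())
```

A consumer of the run directory should therefore treat a missing `pareto_k_flagged.csv` as "no observations above 0.5", not as an error.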

If `log_likelihood` is missing, the function skips the computation and returns early with (shown in JSON notation):

{"ok": true, "n_bad": 0, "computed": false, "reason": "no log_likelihood"}
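
Because the early-return result carries `ok` set to true, a caller may want to separate "skipped" from "passed". The sketch below is one way to do that. The helper `report_status` is hypothetical, and it assumes that a normal (computed) return sets `computed` to a truthy value, as the usage example in this document implies.

```python
def report_status(report: dict) -> str:
    """Summarise a report as 'skipped: <reason>', 'passed', or 'failed'."""
    # Assumption: "computed" is truthy whenever the diagnostic actually ran.
    if not report.get("computed"):
        return "skipped: " + str(report.get("reason", "unknown"))
    return "passed" if report.get("ok") else "failed"


# The early-return dictionary documented above, written as a Python dict.
early = {"ok": True, "n_bad": 0, "computed": False, "reason": "no log_likelihood"}
print(report_status(early))  # skipped: no log_likelihood
```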

Thresholds

Region	Condition	Interpretation
Good	`k <= 0.5`	Reliable PSIS-LOO approximation.
Marginal	`0.5 < k <= 0.7`	Caution; influential points may affect LOO stability.
Bad	`k > 0.7`	Unreliable approximation for those observations.

Constants in source:

K_GOOD = 0.5
K_MARGINAL = 0.7
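
The threshold table and constants translate directly into a small classifier. The following is a sketch, not the module's implementation: `classify_k` and `count_regions` are hypothetical helper names, and the example k values are made up.

```python
from collections import Counter

# Values copied from the constants listed above.
K_GOOD = 0.5
K_MARGINAL = 0.7


def classify_k(k: float) -> str:
    """Return the threshold region for one Pareto k value."""
    if k <= K_GOOD:
        return "good"
    if k <= K_MARGINAL:
        return "marginal"
    return "bad"


def count_regions(k_values: list[float]) -> dict[str, int]:
    """Count how many observations fall in each region."""
    counts = Counter(classify_k(k) for k in k_values)
    return {region: counts.get(region, 0) for region in ("good", "marginal", "bad")}


example_k = [0.12, 0.50, 0.63, 0.70, 0.91]  # made-up values
print(count_regions(example_k))  # {'good': 2, 'marginal': 2, 'bad': 1}
```

Both boundaries are inclusive on the lower region, so `k == 0.5` counts as good and `k == 0.7` as marginal.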

Interpretation Guidance

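`ok = true` means `n_bad == 0` (no observation has `k > 0.7`). The early-return result also reports `ok = true`, so check `computed` before treating `ok` as a pass.
A large number of marginal or bad points can indicate model misspecification, outliers, or weak likelihood support for certain observations.
`elpd_loo` and `p_loo` provide model-assessment context, but both are estimated via PSIS-LOO, so interpret them alongside the k diagnostics.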

Usage Example

from src.diagnostics.pareto_k import run_pareto_k_diagnostic

k_report = run_pareto_k_diagnostic(
    model=driver.model,
    config=driver.config,
    results_dir=driver.results_dir,
)

if k_report.get("computed") and not k_report.get("ok"):
    print("Investigate flagged observations in pareto_k_flagged.csv")

Relationship to Workflow Stages and Gates

Stage: `50_diagnostics/`.
Feeds the Pareto-k gate g6 and the interpretation of LOO reliability.
In the current workflow, the result is advisory: it does not block downstream stages unless a stricter policy is applied at the governance level.
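
The difference between advisory and stricter handling of gate g6 might be expressed as follows. This is purely a hypothetical sketch: the function `evaluate_gate_g6`, the `strict` flag, and the returned keys are assumptions for illustration and are not taken from the project's gate code. It also assumes that a skipped diagnostic counts as not passed.

```python
def evaluate_gate_g6(report: dict, strict: bool = False) -> dict:
    """Hypothetical gate check; not the project's actual gate code."""
    # Assumption: a skipped diagnostic (computed is falsy) counts as not passed.
    passed = bool(report.get("computed")) and bool(report.get("ok"))
    # Advisory mode never blocks; strict mode blocks whenever the gate did not pass.
    blocks = strict and not passed
    return {"gate": "g6", "passed": passed, "blocks_downstream": blocks}


report = {"computed": True, "ok": False, "n_bad": 3}  # made-up report
print(evaluate_gate_g6(report))               # advisory: recorded, not blocking
print(evaluate_gate_g6(report, strict=True))  # strict: blocks downstream stages
```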