Prior Predictive Check (`src.diagnostics.prior_predictive`)

Overview

`run_prior_predictive_check` samples from the prior predictive distribution and checks whether the simulated target values are plausible relative to the scale of the observed data. Outputs are written to `10_pre_diagnostics/`.

Function Signature

from src.diagnostics.prior_predictive import run_prior_predictive_check

run_prior_predictive_check(
    model: Any,
    X_train: pd.DataFrame,
    y_train: pd.Series | np.ndarray,
    config: dict[str, Any],
    results_dir: str,
    samples: int = 500,
) -> dict[str, Any]

Parameters

Parameter	Description
`model`	ModelBuilder-compatible model with `sample_prior_predictive(...)`.
`X_train`	Training feature matrix used for prior predictive simulation.
`y_train`	Observed training target values (reference distribution).
`config`	Run configuration dictionary.
`results_dir`	Run root directory; outputs are saved in `10_pre_diagnostics/`.
`samples`	Number of prior predictive draws (default `500`).

Artefacts Produced

Filename	Stage folder	Description
`prior_predictive_summary.csv`	`10_pre_diagnostics/`	Summary stats for prior predictive vs observed (`mean`, `sd`, `min`, `max`, `p05`, `p95`).
`prior_predictive_check.png`	`10_pre_diagnostics/`	Prior predictive histogram with observed mean and observed range overlay.
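
As a rough illustration of the six statistics listed for `prior_predictive_summary.csv`, the sketch below builds a two-row table for prior predictive draws and observed values. The helper names (`summarise`, `build_summary`), the row labels, the use of the sample standard deviation (`ddof=1`), and the table layout are all assumptions made for this example; the actual file layout may differ.

```python
import numpy as np
import pandas as pd

# Column names taken from the artefact description above.
STAT_COLUMNS = ["mean", "sd", "min", "max", "p05", "p95"]


def summarise(values):
    """Return the six summary statistics for a flattened array of values."""
    v = np.asarray(values, dtype=float).ravel()
    return {
        "mean": v.mean(),
        "sd": v.std(ddof=1),  # assumption: sample SD; the real code may use ddof=0
        "min": v.min(),
        "max": v.max(),
        "p05": np.percentile(v, 5),
        "p95": np.percentile(v, 95),
    }


def build_summary(prior_draws, y_obs):
    """Hypothetical layout: one row per distribution, one column per statistic."""
    rows = [summarise(prior_draws), summarise(y_obs)]
    return pd.DataFrame(rows, index=["prior_predictive", "observed"])[STAT_COLUMNS]


rng = np.random.default_rng(1)
summary = build_summary(
    rng.normal(0.0, 1.0, size=(500, 50)),  # simulated draws: 500 draws x 50 rows
    np.array([1.0, 2.0, 3.0, 4.0]),        # toy observed target
)
print(summary)
```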

Plausibility Metric

The key metric is:

plausibility_ratio = fraction of prior draws inside [obs_min - 2*obs_sd, obs_max + 2*obs_sd].

Current warning rule in code:

warning = (plausibility_ratio < 0.5)

Interpretation:

>= 0.5: priors generate data broadly consistent with observed scale.
< 0.5: priors may be too diffuse, too concentrated, or mis-centred for the current target.
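
The metric and warning rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the module's implementation: the helper name `plausibility_ratio`, the choice to count every simulated value as one draw, and the use of the population standard deviation (`ddof=0`) are assumptions.

```python
import numpy as np


def plausibility_ratio(prior_draws, y_obs):
    """Fraction of simulated values inside [obs_min - 2*obs_sd, obs_max + 2*obs_sd]."""
    draws = np.asarray(prior_draws, dtype=float).ravel()
    y = np.asarray(y_obs, dtype=float).ravel()
    obs_sd = y.std()  # assumption: ddof=0; the real code may use ddof=1
    lower = y.min() - 2.0 * obs_sd
    upper = y.max() + 2.0 * obs_sd
    inside = (draws >= lower) & (draws <= upper)
    return float(inside.mean())


rng = np.random.default_rng(0)
y_obs = rng.normal(100.0, 10.0, size=200)

# Priors on roughly the right scale versus priors that are far too diffuse.
good_draws = rng.normal(100.0, 15.0, size=(500, 200))
wide_draws = rng.normal(100.0, 1000.0, size=(500, 200))

ratio_good = plausibility_ratio(good_draws, y_obs)
ratio_wide = plausibility_ratio(wide_draws, y_obs)

# Warning rule as documented: warn when fewer than half the draws are plausible.
warning_good = ratio_good < 0.5
warning_wide = ratio_wide < 0.5
print(ratio_good, warning_good)
print(ratio_wide, warning_wide)
```

Widening the observed range by two standard deviations on each side keeps the check tolerant of priors that are somewhat broader than the data, while still flagging priors that place most of their mass on an implausible scale.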

Scale Handling

If target standardisation is applied in-graph, the function attempts to inverse-transform the prior predictive draws back to the original target scale before summarising and plotting them.
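
To show why the inverse transform matters, the sketch below assumes a plain z-score standardisation of the target (subtract the training mean, divide by the training standard deviation). Both helpers are hypothetical, and the transform used in the model graph may differ. Without this step, standardised draws centred near 0 would be compared against observed values on their original scale.

```python
import numpy as np


def standardise(y):
    """Hypothetical z-score transform; returns the scaled values and the statistics used."""
    y = np.asarray(y, dtype=float)
    mean, sd = y.mean(), y.std()
    return (y - mean) / sd, mean, sd


def inverse_standardise(draws_std, mean, sd):
    """Map standardised draws back to the original target scale."""
    return np.asarray(draws_std, dtype=float) * sd + mean


y_train = np.array([80.0, 100.0, 120.0])
y_std, y_mean, y_sd = standardise(y_train)

# Round trip: values on the standardised scale map back to the original scale.
recovered = inverse_standardise(y_std, y_mean, y_sd)
print(recovered)
```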

Usage Example

from src.diagnostics.prior_predictive import run_prior_predictive_check

pp_result = run_prior_predictive_check(
    model=driver.model,
    X_train=driver.X_train,
    y_train=driver.y_train,
    config=driver.config,
    results_dir=driver.results_dir,
    samples=500,
)

print(pp_result["plausibility_ratio"], pp_result["warning"])

Relationship to Workflow Stages and Gates

Stage: 10_pre_diagnostics/.
Feeds prior predictive gate g1 before posterior interpretation.
A warning does not terminate execution on its own, but it indicates that the priors should be revisited before relying on downstream decisions.
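
One way a caller might treat the result as a non-fatal gate is sketched below. The function `gate_g1_passed` and the logger name are hypothetical and not part of the module; only the `plausibility_ratio` and `warning` keys come from the usage example above.

```python
import logging

logger = logging.getLogger("prior_predictive_gate")  # hypothetical logger name


def gate_g1_passed(pp_result):
    """Return True when no warning was raised; log and return False otherwise.

    The run is never aborted here, matching the non-terminating behaviour described above.
    """
    ratio = float(pp_result["plausibility_ratio"])
    if pp_result["warning"]:
        logger.warning(
            "Prior predictive gate g1: plausibility_ratio=%.2f is below 0.5; "
            "consider revising the priors before interpreting the posterior.",
            ratio,
        )
        return False
    return True


# Toy result dictionaries with the two keys used in the usage example.
ok = gate_g1_passed({"plausibility_ratio": 0.82, "warning": False})
flagged = gate_g1_passed({"plausibility_ratio": 0.31, "warning": True})
print(ok, flagged)
```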