
Pre-Diagnostics (`src.diagnostics.pre_diagnostics`)

`pre_diagnostics.py` runs statistical checks before posterior interpretation: stationarity tests (ADF and KPSS) on the target, multicollinearity (VIF) checks on the regressors, and pairwise transfer-entropy screening. Outputs are written to the `10_pre_diagnostics/` stage folder.

```python
from src.diagnostics.pre_diagnostics import (
    run_stationarity_tests,
    run_vif_tests,
    run_transfer_entropy,
    run_all_pre_diagnostics,
)
```

```python
run_stationarity_tests(
    data: pd.DataFrame,
    date_col: str,
    cols: list[str],
    *,
    kpss_regression: str = "c",
    kpss_nlags: str | int | None = "auto",
    adf_maxlag: int | None = None,
    adf_regression: str = "c",
    dropna: bool = True,
) -> pd.DataFrame

run_vif_tests(
    data: pd.DataFrame,
    cols: list[str],
    *,
    include_constant: bool = True,
    dropna: str = "pairwise",
) -> pd.DataFrame

run_transfer_entropy(
    data: pd.DataFrame,
    date_col: str,
    x_cols: list[str],
    y_col: str,
    *,
    max_lag: int = 1,
    bins: int = 8,
    permutations: int = 200,
    random_state: int = 42,
    normalize: bool = True,
    dropna: bool = True,
) -> pd.DataFrame

run_all_pre_diagnostics(
    data: pd.DataFrame,
    config: dict[str, Any],
    results_dir: str,
    *,
    stationarity_cols: list[str] | None = None,
    vif_cols: list[str] | None = None,
    te_x_cols: list[str] | None = None,
    te_y_col: str | None = None,
    te_include_controls_in_x: bool = False,
    stationarity_kwargs: dict[str, Any] | None = None,
    vif_kwargs: dict[str, Any] | None = None,
    te_kwargs: dict[str, Any] | None = None,
) -> dict[str, str]
```


| Parameter | Description |
| --- | --- |
| `data` | Processed modelling dataframe. |
| `config` | Pipeline configuration; used to infer `date_col`, `target_col`, media `spend_col`s, and `extra_features_cols`. |
| `results_dir` | Run root directory; `save_csv(...)` routes into `10_pre_diagnostics/`. |
| `stationarity_cols` | Optional override for ADF/KPSS variables. Default is target only. |
| `vif_cols` | Optional override for VIF variables. Default is media + controls. |
| `te_x_cols`, `te_y_col` | Optional overrides for transfer-entropy direction setup. |
| `te_include_controls_in_x` | Adds controls to the TE X-set when `True`. |
| `*_kwargs` | Extra keyword arguments passed to each underlying diagnostic function. |

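
The defaults described in the table can be sketched as a small resolver. This is a hypothetical helper, not part of `src.diagnostics.pre_diagnostics`: the name `resolve_default_columns` and the returned keys are assumptions, and the default TE setup (media spend columns as X, target as Y) is inferred from the `te_include_controls_in_x` description rather than stated by the module.

```python
from typing import Any


def resolve_default_columns(
    config: dict[str, Any],
    te_include_controls_in_x: bool = False,
) -> dict[str, Any]:
    """Hypothetical sketch of how default column sets could be derived."""
    media_cols = [m["spend_col"] for m in config.get("media", [])]
    control_cols = list(config.get("extra_features_cols", []))

    # Assumption: TE X-set is media only unless controls are requested.
    te_x_cols = media_cols + control_cols if te_include_controls_in_x else media_cols

    return {
        "date_col": config["date_col"],
        "stationarity_cols": [config["target_col"]],  # default: target only
        "vif_cols": media_cols + control_cols,        # default: media + controls
        "te_x_cols": te_x_cols,
        "te_y_col": config["target_col"],             # assumption: Y is the target
    }


example_config = {
    "date_col": "DATE",
    "target_col": "revenue",
    "media": [
        {"display_name": "TV", "spend_col": "tv_spend"},
        {"display_name": "Search", "spend_col": "search_spend"},
    ],
    "extra_features_cols": ["price_index", "competitor_sales"],
}
resolved = resolve_default_columns(example_config)
print(resolved["vif_cols"])
```

Passing explicit `stationarity_cols`, `vif_cols`, `te_x_cols`, or `te_y_col` to `run_all_pre_diagnostics` would bypass whatever defaults the module actually infers.
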

| Filename | Stage folder | Description |
| --- | --- | --- |
| `stationarity_summary.csv` | `10_pre_diagnostics/` | ADF + KPSS metrics and combined stationarity conclusion by variable. |
| `vif_summary.csv` | `10_pre_diagnostics/` | VIF/tolerance/correlation summary with high-VIF flagging. |
| `transfer_entropy_summary.csv` | `10_pre_diagnostics/` | Pairwise TE(X→Y), TE(Y→X), permutation p-values, and direction label. |


| ADF result | KPSS result | Conclusion |
| --- | --- | --- |
| Reject H0 (p < 0.05) | Fail to reject H0 (p >= 0.05) | Likely stationary |
| Fail to reject H0 (p >= 0.05) | Reject H0 (p < 0.05) | Likely unit root |
| Other combinations | Other combinations | Inconclusive |
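
The decision rule in the table can be written as a small function. The two tests have opposite null hypotheses (ADF: a unit root is present; KPSS: the series is stationary), which is why agreement requires one rejection and one non-rejection. The sketch below is illustrative only: the function name and the returned label strings are assumptions, and the labels actually written to `stationarity_summary.csv` may differ.

```python
def classify_stationarity(
    adf_pvalue: float,
    kpss_pvalue: float,
    alpha: float = 0.05,
) -> str:
    """Combine ADF and KPSS p-values into a single conclusion."""
    adf_rejects = adf_pvalue < alpha    # ADF H0: unit root present
    kpss_rejects = kpss_pvalue < alpha  # KPSS H0: series is stationary

    if adf_rejects and not kpss_rejects:
        return "likely stationary"
    if kpss_rejects and not adf_rejects:
        return "likely unit root"
    # Both reject or neither rejects: the tests disagree.
    return "inconclusive"


for adf_p, kpss_p in [(0.01, 0.10), (0.40, 0.01), (0.01, 0.01), (0.40, 0.10)]:
    print(adf_p, kpss_p, classify_stationarity(adf_p, kpss_p))
```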

Remediation for likely unit-root behaviour:

  • First differencing.
  • Detrending.
  • Log transform for multiplicative trends.
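
The three remediations listed above can be sketched in plain Python as follows. This is a minimal illustration on lists of floats with hypothetical helper names; it is not code from the module, and on a dataframe the same transforms would normally be applied column-wise.

```python
import math


def first_difference(values: list[float]) -> list[float]:
    """Return y[t] - y[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(values, values[1:])]


def linear_detrend(values: list[float]) -> list[float]:
    """Subtract an ordinary-least-squares line fitted against the time index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


def log_transform(values: list[float]) -> list[float]:
    """Natural log; only valid for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


# A series growing 10% per period: the log turns the multiplicative trend
# into a linear one, and differencing the logs leaves a constant growth rate.
growth_series = [100.0 * 1.1 ** t for t in range(6)]
log_diffs = first_difference(log_transform(growth_series))

# A purely linear series has near-zero residuals after detrending.
residuals = linear_detrend([2.0, 4.0, 6.0, 8.0])
```

After transforming a variable, rerunning the stationarity tests on the transformed column shows whether the remediation was sufficient.
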

| VIF | Severity | Action |
| --- | --- | --- |
| < 5 | Low collinearity | Usually no action. |
| 5 to < 10 | Moderate collinearity | Monitor and stress-test estimates. |
| >= 10 | High collinearity | Combine/remove regressors, or add stronger regularisation structure. |
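
For reference, the VIF of regressor j is 1 / (1 - R_j²), where R_j² is the R² from regressing j on the remaining regressors, and tolerance is its reciprocal, 1 - R_j². The sketch below maps an R² to a VIF and then to the severity bands in the table. The function names and the short labels are assumptions for illustration, not the module's own flagging code.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF = 1 / (1 - R^2) for the auxiliary regression of one regressor on the rest."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must be in [0, 1); R^2 = 1 means perfect collinearity")
    return 1.0 / (1.0 - r_squared)


def vif_severity(vif: float) -> str:
    """Map a VIF value to the bands used in the table above."""
    if vif < 5:
        return "low"
    if vif < 10:
        return "moderate"
    return "high"


for r2 in (0.50, 0.85, 0.95):
    vif = vif_from_r_squared(r2)
    tolerance = 1.0 / vif  # equivalently 1 - R^2
    print(f"R2={r2:.2f} VIF={vif:.2f} tolerance={tolerance:.2f} -> {vif_severity(vif)}")
```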

3. Transfer Entropy (Pairwise, Unconditional)


| Condition | Direction | Interpretation |
| --- | --- | --- |
| TE(X→Y) significant and stronger than TE(Y→X) | `x→y` | X may contain predictive information for Y. |
| TE(Y→X) significant and stronger than TE(X→Y) | `y→x` | Reverse predictive direction may dominate. |
| Both significant | `bidirectional` | Mutual predictive relationship. |
| Neither significant | `none` | No strong directional signal. |

Important caveats:

  • This implementation is pairwise TE and does not control for confounders.
  • Use as an exploratory diagnostic, not a causal identification claim.
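
To make the pairwise, unconditional nature of the estimate concrete, here is a minimal sketch of a binned lag-1 transfer entropy with a permutation p-value. It is an assumption-laden illustration, not the implementation behind `run_transfer_entropy`: equal-width binning, a fixed lag of 1, a plug-in estimator in bits, and no normalisation are all choices made here for brevity, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Equal-width binning into integer labels 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def transfer_entropy(source, target, bins=8):
    """Plug-in estimate of lag-1 TE(source -> target) in bits."""
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    s, t = discretize(source, bins), discretize(target, bins)
    n = len(t) - 1
    triples = Counter(zip(t[1:], t[:-1], s[:-1]))  # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(t[:-1], s[:-1]))        # (y_now, x_now)
    pairs_yy = Counter(zip(t[1:], t[:-1]))         # (y_next, y_now)
    singles = Counter(t[:-1])                      # y_now
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


def permutation_pvalue(source, target, bins=8, permutations=200, seed=42):
    """Share of shuffled-source TE values at least as large as the observed TE."""
    rng = random.Random(seed)
    observed = transfer_entropy(source, target, bins)
    shuffled = list(source)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # destroys temporal alignment with the target
        if transfer_entropy(shuffled, target, bins) >= observed:
            exceed += 1
    return (exceed + 1) / (permutations + 1)


# Synthetic check: y copies x with a one-step delay, so X should lead Y.
rng = random.Random(0)
x = [rng.random() for _ in range(500)]
y = [0.5] + x[:-1]
te_xy = transfer_entropy(x, y, bins=4)
te_yx = transfer_entropy(y, x, bins=4)
p_xy = permutation_pvalue(x, y, bins=4, permutations=50)
print(f"TE(X->Y)={te_xy:.3f} TE(Y->X)={te_yx:.3f} p(X->Y)={p_xy:.3f}")
```

Because nothing else is conditioned on, a third variable driving both X and Y would produce the same pattern, which is why the result should be read as a screening signal only.
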
```python
import pandas as pd

from src.diagnostics.pre_diagnostics import run_all_pre_diagnostics

# Example only: in production, the V2 driver prepares the processed dataframe.
data = pd.read_csv("processed_input.csv")

config = {
    "date_col": "DATE",
    "target_col": "revenue",
    "media": [
        {"display_name": "TV", "spend_col": "tv_spend"},
        {"display_name": "Search", "spend_col": "search_spend"},
    ],
    "extra_features_cols": ["price_index", "competitor_sales"],
}

paths = run_all_pre_diagnostics(
    data=data,
    config=config,
    results_dir="results/run_20260304_101500",
)
print(paths)
```

Pre-diagnostics are orchestrated through the V2 driver workflow (`MMMBaseDriverV2` -> `WorkflowExecutor`) rather than through a standalone script entry point.

```python
from src.driver.base import MMMBaseDriverV2

driver = MMMBaseDriverV2(
    config_filename="data-config/demo_config.yml",
    input_filename="data-config/demo_data.csv",
    holidays_filename="data-config/holidays.csv",
    results_filename="results",
)
driver.main()
```

  • Stage: 10_pre_diagnostics/.
  • These checks are pre-fit quality diagnostics that support gate g1 interpretation readiness and reduce downstream model risk.
  • They are advisory diagnostics; hard machine-readable gate states are emitted later by convergence/calibration diagnostics in 50_diagnostics/.