AMMM Pipeline CSV Output Schema

This document describes all CSV files that the AMMM (Advanced Marketing Mix Modelling) pipeline writes to the results/csv/ directory.

Overview
Data Exploration & Diagnostics
Model Results
Performance Metrics
Budget Optimization
Decomposition & Attribution

media_contribution_per_spend.csv

Purpose: ROI-like metric for revenue-target models: total revenue contribution per unit spend for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,contribution_per_unit_spend,roi_rank

Column Definitions:

channel (string): Media channel name
total_spend (float): Total spend on channel
total_contribution (float): Total model-attributed revenue
contribution_per_unit_spend (float): Revenue per unit spend (contribution / spend)
roi_rank (int): Rank by contribution-per-spend (1 = highest)

Notes:

Produced when config.target_type is revenue.
For conversion-target models, see media_conversion_efficiency.csv.
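
As a rough illustration of how the derived columns relate to the raw totals, the sketch below recomputes contribution_per_unit_spend and roi_rank with the Python standard library. The sample figures are made up, and the pipeline's own handling of ties and zero-spend channels is not documented here, so the choices in the code are assumptions.

```python
import csv
import io

# Hypothetical totals; only the column names come from the schema above.
SAMPLE = """channel,total_spend,total_contribution
tv_spend,200000,500000
search_spend,100000,400000
social_spend,50000,100000
"""


def rank_by_contribution_per_spend(rows):
    """Add contribution_per_unit_spend and roi_rank (1 = highest) to each row."""
    enriched = []
    for row in rows:
        spend = float(row["total_spend"])
        contribution = float(row["total_contribution"])
        if spend <= 0:
            # Assumption: channels without spend are skipped rather than ranked.
            continue
        enriched.append({
            "channel": row["channel"],
            "total_spend": spend,
            "total_contribution": contribution,
            "contribution_per_unit_spend": contribution / spend,
        })
    enriched.sort(key=lambda r: r["contribution_per_unit_spend"], reverse=True)
    for rank, row in enumerate(enriched, start=1):
        row["roi_rank"] = rank
    return enriched


ranked = rank_by_contribution_per_spend(csv.DictReader(io.StringIO(SAMPLE)))
for row in ranked:
    print(row["roi_rank"], row["channel"], row["contribution_per_unit_spend"])
```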

media_cost_per_revenue_unit.csv

Purpose: Cost per revenue unit (inverse of contribution-per-spend) for revenue-target models.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,cost_per_revenue_unit,cpr_rank

Column Definitions:

channel (string): Media channel name
total_spend (float): Total spend on channel
total_contribution (float): Total model-attributed revenue
cost_per_revenue_unit (float): Spend per unit of attributed revenue (spend / contribution)
cpr_rank (int): Rank by cost efficiency (1 = lowest cost per revenue unit)

Notes:

Produced when config.target_type is revenue.
For conversion-target models, see media_cost_per_conversion.csv.
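
The inverse relationship with contribution-per-spend can be checked with a few lines of Python. This is a minimal sketch on hypothetical totals; it assumes every channel has positive spend and positive contribution, and that there are no ties.

```python
import math

# Hypothetical (spend, contribution) totals per channel, for illustration only.
totals = {
    "tv_spend": (200000.0, 500000.0),
    "search_spend": (100000.0, 400000.0),
    "social_spend": (50000.0, 100000.0),
}

cost_per_revenue_unit = {ch: spend / contrib for ch, (spend, contrib) in totals.items()}
contribution_per_unit_spend = {ch: contrib / spend for ch, (spend, contrib) in totals.items()}

# The two metrics are reciprocals of each other for every channel.
reciprocal = all(
    math.isclose(cost_per_revenue_unit[ch] * contribution_per_unit_spend[ch], 1.0)
    for ch in totals
)

# Ascending cost (cpr_rank) and descending contribution-per-spend give the same order.
cpr_order = sorted(totals, key=cost_per_revenue_unit.get)
roi_order = sorted(totals, key=contribution_per_unit_spend.get, reverse=True)
print(reciprocal, cpr_order)
```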

Overview

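The AMMM pipeline can generate up to 14 CSV files across different phases (the revenue-specific performance files are produced only when config.target_type is revenue):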

| Phase | Files |
|---|---|
| Data Exploration | `stationarity_summary.csv`, `vif_summary.csv`, `transfer_entropy_summary.csv` |
| Model Results | `model_summary.csv`, `ELPD_summary.csv` |
| Performance | `media_performance_effect.csv`, `media_conversion_efficiency.csv`, `media_cost_per_conversion.csv`, `media_contribution_per_spend.csv`, `media_cost_per_revenue_unit.csv`, `response_curve_fit_combined.csv` |
| Budget Optimization | `budget_scenario_results.csv` |
| Decomposition | `all_decomp.csv`, `waterfall_decomposition_data.csv` |

Data Exploration & Diagnostics

stationarity_summary.csv

Purpose: Tests for stationarity in time series data using Augmented Dickey-Fuller (ADF) tests.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

variable,adf_statistic,p_value,is_stationary,lags_used,n_observations

Column Definitions:

variable (string): Name of the time series variable (media channel, control, or target)
adf_statistic (float): Augmented Dickey-Fuller test statistic
p_value (float): P-value for the ADF test
is_stationary (boolean): Whether the series is stationary (True/False)
lags_used (int): Number of lags used in the ADF test
n_observations (int): Number of observations in the test

Use Cases:

Identify non-stationary variables that may require differencing or transformation
Validate that time series assumptions are met before modelling
Diagnose potential data quality issues

Example:

variable,adf_statistic,p_value,is_stationary,lags_used,n_observations
tv_spend,-3.21,0.019,True,3,104
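
One way to act on this file is to list the variables flagged as non-stationary. The sketch below uses only the standard library; the second data row is hypothetical and exists purely to give the filter something to catch.

```python
import csv
import io

# First data row is the example above; the second is hypothetical.
SAMPLE = """variable,adf_statistic,p_value,is_stationary,lags_used,n_observations
tv_spend,-3.21,0.019,True,3,104
price_index,-1.10,0.710,False,2,104
"""


def non_stationary_variables(csv_text):
    """Return the names of variables whose is_stationary flag is False."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["variable"] for row in reader if row["is_stationary"].strip() == "False"]


flagged = non_stationary_variables(SAMPLE)
print(flagged)
```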

vif_summary.csv

Purpose: Variance Inflation Factor (VIF) analysis to detect multicollinearity between features.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

variable,vif,is_multicollinear

Column Definitions:

variable (string): Name of the feature (media channel or control variable)
vif (float): Variance Inflation Factor value
is_multicollinear (boolean): Whether VIF exceeds threshold (typically VIF > 10)

Use Cases:

Identify highly correlated features that may cause model instability
Guide feature engineering decisions
Validate model assumptions

Interpretation:

VIF = 1: No correlation
VIF < 5: Low correlation (acceptable)
VIF 5-10: Moderate correlation (caution)
VIF > 10: High multicollinearity (problematic)

Example:

variable,vif,is_multicollinear
search_spend,4.8,False
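
The interpretation bands can be expressed as a small helper. This is a sketch, not the pipeline's code: the band names are illustrative, and assigning the boundary values 5 and 10 to the moderate band is an assumption based on the "VIF > 10" flag above.

```python
def vif_band(vif):
    """Map a VIF value onto the interpretation bands listed above."""
    if vif < 5:
        return "low"
    if vif <= 10:
        # Assumption: exactly 5 and exactly 10 both count as moderate.
        return "moderate"
    return "high"


# 4.8 is the value from the example row; the other two are hypothetical.
for value in (4.8, 7.5, 12.3):
    print(value, vif_band(value))
```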

transfer_entropy_summary.csv

Purpose: Measures information transfer between variables using transfer entropy.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

source,target,transfer_entropy,is_significant

Column Definitions:

source (string): Source variable name
target (string): Target variable name (typically the dependent variable)
transfer_entropy (float): Transfer entropy value (bits)
is_significant (boolean): Whether the transfer is statistically significant

Use Cases:

Identify candidate directional (predictive) relationships between media channels and the target; transfer entropy alone does not establish causation
Understand information flow in the marketing system
Guide model structure decisions

Example:

source,target,transfer_entropy,is_significant
tv_spend,revenue,0.042,True

Model Results

model_summary.csv

Purpose: Detailed summary of all fitted model parameters with posterior statistics.

Generated During: Phase 5 - Model Fitting (After MCMC sampling)

Structure:

parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median

Column Definitions:

parameter (string): Parameter name (intercept, beta_channel, alpha, lam, likelihood_sigma)
mean (float): Posterior mean estimate
sd (float): Posterior standard deviation
hdi_5% (float): Lower bound of the 90% Highest Density Interval (HDI)
hdi_95% (float): Upper bound of the 90% Highest Density Interval (HDI)
mcse_mean (float): Monte Carlo standard error of the mean
mcse_sd (float): Monte Carlo standard error of the standard deviation
ess_bulk (float): Bulk Effective Sample Size
ess_tail (float): Tail Effective Sample Size
r_hat (float): Gelman-Rubin convergence diagnostic (should be ≈ 1.0)
median (float): Posterior median estimate

Parameter Types:

intercept: Model intercept (baseline effect)
likelihood_sigma: Noise/error standard deviation
beta_channel[channel_name]: Channel effectiveness coefficient
alpha[channel_name]: Adstock retention parameter (0-1)
lam[channel_name]: Saturation steepness parameter

Use Cases:

Assess parameter convergence (check r_hat ≈ 1.0)
Evaluate parameter uncertainty (SD and HDI intervals)
Identify strongest media channels (high beta values)
Understand carryover effects (alpha values)
Export results for reporting

Example (excerpt):

parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median
beta_channel[tv_spend],0.012,0.003,0.007,0.018,0.0001,0.0001,2500,2200,1.00,0.012
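
A possible way to work with this file is to split the bracketed parameter labels and flag rows with a high r_hat. The sketch below is illustrative only; the helper names are hypothetical and the 1.01 limit is a commonly used rule of thumb, not a threshold taken from the pipeline.

```python
import csv
import io
import re

# Header and row copied from the example above.
SAMPLE = """parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median
beta_channel[tv_spend],0.012,0.003,0.007,0.018,0.0001,0.0001,2500,2200,1.00,0.012
"""

PARAM_PATTERN = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<index>.+)\])?$")


def split_parameter(label):
    """Split 'beta_channel[tv_spend]' into ('beta_channel', 'tv_spend')."""
    match = PARAM_PATTERN.match(label)
    if match is None:
        return label, None
    return match.group("name"), match.group("index")


def poorly_converged(csv_text, r_hat_limit=1.01):
    """Return parameter labels whose r_hat exceeds the (assumed) limit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["parameter"] for row in reader if float(row["r_hat"]) > r_hat_limit]


print(split_parameter("beta_channel[tv_spend]"), poorly_converged(SAMPLE))
```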

ELPD_summary.csv

Purpose: Expected Log Pointwise Predictive Density (ELPD) and model diagnostics.

Generated During: Phase 7 - Post-Analysis (Model Diagnostics)

Structure:

metric,value

Column Definitions:

metric (string): Name of the diagnostic metric
value (float/int/bool): Metric value

Metrics Included:

n_samples: Number of posterior samples used
n_data_points: Number of data points in the model
good_k: Proportion of observations with good Pareto k values (k below 0.7 is conventionally treated as reliable); the proportion should be > 0.7
elpd_loo: Expected log pointwise predictive density (LOO-CV)
p_loo: Effective number of parameters
warning: Whether LOO diagnostic warnings were raised
r_squared: Model R-squared value

Use Cases:

Evaluate model fit quality
Compare different model specifications
Assess out-of-sample predictive accuracy
Flag possible overfitting or misspecification (p_loo much larger than the actual number of model parameters)

Example:

metric,value
elpd_loo,-1234.56
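
Because the value column mixes floats, integers, and booleans, a reader has to coerce each cell. The following is a minimal sketch of one way to do that; the elpd_loo row is the example above, while the other two rows are hypothetical.

```python
import csv
import io

# elpd_loo is the example value above; the other rows are hypothetical.
SAMPLE = """metric,value
n_samples,4000
elpd_loo,-1234.56
warning,False
"""


def coerce(text):
    """Convert a CSV cell to bool, int, or float, falling back to the raw string."""
    if text in ("True", "False"):
        return text == "True"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


metrics = {row["metric"]: coerce(row["value"]) for row in csv.DictReader(io.StringIO(SAMPLE))}
print(metrics)
```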

Performance Metrics

media_performance_effect.csv

Purpose: Media channel effectiveness and contribution metrics.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,mean_effect,median_effect,sd_effect,hdi_5%,hdi_95%,total_contribution,pct_of_total

Column Definitions:

channel (string): Media channel name
mean_effect (float): Mean contribution per time period
median_effect (float): Median contribution per time period
sd_effect (float): Standard deviation of effect
hdi_5% (float): Lower bound of the 90% Highest Density Interval (HDI)
hdi_95% (float): Upper bound of the 90% Highest Density Interval (HDI)
total_contribution (float): Total contribution over entire period
pct_of_total (float): Share of total media contribution (the example below expresses it as a fraction, 0.27 = 27%)

Use Cases:

Rank channels by effectiveness
Calculate marketing ROI
Allocate budget across channels
Identify underperforming channels

Example:

channel,mean_effect,median_effect,sd_effect,hdi_5%,hdi_95%,total_contribution,pct_of_total
tv_spend,112.4,110.2,15.7,85.1,140.3,5845.0,0.27

media_conversion_efficiency.csv

Purpose: Conversion efficiency metrics for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,conversions_per_unit_spend,efficiency_rank

Column Definitions:

channel (string): Media channel name
total_spend (float): Total spend on channel
total_contribution (float): Total contribution (conversions/revenue)
conversions_per_unit_spend (float): Efficiency metric (contribution / spend)
efficiency_rank (int): Rank by efficiency (1 = most efficient)

Use Cases:

Identify most efficient channels
Optimize budget allocation
Calculate marketing efficiency ratios
Compare channel performance

Example:

channel,total_spend,total_contribution,conversions_per_unit_spend,efficiency_rank
social_spend,250000,620.5,0.00248,2

media_cost_per_conversion.csv

Purpose: Cost per conversion (CPA/CPO) for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,cost_per_conversion,cpa_rank

Column Definitions:

channel (string): Media channel name
total_spend (float): Total spend on channel
total_contribution (float): Total conversions/outcomes
cost_per_conversion (float): Cost per conversion (spend / contribution)
cpa_rank (int): Rank by CPA (1 = lowest CPA)

Use Cases:

Calculate CPA/CPO metrics
Compare channel efficiency
Set performance benchmarks
Identify cost-effective channels

Example:

channel,total_spend,total_contribution,cost_per_conversion,cpa_rank
search_spend,300000,900.0,333.33,1
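
The two conversion metrics are simple ratios of the same totals, so the example rows in this file and in media_conversion_efficiency.csv can be sanity-checked directly. The helper below is a hypothetical sketch; the tolerance is an assumption chosen to absorb the rounding in the reported values.

```python
import math


def ratio_matches(numerator, denominator, reported, rel_tol=1e-2):
    """Check that numerator / denominator agrees with a rounded reported value."""
    return math.isclose(numerator / denominator, reported, rel_tol=rel_tol)


# Values from the media_conversion_efficiency.csv example: contribution / spend.
efficiency_ok = ratio_matches(620.5, 250000, 0.00248)

# Values from the media_cost_per_conversion.csv example: spend / contribution.
cpa_ok = ratio_matches(300000, 900.0, 333.33)

print(efficiency_ok, cpa_ok)
```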

response_curve_fit_combined.csv

Purpose: Fitted response curves for all media channels showing diminishing returns.

Generated During: Phase 5 - Model Fitting (Visualisation Phase)

Structure:

channel,spend_level,predicted_contribution,lower_bound,upper_bound,saturation_pct

Column Definitions:

channel (string): Media channel name
spend_level (float): Spend level (x-axis)
predicted_contribution (float): Predicted contribution at this spend level
lower_bound (float): Lower bound of prediction interval
upper_bound (float): Upper bound of prediction interval
saturation_pct (float): Percentage of maximum saturation reached

Use Cases:

Visualize diminishing returns
Find optimal spend levels
Identify saturation points
Guide budget recommendations

Note: Values represent direct response curves. No additional scaling is required.

Example:

channel,spend_level,predicted_contribution,lower_bound,upper_bound,saturation_pct
tv_spend,150000,210.5,180.2,240.8,64.2
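
Diminishing returns can be read off the curve as the extra contribution gained per extra unit of spend between consecutive points. The sketch below illustrates the idea on a reduced set of columns; apart from the 150000 row taken from the example, the curve points are hypothetical.

```python
import csv
import io

# Hypothetical curve points, except tv_spend at 150000 (from the example above).
# Only the columns needed for the calculation are included.
SAMPLE = """channel,spend_level,predicted_contribution
tv_spend,50000,95.0
tv_spend,100000,160.0
tv_spend,150000,210.5
tv_spend,200000,245.0
"""


def marginal_returns(csv_text, channel):
    """Return (spend_level, extra contribution per extra unit of spend) pairs."""
    points = sorted(
        (float(r["spend_level"]), float(r["predicted_contribution"]))
        for r in csv.DictReader(io.StringIO(csv_text))
        if r["channel"] == channel
    )
    return [
        (x1, (y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


slopes = marginal_returns(SAMPLE, "tv_spend")
for spend, slope in slopes:
    print(f"{spend:>9.0f}  {slope:.5f}")
```

A falling slope from one spend level to the next is what a saturating response curve should produce.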

Budget Optimization

budget_scenario_results.csv

Purpose: Results from budget scenario planning across different budget levels.

Generated During: Phase 8 - Budget Optimization

Structure:

scenario,description,channel,budget,contribution,pct_change_from_baseline,total_budget,pct_of_total

Column Definitions:

scenario (string): Scenario identifier (e.g., ‘baseline’, ‘scenario_-10’, ‘scenario_+20’)
description (string): Human-readable scenario description
channel (string): Media channel name or ‘TOTAL’ for aggregates
budget (float): Allocated budget for this channel in this scenario
contribution (float): Predicted contribution/conversions
pct_change_from_baseline (float): Percentage change from baseline scenario
total_budget (float): Total budget across all channels
pct_of_total (float): This channel's share of the total budget (budget / total_budget; the example below expresses it as a fraction, 0.30 = 30%)

Use Cases:

Compare different budget allocation strategies
Quantify impact of budget changes
Optimize budget distribution
Create “what-if” scenarios for planning

Scenario Types:

baseline: Current spend levels
scenario_-X: X% decrease in total budget
scenario_+X: X% increase in total budget

Example (excerpt):

scenario,description,channel,budget,contribution,pct_change_from_baseline,total_budget,pct_of_total
scenario_+10,+10% total budget,tv_spend,330000,980.2,6.4,1100000,0.30
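
The scenario identifiers encode the budget change, and pct_of_total can be recomputed from budget and total_budget. The sketch below shows both; the function name is hypothetical, and it assumes identifiers follow exactly the baseline / scenario_-X / scenario_+X pattern listed above.

```python
import math


def scenario_change(scenario_id):
    """Return the budget change in percent encoded in a scenario identifier."""
    if scenario_id == "baseline":
        return 0.0
    prefix = "scenario_"
    if not scenario_id.startswith(prefix):
        raise ValueError(f"Unrecognised scenario id: {scenario_id!r}")
    return float(scenario_id[len(prefix):])


# Values from the example row above.
budget, total_budget, reported_share = 330000.0, 1100000.0, 0.30

share_ok = math.isclose(budget / total_budget, reported_share, rel_tol=1e-9)
print(scenario_change("scenario_+10"), scenario_change("scenario_-10"), share_ok)
```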

Decomposition & Attribution

all_decomp.csv

Purpose: Complete time-series decomposition of target variable into components.

Generated During: Phase 7 - Post-Analysis (Decomposition)

Structure:

date,actual,predicted,baseline,media_total,channel_1,channel_2,...,control_1,control_2,...

Column Definitions:

date (datetime): Time period
actual (float): Actual observed target value
predicted (float): Model predicted value
baseline (float): Baseline contribution (intercept + trend)
media_total (float): Total media contribution across all channels
channel_name (float): Individual channel contribution (one column per channel)
control_name (float): Control variable contribution (one column per control)

Use Cases:

Understand contribution breakdown over time
Create waterfall charts
Validate model fit
Identify seasonal patterns
Generate attribution reports

Example (excerpt):

date,actual,predicted,baseline,media_total,tv_spend,search_spend,control_temp
2024-06-03,1200.0,1185.4,720.1,465.3,210.2,155.1,10.0
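
A simple use of this file is to separate the fixed columns from the per-component columns and compute the residual (actual minus predicted) for each period. This is a minimal sketch on the excerpt row; note that the header alone does not say which component columns are channels and which are controls.

```python
import csv
import io

# Header and row copied from the excerpt above.
SAMPLE = """date,actual,predicted,baseline,media_total,tv_spend,search_spend,control_temp
2024-06-03,1200.0,1185.4,720.1,465.3,210.2,155.1,10.0
"""

FIXED_COLUMNS = ["date", "actual", "predicted", "baseline", "media_total"]

reader = csv.DictReader(io.StringIO(SAMPLE))
# Everything after the fixed columns is a per-channel or per-control contribution.
component_columns = [c for c in reader.fieldnames if c not in FIXED_COLUMNS]

# Residual per period: how far the model prediction is from the observed value.
residuals = {row["date"]: float(row["actual"]) - float(row["predicted"]) for row in reader}
print(component_columns, residuals)
```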

waterfall_decomposition_data.csv

Purpose: Aggregated decomposition data for waterfall visualizations.

Generated During: Phase 7 - Post-Analysis (Visualization)

Structure:

component,contribution,component_type,order

Column Definitions:

component (string): Component name (baseline, channel name, or control variable)
contribution (float): Total contribution of this component
component_type (string): Type of component (‘baseline’, ‘media’, ‘control’)
order (int): Display order for waterfall chart

Use Cases:

Generate waterfall charts
Create attribution visualizations
Present decomposition results
Report marketing contribution

Example:

component,contribution,component_type,order
baseline,37520.0,baseline,0
tv_spend,11240.0,media,1
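
For a waterfall chart, each bar typically starts where the previous one ended. The sketch below derives those start and end positions from the example rows using the order column; it assumes order values are unique and says nothing about how the pipeline itself draws the chart.

```python
import csv
import io
from itertools import accumulate

# Header and rows copied from the example above.
SAMPLE = """component,contribution,component_type,order
baseline,37520.0,baseline,0
tv_spend,11240.0,media,1
"""

rows = sorted(csv.DictReader(io.StringIO(SAMPLE)), key=lambda r: int(r["order"]))
running_totals = list(accumulate(float(r["contribution"]) for r in rows))

# Each bar is (component, start, end); start is the previous running total.
bars = [
    (r["component"], end - float(r["contribution"]), end)
    for r, end in zip(rows, running_totals)
]
print(bars)
```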

Notes

Data Types

All CSV files use UTF-8 encoding
Numeric values use standard float representation
Boolean values are represented as True/False
Dates follow ISO 8601 format (YYYY-MM-DD) or the date format inferred from the config

Missing Values

Missing numeric values may appear as NaN or empty strings
Missing categorical values appear as empty strings
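
When reading the files without a dataframe library, the conventions above translate into two small parsers. The helper names are hypothetical; the sketch assumes booleans are written exactly as True or False and that missing numerics appear as NaN or an empty cell.

```python
import math


def parse_bool(text):
    """Parse the True/False strings used in the CSV files."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"Not a boolean cell: {text!r}")


def parse_float(text):
    """Parse a numeric cell, mapping empty strings and NaN to float('nan')."""
    stripped = text.strip()
    if stripped == "":
        return math.nan
    return float(stripped)  # float() already accepts 'NaN' and 'nan'


print(parse_bool("True"), parse_float("4.8"), parse_float(""), parse_float("NaN"))
```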

File Generation

Files are generated in the order of pipeline phases
Some files may not be generated if specific analyses are disabled
All files are overwritten on each pipeline run

Version Compatibility

Schema is valid for AMMM v2.x
V1 legacy results may have different schema (see results_legacy/)

Related Documentation

USER_GUIDE.md: Complete pipeline usage guide
TROUBLESHOOTING.md: Common issues and solutions
file_organization.md: Project structure overview

Last updated: Oct 2025
AMMM Version: 2.5.1
