AMMM Pipeline CSV Output Schema

This document describes all CSV files generated by the AMMM (Advanced Marketing Mix Modelling) pipeline in the results/csv/ directory.


media_contribution_per_spend.csv

Purpose: ROI-like metric for revenue-target models: total revenue contribution per unit spend for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,contribution_per_unit_spend,roi_rank

Column Definitions:

  • channel (string): Media channel name
  • total_spend (float): Total spend on channel
  • total_contribution (float): Total model-attributed revenue
  • contribution_per_unit_spend (float): Revenue per unit spend (contribution / spend)
  • roi_rank (int): Rank by contribution-per-spend (1 = highest)

Notes:

  • Produced when config.target_type is revenue.
  • For conversion-target models, see media_conversion_efficiency.csv.
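The ratio and rank columns follow directly from the two totals. Below is a minimal sketch of that arithmetic, not the pipeline's own code. The helper name is hypothetical, the figures are made up, spend is assumed to be positive, and ties are assumed to keep input order.

```python
def add_contribution_per_spend(rows):
    """Return copies of `rows` with contribution_per_unit_spend and roi_rank added.

    Assumes every total_spend is positive; zero-spend handling is not documented.
    """
    out = []
    for row in rows:
        ratio = float(row["total_contribution"]) / float(row["total_spend"])
        out.append({**row, "contribution_per_unit_spend": ratio})
    # Rank 1 goes to the highest ratio; ties keep input order (an assumption).
    ordered = sorted(out, key=lambda r: r["contribution_per_unit_spend"], reverse=True)
    for rank, row in enumerate(ordered, start=1):
        row["roi_rank"] = rank
    return out


# Made-up figures for illustration only.
example_rows = [
    {"channel": "tv_spend", "total_spend": 400000, "total_contribution": 1000000.0},
    {"channel": "search_spend", "total_spend": 300000, "total_contribution": 900000.0},
]
for row in add_contribution_per_spend(example_rows):
    print(row["channel"], row["contribution_per_unit_spend"], row["roi_rank"])
```

The input rows are copied rather than modified, so the same list can be reused to derive the cost-per-revenue-unit view.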

media_cost_per_revenue_unit.csv

Purpose: Cost per revenue unit (inverse of contribution-per-spend) for revenue-target models.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,cost_per_revenue_unit,cpr_rank

Column Definitions:

  • channel (string): Media channel name
  • total_spend (float): Total spend on channel
  • total_contribution (float): Total model-attributed revenue
  • cost_per_revenue_unit (float): Spend per unit of attributed revenue (spend / contribution)
  • cpr_rank (int): Rank by cost efficiency (1 = lowest cost per revenue unit)

Notes:

  • Produced when config.target_type is revenue.
  • For conversion-target models, see media_cost_per_conversion.csv.

The AMMM pipeline generates 14 CSV files across different phases:

Phase | Files
Data Exploration | stationarity_summary.csv, vif_summary.csv, transfer_entropy_summary.csv
Model Results | model_summary.csv, ELPD_summary.csv
Performance | media_performance_effect.csv, media_conversion_efficiency.csv, media_cost_per_conversion.csv, media_contribution_per_spend.csv, media_cost_per_revenue_unit.csv, response_curve_fit_combined.csv
Budget Optimization | budget_scenario_results.csv
Decomposition | all_decomp.csv, waterfall_decomposition_data.csv

stationarity_summary.csv

Purpose: Results of Augmented Dickey-Fuller (ADF) stationarity tests on each time series.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

variable,adf_statistic,p_value,is_stationary,lags_used,n_observations

Column Definitions:

  • variable (string): Name of the time series variable (media channel, control, or target)
  • adf_statistic (float): Augmented Dickey-Fuller test statistic
  • p_value (float): P-value for the ADF test
  • is_stationary (boolean): Whether the ADF test rejects the unit-root null hypothesis, i.e. the series is treated as stationary (True/False)
  • lags_used (int): Number of lags used in the ADF test
  • n_observations (int): Number of observations in the test

Use Cases:

  • Identify non-stationary variables that may require differencing or transformation
  • Validate that time series assumptions are met before modelling
  • Diagnose potential data quality issues

Example:

variable,adf_statistic,p_value,is_stationary,lags_used,n_observations
tv_spend,-3.21,0.019,True,3,104
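A typical first step is to list the variables flagged as non-stationary. The sketch below uses only the standard library and reads from an in-memory sample; the second data row is made up, and the function name is hypothetical. It assumes booleans are written as the literal strings True and False.

```python
import csv
import io

# Sample in the documented layout; the search_spend row is invented for illustration.
SAMPLE = """variable,adf_statistic,p_value,is_stationary,lags_used,n_observations
tv_spend,-3.21,0.019,True,3,104
search_spend,-1.10,0.710,False,2,104
"""


def non_stationary_variables(handle):
    """Return the names of variables whose is_stationary flag is not True."""
    reader = csv.DictReader(handle)
    return [row["variable"] for row in reader if row["is_stationary"].strip() != "True"]


flagged = non_stationary_variables(io.StringIO(SAMPLE))
print(flagged)
```

For a real run, pass an open handle on results/csv/stationarity_summary.csv instead of the in-memory sample.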

vif_summary.csv

Purpose: Variance Inflation Factor (VIF) analysis to detect multicollinearity between features.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

variable,vif,is_multicollinear

Column Definitions:

  • variable (string): Name of the feature (media channel or control variable)
  • vif (float): Variance Inflation Factor value
  • is_multicollinear (boolean): Whether VIF exceeds threshold (typically VIF > 10)

Use Cases:

  • Identify highly correlated features that may cause model instability
  • Guide feature engineering decisions
  • Validate model assumptions

Interpretation:

  • VIF = 1: No correlation
  • VIF < 5: Low correlation (acceptable)
  • VIF 5-10: Moderate correlation (caution)
  • VIF > 10: High multicollinearity (problematic)

Example:

variable,vif,is_multicollinear
search_spend,4.8,False
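The interpretation bands above can be applied to the vif column with a small helper. This is a sketch with a hypothetical function name and band labels; it treats a value of exactly 10 as "moderate", a boundary choice the bands leave open.

```python
def vif_band(vif):
    """Map a VIF value to the interpretation bands: low (< 5), moderate (5-10), high (> 10)."""
    if vif > 10:
        return "high"
    if vif >= 5:
        return "moderate"
    return "low"


for value in (4.8, 7.0, 12.3):
    print(value, vif_band(value))
```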

transfer_entropy_summary.csv

Purpose: Measures information transfer between variables using transfer entropy.

Generated During: Phase 4 - Data Exploration (Pre-diagnostics)

Structure:

source,target,transfer_entropy,is_significant

Column Definitions:

  • source (string): Source variable name
  • target (string): Target variable name (typically the dependent variable)
  • transfer_entropy (float): Transfer entropy value (bits)
  • is_significant (boolean): Whether the transfer is statistically significant

Use Cases:

  • Identify candidate directional relationships between media channels and target (transfer entropy indicates predictive information flow, not proof of causality)
  • Understand information flow in the marketing system
  • Guide model structure decisions

Example:

source,target,transfer_entropy,is_significant
tv_spend,revenue,0.042,True

model_summary.csv

Purpose: Detailed summary of all fitted model parameters with posterior statistics.

Generated During: Phase 5 - Model Fitting (After MCMC sampling)

Structure:

parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median

Column Definitions:

  • parameter (string): Parameter name (intercept, beta_channel, alpha, lam, likelihood_sigma)
  • mean (float): Posterior mean estimate
  • sd (float): Posterior standard deviation
  • hdi_5% (float): Lower bound of the 90% Highest Density Interval (HDI)
  • hdi_95% (float): Upper bound of the 90% Highest Density Interval (HDI)
  • mcse_mean (float): Monte Carlo standard error of the mean
  • mcse_sd (float): Monte Carlo standard error of the standard deviation
  • ess_bulk (float): Bulk Effective Sample Size
  • ess_tail (float): Tail Effective Sample Size
  • r_hat (float): Gelman-Rubin convergence diagnostic (should be ≈ 1.0)
  • median (float): Posterior median estimate

Parameter Types:

  • intercept: Model intercept (baseline effect)
  • likelihood_sigma: Noise/error standard deviation
  • beta_channel[channel_name]: Channel effectiveness coefficient
  • alpha[channel_name]: Adstock retention parameter (0-1)
  • lam[channel_name]: Saturation steepness parameter

Use Cases:

  • Assess parameter convergence (check r_hat ≈ 1.0)
  • Evaluate parameter uncertainty (SD and HDI intervals)
  • Identify strongest media channels (high beta values)
  • Understand carryover effects (alpha values)
  • Export results for reporting

Example (excerpt):

parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median
beta_channel[tv_spend],0.012,0.003,0.007,0.018,0.0001,0.0001,2500,2200,1.00,0.012
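The convergence check from the use cases can be automated by scanning r_hat and the effective sample sizes. The sketch below is illustrative only. The thresholds (r_hat above 1.01, ESS below 400) are common rules of thumb and are assumptions, not values taken from AMMM. The alpha row is made up, and the function name is hypothetical.

```python
import csv
import io
import re

# The first data row is the documented example; the alpha row is invented.
SAMPLE = """parameter,mean,sd,hdi_5%,hdi_95%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat,median
beta_channel[tv_spend],0.012,0.003,0.007,0.018,0.0001,0.0001,2500,2200,1.00,0.012
alpha[tv_spend],0.45,0.08,0.31,0.58,0.004,0.003,380,410,1.03,0.45
"""

# Splits "beta_channel[tv_spend]" into family "beta_channel" and index "tv_spend".
PARAM_RE = re.compile(r"^(?P<family>\w+)(?:\[(?P<index>.+)\])?$")


def convergence_issues(handle, max_r_hat=1.01, min_ess=400):
    """Return (family, index, r_hat, min_ess) for parameters that fail either threshold."""
    issues = []
    for row in csv.DictReader(handle):
        match = PARAM_RE.match(row["parameter"])
        family = match.group("family") if match else row["parameter"]
        index = match.group("index") if match else None
        r_hat = float(row["r_hat"])
        ess = min(float(row["ess_bulk"]), float(row["ess_tail"]))
        if r_hat > max_r_hat or ess < min_ess:
            issues.append((family, index, r_hat, ess))
    return issues


issues = convergence_issues(io.StringIO(SAMPLE))
print(issues)
```

Scalar parameters such as intercept have no bracketed index, so their index is reported as None.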

ELPD_summary.csv

Purpose: Expected Log Pointwise Predictive Density (ELPD) and model diagnostics.

Generated During: Phase 7 - Post-Analysis (Model Diagnostics)

Structure:

metric,value

Column Definitions:

  • metric (string): Name of the diagnostic metric
  • value (float/int/bool): Metric value

Metrics Included:

  • n_samples: Number of posterior samples used
  • n_data_points: Number of data points in the model
  • good_k: Proportion of good Pareto k values (should be > 0.7)
  • elpd_loo: Expected log pointwise predictive density (LOO-CV)
  • p_loo: Effective number of parameters
  • warning: Whether LOO diagnostic warnings were raised
  • r_squared: Model R-squared value

Use Cases:

  • Evaluate model fit quality
  • Compare different model specifications
  • Assess out-of-sample predictive accuracy
  • Identify overfitting (if p_loo >> actual parameters)

Example:

metric,value
elpd_loo,-1234.56
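Because the value column mixes floats, integers and booleans, a reader has to coerce each cell. Below is one possible sketch. Only the elpd_loo row comes from the example above, the other rows are made up, and the helper names are hypothetical.

```python
import csv
import io

# Only elpd_loo matches the documented example; the remaining rows are invented.
SAMPLE = """metric,value
elpd_loo,-1234.56
p_loo,18.3
warning,False
n_data_points,104
"""


def coerce(text):
    """Convert a cell to bool, int or float, falling back to the raw string."""
    if text in ("True", "False"):
        return text == "True"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def load_metrics(handle):
    """Read the metric,value layout into a dict keyed by metric name."""
    return {row["metric"]: coerce(row["value"]) for row in csv.DictReader(handle)}


metrics = load_metrics(io.StringIO(SAMPLE))
print(metrics)
```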

media_performance_effect.csv

Purpose: Media channel effectiveness and contribution metrics.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,mean_effect,median_effect,sd_effect,hdi_5%,hdi_95%,total_contribution,pct_of_total

Column Definitions:

  • channel (string): Media channel name
  • mean_effect (float): Mean contribution per time period
  • median_effect (float): Median contribution per time period
  • sd_effect (float): Standard deviation of effect
  • hdi_5% (float): Lower bound of the 90% HDI
  • hdi_95% (float): Upper bound of the 90% HDI
  • total_contribution (float): Total contribution over entire period
  • pct_of_total (float): Share of total media contribution (expressed as a fraction in the example below, e.g. 0.27 = 27%)

Use Cases:

  • Rank channels by effectiveness
  • Calculate marketing ROI
  • Allocate budget across channels
  • Identify underperforming channels

Example:

channel,mean_effect,median_effect,sd_effect,hdi_5%,hdi_95%,total_contribution,pct_of_total
tv_spend,112.4,110.2,15.7,85.1,140.3,5845.0,0.27

media_conversion_efficiency.csv

Purpose: Conversion efficiency metrics for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,conversions_per_unit_spend,efficiency_rank

Column Definitions:

  • channel (string): Media channel name
  • total_spend (float): Total spend on channel
  • total_contribution (float): Total contribution (conversions/revenue)
  • conversions_per_unit_spend (float): Efficiency metric (contribution / spend)
  • efficiency_rank (int): Rank by efficiency (1 = most efficient)

Use Cases:

  • Identify most efficient channels
  • Optimize budget allocation
  • Calculate marketing efficiency ratios
  • Compare channel performance

Example:

channel,total_spend,total_contribution,conversions_per_unit_spend,efficiency_rank
social_spend,250000,620.5,0.00248,2

media_cost_per_conversion.csv

Purpose: Cost per conversion (CPA/CPO) for each media channel.

Generated During: Phase 7 - Post-Analysis (Performance Calculation)

Structure:

channel,total_spend,total_contribution,cost_per_conversion,cpa_rank

Column Definitions:

  • channel (string): Media channel name
  • total_spend (float): Total spend on channel
  • total_contribution (float): Total conversions/outcomes
  • cost_per_conversion (float): Cost per conversion (spend / contribution)
  • cpa_rank (int): Rank by CPA (1 = lowest CPA)

Use Cases:

  • Calculate CPA/CPO metrics
  • Compare channel efficiency
  • Set performance benchmarks
  • Identify cost-effective channels

Example:

channel,total_spend,total_contribution,cost_per_conversion,cpa_rank
search_spend,300000,900.0,333.33,1
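Since cost_per_conversion is defined as spend / contribution, a row can be sanity-checked against its own totals. This is a small sketch with a hypothetical helper name; the tolerance is an assumption chosen to absorb rounding in the reported value.

```python
import math


def cost_per_conversion_ok(total_spend, total_contribution, reported, rel_tol=1e-4):
    """Check that the reported value matches spend / contribution within rel_tol."""
    expected = total_spend / total_contribution
    return math.isclose(expected, reported, rel_tol=rel_tol)


# Values from the example row: 300000 / 900.0 is about 333.33.
print(cost_per_conversion_ok(300000, 900.0, 333.33))
```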

response_curve_fit_combined.csv

Purpose: Fitted response curves for all media channels showing diminishing returns.

Generated During: Phase 5 - Model Fitting (Visualisation Phase)

Structure:

channel,spend_level,predicted_contribution,lower_bound,upper_bound,saturation_pct

Column Definitions:

  • channel (string): Media channel name
  • spend_level (float): Spend level (x-axis)
  • predicted_contribution (float): Predicted contribution at this spend level
  • lower_bound (float): Lower bound of prediction interval
  • upper_bound (float): Upper bound of prediction interval
  • saturation_pct (float): Percentage of maximum saturation reached

Use Cases:

  • Visualize diminishing returns
  • Find optimal spend levels
  • Identify saturation points
  • Guide budget recommendations

Note: Values represent direct response curves. No additional scaling is required.

Example:

channel,spend_level,predicted_contribution,lower_bound,upper_bound,saturation_pct
tv_spend,150000,210.5,180.2,240.8,64.2
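Diminishing returns show up as a falling slope between consecutive spend levels. The sketch below estimates that slope with finite differences for a single channel. Only the middle point comes from the example row, the other two points are made up, and the function name is hypothetical. It assumes spend levels are distinct.

```python
def marginal_returns(points):
    """Return (midpoint_spend, slope) pairs for consecutive points of one channel.

    `points` is a list of (spend_level, predicted_contribution) tuples.
    """
    ordered = sorted(points)
    result = []
    for (s0, c0), (s1, c1) in zip(ordered, ordered[1:]):
        # Extra contribution per extra unit of spend over this interval.
        result.append(((s0 + s1) / 2, (c1 - c0) / (s1 - s0)))
    return result


# (150000, 210.5) is the documented example; the neighbours are invented.
tv_points = [(100000, 170.0), (150000, 210.5), (200000, 235.5)]
slopes = marginal_returns(tv_points)
for midpoint, slope in slopes:
    print(midpoint, slope)
```

A slope that keeps shrinking as spend rises suggests the channel is approaching saturation.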

budget_scenario_results.csv

Purpose: Results from budget scenario planning across different budget levels.

Generated During: Phase 8 - Budget Optimization

Structure:

scenario,description,channel,budget,contribution,pct_change_from_baseline,total_budget,pct_of_total

Column Definitions:

  • scenario (string): Scenario identifier (e.g., ‘baseline’, ‘scenario_-10’, ‘scenario_+20’)
  • description (string): Human-readable scenario description
  • channel (string): Media channel name or ‘TOTAL’ for aggregates
  • budget (float): Allocated budget for this channel in this scenario
  • contribution (float): Predicted contribution/conversions
  • pct_change_from_baseline (float): Percentage change from baseline scenario
  • total_budget (float): Total budget across all channels
  • pct_of_total (float): This channel’s share of the total budget (expressed as a fraction in the example below, e.g. 0.30 = 30%)

Use Cases:

  • Compare different budget allocation strategies
  • Quantify impact of budget changes
  • Optimize budget distribution
  • Create “what-if” scenarios for planning

Scenario Types:

  • baseline: Current spend levels
  • scenario_-X: X% decrease in total budget
  • scenario_+X: X% increase in total budget

Example (excerpt):

scenario,description,channel,budget,contribution,pct_change_from_baseline,total_budget,pct_of_total
scenario_+10,+10% total budget,tv_spend,330000,980.2,6.4,1100000,0.30
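The scenario identifier encodes the budget change, so it can be turned back into a signed percentage for sorting or plotting. This is a sketch with a hypothetical function name. It assumes identifiers are either baseline or scenario_ followed by a signed number; accepting decimal percentages is an extra assumption.

```python
import re

# Matches "scenario_+10", "scenario_-10" and, by assumption, "scenario_+2.5".
SCENARIO_RE = re.compile(r"^scenario_([+-]\d+(?:\.\d+)?)$")


def scenario_pct(scenario):
    """Return the signed percentage change in total budget encoded in the identifier."""
    if scenario == "baseline":
        return 0.0
    match = SCENARIO_RE.match(scenario)
    if match is None:
        raise ValueError(f"unrecognised scenario identifier: {scenario!r}")
    return float(match.group(1))


for name in ("baseline", "scenario_-10", "scenario_+20"):
    print(name, scenario_pct(name))
```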

all_decomp.csv

Purpose: Complete time-series decomposition of target variable into components.

Generated During: Phase 7 - Post-Analysis (Decomposition)

Structure:

date,actual,predicted,baseline,media_total,channel_1,channel_2,...,control_1,control_2,...

Column Definitions:

  • date (datetime): Time period
  • actual (float): Actual observed target value
  • predicted (float): Model predicted value
  • baseline (float): Baseline contribution (intercept + trend)
  • media_total (float): Total media contribution across all channels
  • channel_name (float): Individual channel contribution (one column per channel)
  • control_name (float): Control variable contribution (one column per control)

Use Cases:

  • Understand contribution breakdown over time
  • Create waterfall charts
  • Validate model fit
  • Identify seasonal patterns
  • Generate attribution reports

Example (excerpt):

date,actual,predicted,baseline,media_total,tv_spend,search_spend,control_temp
2024-06-03,1200.0,1185.4,720.1,465.3,210.2,155.1,10.0
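For the "validate model fit" use case, the actual and predicted columns are enough to compute an error metric. The sketch below computes the mean absolute percentage error (MAPE) using only the standard library. The function name is hypothetical, and rows with a zero actual value are skipped by assumption.

```python
import csv
import io

# The single data row is the documented example.
SAMPLE = """date,actual,predicted,baseline,media_total,tv_spend,search_spend,control_temp
2024-06-03,1200.0,1185.4,720.1,465.3,210.2,155.1,10.0
"""


def mape(handle):
    """Mean absolute percentage error of predicted vs actual, as a fraction."""
    errors = []
    for row in csv.DictReader(handle):
        actual = float(row["actual"])
        predicted = float(row["predicted"])
        if actual != 0:  # Skip periods where the percentage error is undefined.
            errors.append(abs(actual - predicted) / abs(actual))
    return sum(errors) / len(errors) if errors else float("nan")


score = mape(io.StringIO(SAMPLE))
print(round(score, 4))
```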

waterfall_decomposition_data.csv

Purpose: Aggregated decomposition data for waterfall visualizations.

Generated During: Phase 7 - Post-Analysis (Visualization)

Structure:

component,contribution,component_type,order

Column Definitions:

  • component (string): Component name (baseline, channel name, or control variable)
  • contribution (float): Total contribution of this component
  • component_type (string): Type of component (‘baseline’, ‘media’, ‘control’)
  • order (int): Display order for waterfall chart

Use Cases:

  • Generate waterfall charts
  • Create attribution visualizations
  • Present decomposition results
  • Report marketing contribution

Example:

component,contribution,component_type,order
baseline,37520.0,baseline,0
tv_spend,11240.0,media,1
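A waterfall chart needs a start and end position for each bar, which can be derived from a running total over the rows sorted by order. This is a minimal sketch with a hypothetical function name, using the two example rows above.

```python
def waterfall_bars(rows):
    """Return (component, start, end) tuples, stacking contributions in display order."""
    bars = []
    running = 0.0
    for row in sorted(rows, key=lambda r: int(r["order"])):
        start = running
        running += float(row["contribution"])
        bars.append((row["component"], start, running))
    return bars


example = [
    {"component": "tv_spend", "contribution": "11240.0", "component_type": "media", "order": "1"},
    {"component": "baseline", "contribution": "37520.0", "component_type": "baseline", "order": "0"},
]
bars = waterfall_bars(example)
for bar in bars:
    print(bar)
```

Each bar starts where the previous one ended, so the end of the last bar equals the sum of all contributions.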

  • All CSV files use UTF-8 encoding
  • Numeric values use standard float representation
  • Boolean values are represented as True/False
  • Dates follow ISO 8601 format (YYYY-MM-DD) unless a different date format is inferred from the config
  • Missing numeric values may appear as NaN or empty strings
  • Missing categorical values appear as empty strings
  • Files are generated in the order of pipeline phases
  • Some files may not be generated if specific analyses are disabled
  • All files are overwritten on each pipeline run
  • Schema is valid for AMMM v2.x
  • V1 legacy results may have different schema (see results_legacy/)

  • USER_GUIDE.md: Complete pipeline usage guide
  • TROUBLESHOOTING.md: Common issues and solutions
  • file_organization.md: Project structure overview

Last updated: Oct 2025
AMMM Version: 2.5.1