
Explanation: Core MMM Methodology

Version: 2.5.1

Marketing Mix Modelling (MMM) quantifies the impact of marketing activities on sales or other KPIs. AMMM provides a Bayesian framework for building flexible, interpretable MMM models.

Bayesian Approach: AMMM uses PyMC to treat model parameters as probability distributions, providing uncertainty quantification rather than point estimates. This yields credible intervals (HDIs) and enables incorporation of prior knowledge.

Flexibility: Customizable media transformations (adstock, saturation), control variables, and prior distributions allow models to reflect business reality.

Interpretability: Model outputs include channel-specific coefficients, response curves, ROI estimates, and contribution decomposition.

Actionability: Results directly inform budget optimization and scenario planning.

Parameters as Distributions: Model parameters (channel effectiveness, adstock rates, saturation points) are probability distributions reflecting uncertainty about their true values.

Prior Distributions: Express initial beliefs about parameters before observing data. Common choices:

  • HalfNormal: For positive-only parameters (effectiveness coefficients, error terms)
  • Beta: For parameters between 0 and 1 (adstock retention rates)
  • Gamma: For positive parameters with specific shapes (saturation parameters)
  • Normal: For unconstrained parameters
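
One way to get a feel for these prior families is to draw samples from each and compare their ranges. The sketch below uses NumPy only; the hyperparameters and the dictionary keys are illustrative assumptions, not AMMM defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Illustrative hyperparameters only; these are assumptions, not AMMM defaults.
prior_draws = {
    # |Normal(0, sigma)| is distributed as HalfNormal(sigma).
    "effectiveness (HalfNormal, sigma=2)": np.abs(rng.normal(0.0, 2.0, n)),
    "retention (Beta, a=2, b=2)": rng.beta(2.0, 2.0, n),
    "saturation (Gamma, shape=3, scale=1)": rng.gamma(3.0, 1.0, n),
    "control (Normal, mu=0, sigma=1)": rng.normal(0.0, 1.0, n),
}

for name, draws in prior_draws.items():
    lo, mid, hi = np.percentile(draws, [3, 50, 97])
    print(f"{name}: 3%={lo:.2f}, median={mid:.2f}, 97%={hi:.2f}")
```

The printed percentiles show which values each prior treats as plausible before any data is seen.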

Likelihood Function: Quantifies how probable the observed data is given parameter values.

Posterior Distributions: Updated beliefs about parameters after observing data, obtained via Bayes’ Theorem:

P(parameter | Data) ∝ P(Data | parameter) × P(parameter)
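
For a single parameter, this proportionality can be evaluated directly on a grid. The sketch below is not how AMMM fits models (it uses MCMC); it only illustrates the prior-times-likelihood update on synthetic data, with an assumed HalfNormal(1) prior and a known noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = beta_true * x + noise (values chosen for illustration).
beta_true, sigma = 0.5, 0.1
x = rng.uniform(0.0, 1.0, 50)
y = beta_true * x + rng.normal(0.0, sigma, 50)

# Grid of candidate values for the positive-only parameter beta.
grid = np.linspace(0.0, 2.0, 2001)
step = grid[1] - grid[0]

# Log prior: HalfNormal(1), up to an additive constant.
log_prior = -0.5 * grid**2

# Log likelihood: Normal(y | beta * x, sigma), up to an additive constant.
resid = y[None, :] - grid[:, None] * x[None, :]
log_lik = -0.5 * np.sum((resid / sigma) ** 2, axis=1)

# Posterior is proportional to likelihood times prior; normalise to integrate to 1.
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * step

post_mean = np.sum(grid * post) * step
print(f"posterior mean of beta: {post_mean:.3f}")
```

A grid becomes impractical beyond a few parameters, which is why sampling methods are used for full models.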

MCMC Sampling: PyMC uses NUTS (No-U-Turn Sampler) to draw samples from posterior distributions, which are typically too complex for analytical solutions.

This approach provides:

  • Credible intervals (HDIs) for all parameters
  • Full posterior distributions for detailed inspection
  • Direct probability statements about parameters
  • Formal incorporation of prior knowledge

The typical AMMM model structure is a Bayesian regression with transformed media inputs and optional control variables:

y_t = α + Σ_m β_m · saturation(adstock(x_{m,t})) + Σ_c γ_c · z_{c,t} + ε_t

It’s also common to define a shorthand baseline term:

baseline_t := α + Σ_c γ_c · z_{c,t}
y_t = baseline_t + Σ_m β_m · saturation(adstock(x_{m,t})) + ε_t
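
The two forms of the equation can be checked numerically. In this minimal sketch all values are made up, and the media inputs are assumed to have already passed through saturation(adstock(·)); the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, C = 52, 2, 1  # periods, media channels, control variables

# Hypothetical inputs: media already transformed, controls standardised.
x_transformed = rng.uniform(0.0, 1.0, size=(T, M))
z = rng.normal(0.0, 1.0, size=(T, C))

alpha = 100.0                  # intercept
beta = np.array([30.0, 15.0])  # channel effectiveness, one per channel
gamma = np.array([5.0])        # control coefficients, one per control

baseline = alpha + z @ gamma           # baseline_t
media = x_transformed * beta           # per-channel contribution, shape (T, M)
mu = baseline + media.sum(axis=1)      # expected y_t before noise
y = mu + rng.normal(0.0, 2.0, size=T)  # add the error term epsilon_t
```

Keeping the per-channel `media` array separate from `baseline` is what later makes a contribution decomposition possible.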

Target Variable (y_t): The outcome being modelled (sales, revenue, conversions).

Intercept (α): The expected value of the target when all media and control terms are zero.

Control Variables (z_{c,t}): Baseline drivers not attributable to paid media, including any engineered time features. The coefficients γ_c are estimated in the Bayesian model. Common examples:

  • External factors (promotions, competitor activity, macro indicators)
  • Prophet-derived time components when enabled (trend/seasonality/holidays)

Media Inputs (x_{m,t}): Raw marketing effort (spend, impressions, GRPs).

Media Transformations:

  1. Adstock (Carry-over Effects): Models lagged advertising impact

    • Geometric adstock: adstocked_t = (1-θ) × input_t + θ × adstocked_{t-1}
    • Parameter θ (retention rate): Between 0 and 1; typically given a Beta prior
    • ad_stock_max_lag: Maximum number of periods over which carry-over is computed
  2. Saturation (Diminishing Returns): Models non-linear response

    • Michaelis-Menten: Hyperbolic response curve
    • Logistic: S-shaped response curve
    • Parameters control shape and steepness, typically Gamma or HalfNormal priors
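
The transformations above can be sketched in a few lines of NumPy. The adstock function follows the recursive formula as written, assuming no carry-over before the first period; it does not implement the ad_stock_max_lag truncation. The two saturation parameterizations are common choices and are assumptions here, since the exact forms AMMM uses are not given in this section.

```python
import numpy as np

def geometric_adstock(x, theta):
    """a_t = (1 - theta) * x_t + theta * a_{t-1}, with a_{-1} = 0."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    carry = 0.0
    for t, x_t in enumerate(x):
        carry = (1.0 - theta) * x_t + theta * carry
        out[t] = carry
    return out

def michaelis_menten(x, alpha, lam):
    """Hyperbolic curve: tends to alpha for large x, equals alpha / 2 at x = lam."""
    x = np.asarray(x, dtype=float)
    return alpha * x / (lam + x)

def logistic_saturation(x, lam):
    """Logistic-type curve for x >= 0: starts at 0 and levels off towards 1."""
    x = np.asarray(x, dtype=float)
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))

# Adstock first, then saturation, matching saturation(adstock(x)).
spend = np.array([100.0, 0.0, 0.0, 50.0, 0.0])
transformed = michaelis_menten(geometric_adstock(spend, theta=0.6), alpha=1.0, lam=40.0)
print(transformed.round(3))
```

A single burst of spend keeps contributing in later periods through the adstock, and the saturation step compresses large adstocked values more than small ones.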

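Channel Effectiveness (β_m): Scales the transformed media input, so it measures the change in the target per unit of saturated, adstocked media rather than per unit of raw spend. Typically given a HalfNormal prior, which constrains the effect to be positive.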

Error Term (ε_t): Random variation not explained by the model. Typically Normal(0, σ_error), with a HalfNormal prior on σ_error.

Priors guide model estimation and reflect domain knowledge:

Channel Effectiveness (β_m):

  • HalfNormal: For positive effects
  • Inform based on past studies, domain expertise, or lift tests

Adstock (θ_m):

  • Beta distribution (0-1 constraint)
  • Digital channels: Lower values (shorter memory)
  • Traditional media: Higher values (longer memory)

Saturation (λ_m):

  • Gamma or HalfNormal
  • Inform using lift test results when available
  • Otherwise use weakly informative priors

Best Practices:

  • Start with weakly informative priors
  • Visualize priors before fitting
  • Test sensitivity to prior choices
  • Document reasoning

MCMC Process:

  1. Initialize parameter values
  2. Run multiple independent chains (typically 4)
  3. Tuning phase: Sampler adapts (typically 2000 iterations)
  4. Sampling phase: Collect posterior samples (typically 2000+ draws)

Key Parameters:

  • draws: Posterior samples per chain
  • tune: Tuning iterations
  • chains: Number of independent chains
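  • target_accept: Target acceptance rate for NUTS step-size adaptation (0.8-0.99); higher values mean smaller steps and slower but more robust sampling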
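
These settings can be kept together as one set of keyword arguments. The values below sit within the ranges quoted above but are illustrative, not AMMM defaults, and how AMMM passes them to the sampler is not shown in this section. The commented call uses PyMC's `pm.sample`, which accepts these names.

```python
# Values in the ranges quoted above; illustrative, not AMMM defaults.
sampler_kwargs = {
    "draws": 2000,         # posterior samples kept per chain
    "tune": 2000,          # adaptation iterations, discarded after tuning
    "chains": 4,           # independent chains, needed for R-hat
    "target_accept": 0.9,  # raise towards 0.95-0.99 if divergences appear
}

total_draws = sampler_kwargs["chains"] * sampler_kwargs["draws"]
print(f"posterior draws available after sampling: {total_draws}")

# With a PyMC model in scope, the call would look like:
#     with model:
#         idata = pm.sample(**sampler_kwargs, random_seed=0)
```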

Convergence Diagnostics:

  • R-hat: Should be ≈1.0 (< 1.01 ideal)

    • Compares within-chain vs between-chain variance
    • Values > 1.05 indicate non-convergence
  • Effective Sample Size (ESS): Should be at least 100 per chain (about 400 in total for 4 chains)

    • Accounts for autocorrelation
    • Low ESS indicates inefficient sampling
  • Trace Plots: Should show “fuzzy caterpillar”

    • Horizontal band (stationarity)
    • Good mixing between chains
    • No trends or patterns
  • Divergences: Should be zero or minimal

    • Indicate sampler instability
    • Often reduced by increasing target_accept; persistent divergences may call for tighter priors or a simpler model
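
To make the R-hat comparison of within-chain and between-chain variance concrete, here is a simplified split R-hat in NumPy. It is a teaching sketch: ArviZ's `az.rhat` uses a rank-normalised variant by default and is what should be used in practice. The chains are synthetic.

```python
import numpy as np

def split_rhat(chains):
    """Basic split R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split every chain in two so that within-chain drift is also detected.
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = splits.shape[1]
    within = splits.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * splits.mean(axis=1).var(ddof=1)   # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n   # pooled variance estimate
    return float(np.sqrt(var_plus / within))

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))                         # chains agree
stuck = good + np.array([0.0, 0.0, 3.0, 3.0])[:, None]    # two chains offset

print(f"well-mixed chains: R-hat = {split_rhat(good):.3f}")
print(f"disagreeing chains: R-hat = {split_rhat(stuck):.3f}")
```

When chains explore different regions, the between-chain term inflates the pooled variance and R-hat moves well above 1.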

Channel Coefficients (β_m):

  • Magnitude indicates effectiveness
  • HDI indicates uncertainty
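  • An HDI that sits well away from zero is strong evidence of an effect; under a positive-only prior such as HalfNormal the HDI cannot extend below zero, so also compare the posterior with the prior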

Response Curves:

  • Visualize diminishing returns
  • Identify optimal spend levels
  • Compare channel dynamics

ROI Metrics:

  • Overall ROI: Total contribution / total spend
  • Marginal ROI (mROI): Return on next dollar spent
  • Use mROI for budget allocation decisions
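
The difference between the two metrics is easiest to see on a single response curve. The sketch below assumes a hypothetical Michaelis-Menten response with made-up parameters and ignores carry-over; mROI is approximated by a finite-difference slope.

```python
def response(spend, beta=200.0, k=50.0):
    """Hypothetical response curve: contribution as a function of spend."""
    return beta * spend / (k + spend)

def roi(spend):
    """Overall ROI: total contribution divided by total spend."""
    return response(spend) / spend

def marginal_roi(spend, eps=1e-4):
    """Central finite-difference slope of the response curve at `spend`."""
    return (response(spend + eps) - response(spend - eps)) / (2 * eps)

for s in (10.0, 50.0, 200.0):
    print(f"spend={s:6.1f}  ROI={roi(s):.2f}  mROI={marginal_roi(s):.2f}")
```

On a concave curve mROI is always below overall ROI, so a channel can look profitable on average while its next unit of spend returns little.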

Contribution Analysis:

  • Decomposes target into components
  • Shows baseline vs marketing impact
  • Tracks contribution over time

Budget Optimization:

  • Allocates budget to maximize returns
  • Uses response curves and mROI
  • Supports constraints on channel spend
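
A simple way to see how response curves, mROI, and spend constraints interact is a greedy allocation that hands out the budget in small increments. This is a sketch of the idea, not AMMM's optimizer; the response curves, parameters, and caps are hypothetical, and the method relies on the curves being concave.

```python
import numpy as np

def response(spend, beta, k):
    """Hypothetical concave response curves, one (beta, k) pair per channel."""
    return beta * spend / (k + spend)

def greedy_allocate(budget, beta, k, max_spend, step=1.0):
    """Give each budget increment to the channel with the largest incremental response."""
    spend = np.zeros_like(beta, dtype=float)
    for _ in range(int(round(budget / step))):
        gain = response(spend + step, beta, k) - response(spend, beta, k)
        gain[spend + step > max_spend] = -np.inf  # respect per-channel caps
        best = int(np.argmax(gain))
        if not np.isfinite(gain[best]):
            break  # every channel is at its cap
        spend[best] += step
    return spend

beta = np.array([200.0, 120.0, 80.0])
k = np.array([50.0, 20.0, 40.0])
max_spend = np.array([40.0, 100.0, 100.0])

allocation = greedy_allocate(100.0, beta, k, max_spend)
print("allocation:", allocation)
print("total response:", response(allocation, beta, k).sum().round(2))
```

Each increment goes where the marginal return is currently highest, so uncapped channels end up with roughly equal mROI while the capped channel stops at its limit.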

Scenario Planning:

  • Tests “what-if” scenarios
  • Predicts outcomes under different budgets
  • Provides uncertainty intervals
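
Uncertainty intervals for a scenario come from pushing every posterior draw through the response function. In the sketch below the "posterior draws" are synthetic stand-ins generated with NumPy, the response curve is a hypothetical Michaelis-Menten form, and the interval is a central percentile interval rather than an HDI.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 4000

# Synthetic stand-ins for posterior draws; a fitted model would supply these.
beta_draws = rng.gamma(shape=20.0, scale=10.0, size=n_draws)  # mean 200
k_draws = rng.gamma(shape=25.0, scale=2.0, size=n_draws)      # mean 50

def scenario_summary(spend):
    """3rd percentile, median, and 97th percentile of the contribution at `spend`."""
    contribution = beta_draws * spend / (k_draws + spend)
    lo, mid, hi = np.percentile(contribution, [3, 50, 97])
    return lo, mid, hi

for spend in (40.0, 60.0):
    lo, mid, hi = scenario_summary(spend)
    print(f"spend={spend:.0f}: median={mid:.1f}, 94% interval=({lo:.1f}, {hi:.1f})")
```

Comparing intervals rather than single numbers shows whether two budget scenarios are meaningfully different given the model's uncertainty.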

AMMM can use Prophet to generate deterministic time features (trend/seasonality/holidays) that are then included as control variables in the Bayesian MMM.

In the current pipeline this is a two-stage (“hybrid”) setup:

  1. Fit a Prophet model to the target time series to obtain components like trend, yearly, weekly, holidays.
  2. Add those components as columns in the training data and include them in extra_features_cols, so they enter the PyMC model via the control term Σ_c γ_c · z_{c,t}.

The Bayesian MMM (channel effects, adstock, saturation, control coefficients) is then estimated conditional on those Prophet-derived features.

Configuration:

  • yearly_seasonality: Annual patterns (all data frequencies)
  • weekly_seasonality: Day-of-week patterns (for daily or sub-daily data)
  • daily_seasonality: Hour-of-day patterns (for sub-daily/intra-day data)
  • trend: Long-term trends
  • include_holidays: Holiday effects

Data Frequency Guidance:

  • Weekly data: Use yearly_seasonality=True, set weekly_seasonality=False and daily_seasonality=False
  • Daily data: Use yearly_seasonality=True and weekly_seasonality=True, set daily_seasonality=False
  • Intra-day data: Use all three seasonality parameters as needed
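
The guidance above can be captured in a small lookup. The helper name and the frequency labels are hypothetical; only the three seasonality keys come from the configuration list, and the intra-day entry enables everything as a starting point to be trimmed as needed.

```python
def prophet_seasonality_flags(frequency):
    """Map a data frequency to the seasonality switches suggested above."""
    flags = {
        "weekly": {"yearly_seasonality": True, "weekly_seasonality": False, "daily_seasonality": False},
        "daily": {"yearly_seasonality": True, "weekly_seasonality": True, "daily_seasonality": False},
        "intraday": {"yearly_seasonality": True, "weekly_seasonality": True, "daily_seasonality": True},
    }
    if frequency not in flags:
        raise ValueError(f"unknown frequency: {frequency!r}")
    return dict(flags[frequency])

print(prophet_seasonality_flags("weekly"))
```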

Prophet components are automatically added to the model as control variables when enabled.

If convergence fails:

  1. Increase tune and draws
  2. Increase target_accept (0.95-0.99)
  3. Use more informative priors
  4. Simplify model (fewer channels/features)
  5. Check data quality

For large datasets:

  • Use more chains and draws
  • Increase computational resources
  • Monitor memory usage

For complex models:

  • Start simple, add complexity gradually
  • Validate at each step
  • Document changes
