AMMM User Guide

Version: 2.5.1

Overview

AMMM is a Python library for building Bayesian marketing mix models using PyMC. It quantifies marketing channel effectiveness and optimises budget allocation.

Key Features:

Bayesian inference with uncertainty quantification
Saturation and carryover effect modelling
Budget optimization and scenario planning
Built-in diagnostics and validation

Installation

Requirements:

Python 3.10+
8GB RAM minimum (16GB recommended)

git clone https://github.com/tandpds/ammm.git
cd ammm
pip install -r requirements.txt

Verify:

from src.driver import MMMBaseDriverV2  # The driver class is exported at src.driver for convenience
print("AMMM installed successfully")

Quick Start

Minimal run (CLI):

python -u runme.py [--no-scenarios] [--scenarios "-20,-15,-10,-5,0,5,10,15,20"]

Advanced flow (Python):

from src.driver import MMMBaseDriverV2  # exported for convenience

driver = MMMBaseDriverV2(
    config_filename='demo/demo_config.yml',
    input_filename='demo/demo_data.csv',
    holidays_filename='demo/holidays.xlsx'
)

model = driver.fit_model()
r2 = driver.calculate_train_r_squared()
print(f"Model R²: {r2:.3f}")

driver.init_output()
driver.visualize()

Data Requirements

CSV Structure:

Column	Type	Description	Required
date	date	Date of observation (YYYY-MM-DD)	Yes
target	float	Target variable (sales/revenue)	Yes
media_channel_*	float	Spend or impressions	Yes
control_var_*	float	Control variables	No

Quality Requirements:

Minimum 52 observations (one year of weekly data)
No missing values in critical columns
Positive values for revenue and media spend
Consistent time frequency (daily/weekly/monthly)
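The checks above can be sketched in plain Python. This is a minimal illustration rather than AMMM's own validation, which runs inside the driver. The function name check_quality and the sample column names (date, revenue, tv_spend) are assumptions, and the equal-gap test only suits daily or weekly data.

```python
import csv
import io
from datetime import date, timedelta


def check_quality(csv_text, date_col, positive_cols, min_rows=52):
    """Return a list of problems found; an empty list means all checks passed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []

    # Minimum history: 52 rows is one year of weekly data.
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} observations, need {min_rows}")

    # No missing values, and strictly positive values, in critical columns.
    for col in positive_cols:
        values = [(row[col] or "").strip() for row in rows]
        if any(v == "" for v in values):
            problems.append(f"missing values in {col}")
        elif any(float(v) <= 0 for v in values):
            problems.append(f"non-positive values in {col}")

    # Consistent frequency: every gap between consecutive dates is identical.
    # (Monthly data has 28-31 day gaps and needs a calendar-aware comparison.)
    dates = sorted(date.fromisoformat(row[date_col]) for row in rows)
    gaps = {later - earlier for earlier, later in zip(dates, dates[1:])}
    if len(gaps) > 1:
        problems.append("inconsistent time frequency")

    return problems


# Hypothetical weekly sample: 52 rows, one week apart.
start = date(2024, 1, 1)
lines = ["date,revenue,tv_spend"]
for i in range(52):
    lines.append(f"{start + timedelta(weeks=i)},{1000 + i},{200 + i}")
sample = "\n".join(lines)

print(check_quality(sample, "date", ["revenue", "tv_spend"]))  # []
```

Running a check like this before constructing the driver gives faster feedback than waiting for a validation error at initialisation.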

Configuration

Basic YAML structure:

raw_data_granularity: weekly
date_col: "date"
target_col: "revenue"

media:
  - display_name: "TV"
    impressions_col: "tv_impressions"
    spend_col: "tv_spend"

prophet:
  include_holidays: true
  holiday_country: 'US'
  yearly_seasonality: true
  trend: true

tune: 2000
draws: 2000
chains: 4
ad_stock_max_lag: 8
target_accept: 0.95
seed: 42

See Configuration Reference for all parameters.
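To make ad_stock_max_lag concrete: carryover (adstock) spreads the effect of each period's media activity over later periods, and the maximum lag caps how far that effect reaches. The sketch below uses geometric decay with normalised weights, which is one common choice. AMMM's actual transformation may differ, and the function name geometric_adstock, the decay value, and the convention that max_lag counts the current period are all assumptions for illustration.

```python
def geometric_adstock(spend, decay, max_lag):
    """Carry over spend using weights decay**0 .. decay**(max_lag - 1).

    Weights are normalised to sum to 1, so media pressure is
    redistributed over time rather than inflated.
    """
    weights = [decay ** lag for lag in range(max_lag)]
    total = sum(weights)
    weights = [w / total for w in weights]

    adstocked = []
    for t in range(len(spend)):
        # Combine the current period with up to max_lag - 1 earlier periods.
        value = sum(
            weights[lag] * spend[t - lag]
            for lag in range(max_lag)
            if t - lag >= 0
        )
        adstocked.append(value)
    return adstocked


# A single burst of 100 in week 0, then nothing.
spend = [100.0, 0.0, 0.0, 0.0]
print(geometric_adstock(spend, decay=0.5, max_lag=3))
```

With max_lag=3 the burst influences weeks 0 to 2 and has no effect from week 3 onwards, which is the role the lag setting plays in this sketch.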

Model Fitting

Standard fitting:

model = driver.fit_model()

Note: Data and configuration are validated during driver initialisation via internal preprocessing, so invalid inputs raise errors with clear messages before fitting starts.

Predictions

Generate predictions:

import numpy as np

predictions = driver.predict_on_test()
mean_pred = predictions.mean(axis=0)
# 95% interval from the 2.5th and 97.5th percentiles
lower_bound = np.percentile(predictions, 2.5, axis=0)
upper_bound = np.percentile(predictions, 97.5, axis=0)

Calculate performance:

r2 = driver.calculate_train_r_squared()
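For reference, the conventional R² (coefficient of determination) can be sketched as below. This assumes the standard definition applied to point predictions; whether calculate_train_r_squared() uses exactly this form is not confirmed here, so check its docstring. The function name r_squared and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(f"R-squared: {r_squared(actual, predicted):.3f}")  # R-squared: 0.900
```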

Budget Optimisation

Budget scenarios are produced by the pipeline when running via the CLI. Use --scenarios to specify percentage changes and review results/csv/budget_scenario_results.csv for allocations and impacts.

Example:

python -u runme.py --scenarios "-20,-10,0,10,20"

Then inspect the CSV described in the Output Schema for results.
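As a rough illustration of what the --scenarios values mean, the sketch below parses the string and converts each percentage change into a total budget. It assumes the percentages are changes relative to a baseline total budget. The function names and the baseline figure are hypothetical, and this is not the pipeline's own parsing code.

```python
def parse_scenarios(arg):
    """Turn a --scenarios string such as "-20,-10,0,10,20" into integers."""
    return [int(part) for part in arg.split(",") if part.strip()]


def scenario_budgets(baseline, pct_changes):
    """Map each percentage change to the total budget it implies."""
    return {pct: baseline * (100 + pct) / 100 for pct in pct_changes}


changes = parse_scenarios("-20,-10,0,10,20")
budgets = scenario_budgets(100_000, changes)  # hypothetical baseline budget
for pct, budget in budgets.items():
    print(f"{pct:+d}% -> {budget:,.0f}")
```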

Multi-Period Budget Optimization

NEW: Plan budgets across multiple time periods (e.g., 13 weeks, 12 months) with automatic seasonality adjustments.

Enable via CLI:

# 13-week planning with seasonality (default)
python runme.py --multiperiod

# Custom planning horizon (26 weeks)
python runme.py --multiperiod --multiperiod-weeks 26

# Without seasonality adjustments
python runme.py --multiperiod --no-seasonality

Via Python API:

import src as ammm

# After model fitting
ammm.optimize_marketing_budget(
    model=driver.model,
    data=driver.processed_data,
    config=driver.config,
    results_dir=driver.results_dir,
    multiperiod_mode=True,
    use_seasonality=True,
    n_time_periods=13,
    frequency='W'
)

Key Features:

Time-varying budget allocation based on expected channel effectiveness
Prophet seasonality integration (yearly + weekly patterns)
Seasonal effectiveness multipliers (1.0 = baseline, >1.0 = more effective, <1.0 = less effective)
Support for per-period budget constraints (min/max per period)
Backward compatible (existing code unchanged)
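One way to build intuition for the seasonal multipliers is to split a total budget across periods in proportion to them. This is a deliberately naive sketch, not the optimiser AMMM uses (which also accounts for saturation and constraints). The function name allocate_by_multiplier and the multiplier values are hypothetical.

```python
def allocate_by_multiplier(total_budget, multipliers):
    """Split total_budget across periods in proportion to each multiplier."""
    weight_sum = sum(multipliers)
    return [total_budget * m / weight_sum for m in multipliers]


# Hypothetical multipliers for four periods (1.0 = baseline effectiveness).
multipliers = [0.8, 1.0, 1.2, 1.0]
allocation = allocate_by_multiplier(400_000, multipliers)
for period, (m, budget) in enumerate(zip(multipliers, allocation), start=1):
    print(f"period {period}: multiplier {m:.1f} -> budget {budget:,.0f}")
```

Periods with a multiplier above 1.0 receive more than an even share, and periods below 1.0 receive less.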

Outputs:

CSV: results/csv/multiperiod_optimization_results.csv
- Columns: period, period_date, channel, budget, contribution, seasonal_multiplier, roi
- Contains results for every period × channel combination
PNG Visualizations (5 files):
- multiperiod_budget_heatmap.png - Budget allocation heatmap across periods and channels
- multiperiod_contribution_over_time.png - Contribution trends with stacked area chart
- multiperiod_seasonal_patterns.png - Seasonal effectiveness multipliers by channel
- multiperiod_period_comparison.png - Side-by-side budget vs contribution comparison
- multiperiod_budget_vs_contribution.png - Dual-axis trends with ROI overlays
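A short sketch of summarising the results CSV per channel with the standard library. The rows below are invented for illustration and only the column names come from the list above. The ratio computed here is contribution divided by budget, which may or may not match how the roi column is defined.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the documented column names.
sample = """period,period_date,channel,budget,contribution,seasonal_multiplier,roi
1,2025-01-06,TV,600000,900000,1.0,1.5
1,2025-01-06,Search,400000,800000,1.0,2.0
2,2025-01-13,TV,500000,800000,1.1,1.6
2,2025-01-13,Search,500000,950000,0.9,1.9
"""

# Accumulate budget and contribution for each channel across periods.
totals = defaultdict(lambda: {"budget": 0.0, "contribution": 0.0})
for row in csv.DictReader(io.StringIO(sample)):
    totals[row["channel"]]["budget"] += float(row["budget"])
    totals[row["channel"]]["contribution"] += float(row["contribution"])

for channel, t in totals.items():
    ratio = t["contribution"] / t["budget"]
    print(f"{channel}: budget {t['budget']:,.0f}, contribution/budget {ratio:.2f}")
```

To run this against a real output, replace io.StringIO(sample) with open("results/csv/multiperiod_optimization_results.csv", newline="").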

Advanced Options:

from src.driver.opt import optimize_multiperiod_budget

results_df = optimize_multiperiod_budget(
    model=driver.model,
    data=driver.processed_data,
    config=driver.config,
    results_dir='results',
    n_periods=13,
    total_budget=12_500_000,  # £12.5M across all periods
    use_seasonality=True,
    frequency='W',
    start_date='2025-01-06',
    period_budget_limits=(800_000, 1_200_000)  # £800K-£1.2M per week
)
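Before running a constrained optimisation it is worth checking that the per-period limits can actually add up to the total budget. The sketch below does that arithmetic for the values in the call above, assuming period_budget_limits is a (min, max) pair applied to every period. The helper budget_is_feasible is hypothetical, and whether AMMM performs an equivalent check is not stated here.

```python
def budget_is_feasible(total_budget, n_periods, period_limits):
    """Check that per-period limits can add up to the total budget."""
    low, high = period_limits
    return n_periods * low <= total_budget <= n_periods * high


# Values from the example above: 13 weeks, 800K to 1.2M per week.
print(13 * 800_000, 13 * 1_200_000)  # 10400000 15600000
print(budget_is_feasible(12_500_000, 13, (800_000, 1_200_000)))  # True
print(budget_is_feasible(16_000_000, 13, (800_000, 1_200_000)))  # False
```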

When to Use:

You are planning budgets for multiple weeks or months ahead
Your business has significant seasonality (retail, travel, etc.)
You need to optimise across a planning horizon (Q1, full year)
You want to respect time-varying budget constraints

See Multi-Period Optimization Guide for detailed usage and examples.

Visualisation

Generate all plots:

driver.init_output()
driver.visualize()

Individual plots:

driver.plot_model_trace()
driver.plot_posterior_predictive()
driver.plot_components_contributions()
driver.plot_waterfall_components_decomposition()

Error Handling

Common errors:

# File not found
try:
    driver = MMMBaseDriverV2(config_filename='config.yml', ...)
except FileNotFoundError as e:
    print(f"Error: {e}")

# Data validation
try:
    driver = MMMBaseDriverV2(...)
except DataValidationError as e:
    print(f"Data issue: {e.details}")

# Model not fitted
try:
    results = driver.predict_on_test()
except ModelNotFittedError:
    driver.fit_model()
    results = driver.predict_on_test()  # retry now that the model is fitted

Troubleshooting

Convergence issues:

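driver.config['tune'] = 2000  # warm-up steps; go higher if your config already uses 2000
driver.config['target_accept'] = 0.99  # smaller step sizes, fewer divergences
model = driver.fit_model()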

Memory issues:

driver.config['chains'] = 2
driver.config['draws'] = 500
model = driver.fit_model()
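A back-of-the-envelope estimate shows why reducing chains and draws helps. The sketch assumes draws are stored as 8-byte floats and ignores compilation and per-process overhead. The helper posterior_megabytes and the count of stored values per draw are hypothetical.

```python
def posterior_megabytes(chains, draws, n_values, bytes_per_value=8):
    """Rough size of stored posterior draws, in megabytes."""
    return chains * draws * n_values * bytes_per_value / 1e6


n_values = 5_000  # hypothetical count of stored values per draw
before = posterior_megabytes(chains=4, draws=2000, n_values=n_values)
after = posterior_megabytes(chains=2, draws=500, n_values=n_values)
print(f"{before:.0f} MB -> {after:.0f} MB ({before / after:.0f}x smaller)")
# 320 MB -> 40 MB (8x smaller)
```

The trade-off is that fewer draws and chains give noisier posterior estimates and weaker convergence diagnostics, so treat the reduced settings as a fallback.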

Cache health (optional): Use the cache monitor utilities to inspect or tidy the PyTensor cache if you frequently switch model shapes.

from src.utils.cache_monitor import CacheMonitor

cm = CacheMonitor()
info = cm.get_cache_info()
print(info)

# Optimise or clear cache when needed
cm.optimize_cache()
cm.clear_cache(confirm=True)  # irreversible, deletes compiled functions

Advanced Usage

Save and load a pre-fitted model:

model_path = 'saved_model.nc'
driver.model.save(model_path)

driver_new = MMMBaseDriverV2(...)
loaded_model = driver_new.fit_model(model_filename=model_path)

Batch processing:

import pandas as pd

configs = ['config1.yml', 'config2.yml', 'config3.yml']
results = []

for config_file in configs:
    driver = MMMBaseDriverV2(
        config_filename=config_file,
        input_filename='data.csv',
        holidays_filename='holidays.xlsx'
    )

    model = driver.fit_model()
    r2 = driver.calculate_train_r_squared()
    results.append({'config': config_file, 'r2': r2})

print(pd.DataFrame(results))

Running the Pipeline

You can run the full pipeline via the runner or convenience scripts:

CLI (recommended):

python -u runme.py [--no-scenarios] [--scenarios "-20,-15,-10,-5,0,5,10,15,20"]

Scripts:

Linux: ./run_pipeline_linux.sh [--no-scenarios] [--scenarios "-10,-5,0,5,10"]
Windows: run_pipeline_windows.bat [--no-scenarios] [--scenarios "-10,-5,0,5,10"]

Notes:

runme.py loads .env at startup, before defaults are applied, so variables set there (for example the LLM and JAX settings) apply to the entire run.
Use --no-scenarios to skip budget planning.

GPU Acceleration (JAX + NumPyro)

AMMM supports PyMC’s JAX backend for faster sampling on GPU.

Install CUDA-enabled JAX and NumPyro (CUDA 12 wheels):

pip install -U pip
pip uninstall -y jax jaxlib
pip install -U "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -U numpyro

Set env vars (preferably in .env):

JAX_PLATFORMS=cuda
AMMM_USE_JAX=1
AMMM_JAX_CHAIN_METHOD=vectorized
# Optional
XLA_PYTHON_CLIENT_PREALLOCATE=false
XLA_PYTHON_CLIENT_MEM_FRACTION=0.8

Verify GPU:

python -c "import jax; print(jax.devices())"  # Expect [CudaDevice(id=0)]

If JAX or NumPyro is unavailable, AMMM automatically falls back to CPU sampling with pm.sample().
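The fallback behaviour can be pictured with the sketch below. AMMM's real detection logic is not shown in this guide, so the function choose_backend, its injectable arguments, and the assumption that AMMM_USE_JAX must equal "1" are illustrative only.

```python
import importlib.util
import os


def choose_backend(env=None, has_module=None):
    """Return "jax" if requested and importable, otherwise "cpu"."""
    env = os.environ if env is None else env
    if has_module is None:
        # Check whether a package is installed without importing it.
        has_module = lambda name: importlib.util.find_spec(name) is not None

    wants_jax = env.get("AMMM_USE_JAX") == "1"
    if wants_jax and has_module("jax") and has_module("numpyro"):
        return "jax"
    return "cpu"  # fall back to the default pm.sample() path


print(choose_backend())
```

Passing env and has_module explicitly makes the decision easy to test without changing the real environment.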

LLM-driven Reports

AMMM generates AI-powered business insights that focus on commercial value and statistical rigour.

Enable in YAML:

agentic_report: true

Set LLM provider in .env (OpenAI preferred if OPENAI_API_KEY is set):

# OpenAI
OPENAI_API_KEY=...
LLM_PROVIDER=openai
LLM_MODEL=gpt-5.2   # e.g., gpt-5.2, gpt-5, gpt-4o

# (Optional) Gemini
GEMINI_API_KEY=...
LLM_PROVIDER=gemini

Other useful settings:

LLM_MAX_TOKENS=8000
LLM_TEMPERATURE=0.3
LLM_REASONING_EFFORT=medium
LLM_ENABLE_CACHE=true
LLM_DAILY_LIMIT=10
LLM_MONTHLY_LIMIT=50

Features:

ROI-based Performance Classification
- Top 2 channels: “Top Performer”
- ROI > 1: “Solid Performer”
- Otherwise: “Review Needed”
- Performance tables display actual ROI values (from media_contribution_per_spend.csv)
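The classification rules above can be sketched as follows. It assumes "Top 2" means the two highest ROI values and that this rule is checked first. The function name classify_channels and the ROI figures are hypothetical, and the report module's own implementation may treat ties differently.

```python
def classify_channels(roi_by_channel):
    """Label channels using the ROI rules listed above."""
    ranked = sorted(roi_by_channel, key=roi_by_channel.get, reverse=True)
    top_two = set(ranked[:2])

    labels = {}
    for channel, roi in roi_by_channel.items():
        if channel in top_two:
            labels[channel] = "Top Performer"
        elif roi > 1:
            labels[channel] = "Solid Performer"
        else:
            labels[channel] = "Review Needed"
    return labels


# Hypothetical ROI values.
rois = {"TV": 2.4, "Search": 3.1, "Display": 1.3, "Radio": 0.7}
print(classify_channels(rois))
```

Here TV and Search are labelled "Top Performer", Display "Solid Performer", and Radio "Review Needed".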

Statistical Quality Flags

Channels are automatically flagged when statistical quality concerns are detected:

Flag	Threshold	Meaning	Action
High ROI	ROI ≥ 10	Potential selection bias or data sparsity	Verify channel targeting and spend levels
Wide Uncertainty	p95/p5 ≥ 5	Unreliable posterior estimates	Increase spend or collect more data
Low Spend	<1% of total budget	Inference unstable due to sparse data	Scale up or consolidate with similar channels
Selection Bias	Propensity-targeted channel	Endogeneity risk (targets high-propensity users)	Interpret ROI cautiously, consider incrementality tests

Propensity-targeted channels (automatically flagged for selection bias):

Retail media: amazon, retail_media
Remarketing: remarketing, google_rmkt, facebook_rmkt, display_rmkt
Owned channels: crm, email, mailer, shop_app
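The thresholds and channel lists above translate into a simple rule set, sketched below. The thresholds and channel names come from this section, but the function name quality_flags, the exact-name matching, the treatment of non-positive p5 values, and the sample inputs are assumptions. AMMM's own flagging code may differ.

```python
PROPENSITY_CHANNELS = {
    "amazon", "retail_media",
    "remarketing", "google_rmkt", "facebook_rmkt", "display_rmkt",
    "crm", "email", "mailer", "shop_app",
}


def quality_flags(name, roi, p5, p95, spend_share):
    """Return the flags a channel would receive under the thresholds above."""
    flags = []
    if roi >= 10:
        flags.append("High ROI")
    # The ratio is only meaningful for a positive lower percentile.
    if p5 > 0 and p95 / p5 >= 5:
        flags.append("Wide Uncertainty")
    if spend_share < 0.01:  # less than 1% of total budget
        flags.append("Low Spend")
    if name.lower() in PROPENSITY_CHANNELS:
        flags.append("Selection Bias")
    return flags


print(quality_flags("email", roi=12.0, p5=2.0, p95=14.0, spend_share=0.005))
print(quality_flags("tv", roi=1.8, p5=1.2, p95=2.5, spend_share=0.30))
```

The first call returns all four flags and the second returns an empty list.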

How flags appear in reports:

Channel Performance Summary table includes a “Caveats” column with short tags
Flagged channels show [!rank] markers next to their names
Detailed explanations appear in “Notes” section below the table

Commercial Insights Focus
- Recommendations emphasize commercial insights inferred from data
- Second-order effects highlighted (auction dynamics, spillovers, saturation, seasonality)
- Each recommendation tied to ROI bands, saturation levels, or contribution share
- Generic advice (e.g., “Implement A/B testing”) explicitly excluded

Executive-ready Outputs
- Technical report: results/markdown/ammm_report.md (evidence-backed diagnostics)
- Business report: results/markdown/business_report.md (executive summary with AI insights)
- Interpretations JSON: results/json/llm_interpretations.json (structured data for downstream use)

Policy:

Only two markdown reports generated per run (technical and business)
No additional markdown files created by LLM module
All claims in reports backed by model outputs with citations to source CSVs

Results Structure

Key outputs under results/:

markdown/ — ammm_report.md, business_report.md
json/ — llm_interpretations.json (if LLM enabled)
csv/ — summaries, diagnostics, performance metrics
model.nc — saved ArviZ InferenceData (NetCDF)
model.dill — optional full model object (binary)

Optional CSVs:

media_conversion_efficiency.csv and media_cost_per_conversion.csv are optional. If either file is missing, no warning is logged at high log levels and report generation continues.

API Reference

Core modules:

src/driver - Exports MMMBaseDriverV2 (implementation in src/driver/base.py)
src/core/mmm_model_v2.py - Model implementation
src/prepro/ - Data preprocessing
src/sketch/ - Visualisation

See inline docstrings for detailed API documentation.

Support

Sample code: demo/ directory
Common issues: TROUBLESHOOTING.md
Bug reports: GitHub issues

License

See LICENSE.md.