The Large Weather Demand Model

Weather-Driven Demand Intelligence

WeatherVane is a demand intelligence system that tells you what to do about the weather—and proves why. It captures the $65 billion in annual demand shifts that every ad platform on earth treats as noise. At its core is the Large Weather–Demand Model—a foundation model that fuses the transformer architectures and scaling laws behind the LLM revolution with causal econometrics, hierarchical Bayesian inference, and real-time weather physics. The result is a private demand superintelligence for your brand: it ingests millions of location–day observations, learns nonlinear dose–response curves, interaction effects, and lag structures across every product, market, and channel—then reallocates your budget automatically with a causal estimate, a confidence interval, and a kill-switch on every action. It doesn’t guess. It identifies the causal mechanism by which weather drives your customers’ behavior, using the same econometric strategy that has powered 83 papers in top-5 economics journals since 1928.

[Illustration] Your humidifier brand + a freezing, ice-covered Statue of Liberty (32°F · 18% humidity, this weekend in NYC) = you and every marketer, thinking:
“Cold and dry this weekend—sales should go up.
But how cold is cold enough? And how dry?
Exactly how much do I spend? On which channels?
And is this the best place to be spending my money?
What about geos I haven’t had time to check weather for?”

WeatherVane answers this—for every product, in every market, down to the dollar and the degree. Even the geos you thought you had to ignore.

Under the hood: a multi-layer causal model trained on weather, ad spend, sales, inventory, and calendar signals across thousands of markets. Below: the problem, the solution, the science, the architecture, the safety, and the vision.

Summary

WeatherVane automatically adjusts your ad spend when weather changes demand—so you stop wasting money on rainy days and capture surges when the sun comes out.

  • What you get: Daily, per-market recommendations (“increase Google spend on sunscreen +41% in Denver this week, shift Meta budget from Phoenix to Seattle”) with confidence scores and automatic execution.
  • How it works: A causal demand model uses weather shocks as natural experiments to separate real ad effects from noise, with hierarchical partial pooling across tenants and closed-loop Model Predictive Control.
  • Why weather: It’s the only large-scale, exogenous, continuously varying signal that moves consumer demand but is orthogonal to marketing decisions.
  • Status: v1.0 shipping to early design partners; v2.0 deep learning in development.

This document serves three audiences: business leaders (Sections 1–2, 7), technical evaluators (Sections 3–4, 6), and hands-on engineers (Section 5: The Architecture, plus the API Reference). Estimated reading time: 25 minutes.
Example dashboard: sunco.weathervane.ai/recommendations (simulated data for illustration)
SunCo Sunscreen · Today's Recommendations · Feb 10, 2026 · 5 markets · 6 recommendations

  • Weekly ad budget: $50K (5 markets)
  • Active recs: 6 (all actionable)
  • Projected lift: +$18.2K vs naive baseline
  • Interactions modeled: 14 (UV×lag×geo×channel)

Projected lift by market

| Market | Projected lift | Driver | CI |
|---|---|---|---|
| Phoenix, AZ | +$1.8K | pull back | ±22% |
| Denver, CO | +$3.6K | event boost | ±11% |
| Seattle, WA | +$5.2K | σ-anomaly | ±18% |
| Miami, FL | +$1.5K | UV taper | ±14% |
| Austin, TX | +$6.1K | channel shift | ±16% |

Recommendations

| Product | Market | Action | Signal | Conf. |
|---|---|---|---|---|
| SPF 50 Beach Lotion (non-obvious) | Phoenix, AZ | −18% Google | demand/°F inverts above 105°F; ΔT +14° vs 7d avg · saturation zone; 108°F · UV 12 | 94% |
| After-Sun Aloe Gel (non-obvious) | Phoenix, AZ | +52% all channels | 24h UV lag → peak burn-care window; EMA₅(UV) ↑ trending · CDD₇ = 246°F·days; 108°F · UV 12 · lag(UV, 24h) | 92% |
| Sport SPF 70 Spray | Denver, CO | +41% event-aware | temp × Trail Race = 2.3× interaction; humidity 18% (demand amplifier); 94°F · UV 11 · Trail Race | 93% |
| Zinc Face Shield (non-obvious) | Seattle, WA | +67% surge bid | σ +2.8 vs DMA norm → peak elasticity; MA₇(temp) deviation +19° from local baseline; 78°F · UV 7 | 89% |
| Daily Glow SPF 30 (non-obvious) | Miami, FL | −31% taper | d/dt(UV) < 0 for 3 days → spend-off signal; inventory velocity ↑ lag · pre-purchased; 95°F · UV 10 · ΔUV −2/day | 86% |
| SPF 50 Beach Lotion (non-obvious) | Austin, TX | Shift G→Meta | ROAS (Google) 0.8× vs ROAS (Meta) 2.1× at margin; channel × geo saturation · CPC $4.12; 91°F · Google CPC saturated | 78% |

Portfolio decision · 6 recommendations across 5 markets: Pull back Phoenix saturated spend, surge Seattle σ-anomaly window, reallocate Austin Google→Meta. Projected weekly lift: +$18.2K (90% CI: $12.4K–$24.1K) vs weather-naive baseline.

1. The Problem

You’re spending more on ads every quarter and trusting the numbers less. The platforms say your campaigns are working. Your CFO says prove it.

If you sell sunscreen, outdoor furniture, beverages, HVAC, apparel, or anything else where weather moves the needle, you already feel this: a sunny weekend explodes demand and your bids don’t move fast enough. A cold snap kills conversions and you’re still spending yesterday’s budget. You can see the weather in your dashboards but you can’t act on it—because your ad platform doesn’t know what weather does to your category, your attribution model can’t separate weather-driven sales from ad-driven sales, and your rules (“boost when it’s hot”) collapse the moment reality gets complicated. Meanwhile, CPCs are up 12% this year, iOS killed half your tracking, and every dollar you spend is harder to justify to the people who sign the checks.

U.S. digital ad spend will reach $413 billion in 2026. Roughly $65 billion of that, about 15% of the total per eMarketer/Statista vertical breakdowns, flows to weather-sensitive categories: sunscreen, HVAC, beverages, apparel, outdoor recreation, home comfort. Yet every major ad platform (Google, Meta, TikTok, Amazon) treats weather as noise to be filtered out, not signal to be exploited. The standard approach is to apply crude rules (“if temperature > 90°F, boost sunscreen ads”) that encode roughly 1.6 bits of information and collapse immediately under the weight of real-world complexity: nonlinear response curves, interaction effects (UV × humidity × cloud cover), spatial heterogeneity across markets, temporal lags between weather events and purchase behavior, and channel-specific saturation dynamics.
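To make the "1.6 bits" figure concrete, here is a small worked calculation. It assumes the rule distinguishes three weather states (hot / cold / rainy), which is how the document describes rule systems elsewhere. The comparison state space (9 weather signals at 8 levels each) is purely an illustrative assumption, not the model's actual specification.

```python
import math

def rule_bits(n_states: int) -> float:
    """Upper bound on the information (in bits) carried by a signal with n_states values."""
    return math.log2(n_states)

# A rule that only distinguishes hot / cold / rainy.
three_state_rule = rule_bits(3)

# Hypothetical comparison: 9 weather signals, each coarsely binned into 8 levels.
binned_weather_state = rule_bits(8 ** 9)

print(f"3-state rule:          {three_state_rule:.2f} bits")
print(f"9 signals x 8 levels:  {binned_weather_state:.1f} bits")
```

Even this coarse binning of the raw weather state carries far more information than the rule can represent, before any lags, interactions, or market structure are added.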

The core challenge is causal identification. Observing a correlation between temperature and sunscreen sales is trivial. Estimating the causal effect of a weather event on demand—net of seasonality, promotions, stockouts, and ad spend itself—requires causal identification strategies, not correlation-based feature importance. Modern methods like Double/Debiased ML (Chernozhukov et al., 2018) and Causal Forests (Wager & Athey, 2018) combine econometric identification with ML’s flexibility—using machine learning for flexible estimation while maintaining rigorous causal guarantees.

WeatherVane solves this by treating weather events as natural experiments—non-manipulable exogenous shocks that enable causal estimation without randomization. The Large Weather Demand Model (LWDM) combines this causal identification strategy with hierarchical partial pooling across the tenant network, saturation-aware budget optimization, and a “Plan & Proof” delivery mode that ships evidence bundles with every recommendation.

Signal domains (crossed with one another: weather × temporal × sales × ad × structure):
  • Weather signals: Temperature, UV index, Humidity, Precipitation, Wind speed, Cloud cover, Pressure, Dew point, Visibility
  • Temporal transforms: Lags t−1 … t−26, Moving avg 3/7/14/30d, Δ/Δt derivatives, Cum. degree-days, Exp. decay α, Volatility σ
  • Sales & demand: Daily revenue, Units sold, Conversion rate, Cart size, Return rate, Inventory
  • Ad & marketing: Google spend, Meta spend, TikTok spend, Impressions, CPC / CPM, ROAS, Creative fatigue, Bid landscape
  • Structure & dynamics: 1,000 DMAs, 52 weeks, 50+ categories, 6 channels, 135 interactions, Adstock decay, Saturation curves
Learnable parameters: ~270M (420 base dims × 5K+ cross-domain interactions × 1K markets × 50 categories)
A 3-rule system covers 0.000016% of this space.
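The temporal transforms listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not WeatherVane's feature pipeline: the function names, the 65°F degree-day base (the conventional cooling-degree-day base), and the sample temperatures are all assumptions. The sample week is chosen so that the 7-day cooling degree-days come out to 246°F·days, the value shown in the simulated dashboard.

```python
from statistics import mean, stdev

def lag(xs, k):
    """Series shifted back by k days; None where no history exists."""
    return [None] * k + xs[: len(xs) - k]

def moving_avg(xs, window):
    """Trailing moving average; None until a full window is available."""
    return [mean(xs[i - window + 1 : i + 1]) if i >= window - 1 else None
            for i in range(len(xs))]

def derivative(xs):
    """Day-over-day change (discrete d/dt)."""
    return [None] + [b - a for a, b in zip(xs, xs[1:])]

def ema(xs, span):
    """Exponential moving average with decay alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = [xs[0]]
    for x in xs[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def cooling_degree_days(temps_f, window, base=65.0):
    """Trailing sum of degrees above `base` (assumed 65°F convention)."""
    excess = [max(t - base, 0.0) for t in temps_f]
    return [sum(excess[max(0, i - window + 1) : i + 1]) for i in range(len(temps_f))]

def volatility(xs, window):
    """Trailing sample standard deviation; None until a full window is available."""
    return [stdev(xs[i - window + 1 : i + 1]) if i >= window - 1 else None
            for i in range(len(xs))]

# Hypothetical week of daily highs (°F) for a hot market.
temps = [94.0, 96.0, 98.0, 100.0, 102.0, 103.0, 108.0]

features_today = {
    "lag_1": lag(temps, 1)[-1],
    "ma_3": moving_avg(temps, 3)[-1],
    "d_dt": derivative(temps)[-1],
    "ema_5": ema(temps, 5)[-1],
    "cdd_7": cooling_degree_days(temps, 7)[-1],
    "vol_7": volatility(temps, 7)[-1],
}
print(features_today)
```

Each transform turns one raw signal into several model inputs, which is why the feature space grows so quickly once transforms are crossed with markets, categories, and channels.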

1.1 The Attribution Breakdown

  • +222%: CAC increase since 2013
  • 75%: iOS users opted out of tracking
  • 5–10×: overstatement in ad measurement

The old playbook—track users across sites, attribute conversions to clicks, optimize ROAS in-platform—is breaking down on every front. Brands that relied on deterministic, user-level attribution are now flying blind, and the platforms that sold them certainty are raising prices while delivering less signal.

  • iOS ATT destroyed the signal graph. Apple’s App Tracking Transparency framework gave users an explicit opt-in prompt. Opt-in rates settled around 25–35% (Flurry Analytics, 2022), meaning 65–75% of iOS users became invisible to ad platforms overnight. Meta alone reported a $10 billion annual revenue impact.
  • Third-party cookie deprecation compounds the damage. Safari and Firefox already block third-party cookies by default (~40% of browser traffic). The era of deterministic, user-level cross-site tracking is ending.
  • CPC inflation is accelerating. Average Google Ads CPC rose 10% YoY in 2023 and another 12% in 2024. Meta CPMs climbed 15–20% annually. Brands are paying more per click while conversion attribution degrades.
  • Ad fraud drains 20–35% of digital spend. Juniper Research (2023) estimated $84 billion in global ad fraud. Invalid traffic inflates reported impressions, making platform-reported ROAS systematically overstated.
  • Walled gardens control the measurement. Google, Meta, Amazon, and TikTok each report their own attribution using their own models. Cross-platform deduplication is impossible. Nielsen lost MRC accreditation in 2021.

When user-level tracking fails, you need market-level causal signals that are immune to privacy restrictions. Weather is the paradigmatic example: exogenous, universal, granular to the zip-code level, and free of PII. No cookies. No device IDs. No consent prompts. It cannot be blocked by ATT, invalidated by cookie deprecation, or gamed by click farms.

1.2 The Endogeneity Problem: Why Every Existing MMM Gets It Wrong

Robyn says your ads drove 5x ROAS. The randomized experiment says 0.5x. The model was measuring its own reflection—brands spend more when they expect demand to be high, creating a spurious correlation that ridge regression (Robyn), Bayesian priors (Meridian, PyMC), and hyperparameter search cannot fix. This is not a bug in those tools. It is a gap in their architecture: advertising spend is endogenous, and no amount of regularization addresses the omitted-variable bias that contaminates causal estimates.

Gordon et al. (2019) ran 15 large-scale randomized experiments at Facebook (500 million user-observations, 1.6 billion impressions) and found that observational estimates overstate true ad effects by 5–10× at the median. Blake, Nosko & Tadelis (2015) ran a field experiment at eBay: branded search advertising had near-zero causal effect on sales despite appearing highly effective in standard attribution. Shapiro, Hitsch & Tuchman (2021) found that over 80% of brands have negative marginal ROI on TV. These are not edge cases; they are the central tendency.
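The mechanism behind this overstatement can be reproduced in a small simulation. This is a sketch with made-up parameters, not the data from any of the cited studies: a true return of 0.5 per dollar, a hypothetical `expected_demand` confounder that drives both spend and sales, and a comparison between observational spend and randomly assigned spend.

```python
import random

def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def estimated_roas(n, randomized, rng, true_roas=0.5):
    """Regress sales on spend when spend is (or is not) chosen by the marketer."""
    spend, sales = [], []
    for _ in range(n):
        expected_demand = rng.gauss(0, 1)            # unobserved by the analyst
        if randomized:
            s = rng.gauss(0, 1)                      # experiment: spend ignores demand
        else:
            s = expected_demand + rng.gauss(0, 0.5)  # spend more when demand looks strong
        y = true_roas * s + 5.0 * expected_demand + rng.gauss(0, 1)
        spend.append(s)
        sales.append(y)
    return ols_slope(spend, sales)

rng = random.Random(0)
naive_roas = estimated_roas(50_000, randomized=False, rng=rng)
experimental_roas = estimated_roas(50_000, randomized=True, rng=rng)
print(f"observational estimate: {naive_roas:.2f}")
print(f"randomized estimate:    {experimental_roas:.2f}")
```

With these assumed parameters the observational slope converges to about 4.5 while the randomized slope converges to the true 0.5. Regularization or priors on the spend coefficient would not remove the gap, because the bias comes from the omitted demand term rather than from estimation noise.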

Without an identification strategy, every marketing mix model is measuring its own reflection — not the world.

5–10× overstatement in observational ad measurement vs. RCTs (Gordon, Zettelmeyer, Bhargava & Chapsky, 2019)
Naive: “Our attribution model shows 5x ROAS. The ads are working.”
Actual: Randomized experiments show 0.5x. The model was measuring its own reflection—brands spend more when they expect demand to be high.

1.3 Competitive Landscape

Existing approaches fall into a few buckets: pure intuition (seasonal playbooks, gut-feel adjustments), basic regressions (OLS time-series without instruments—biased by construction), weather-triggered advertising (threshold reflexes with ~1.6 bits of weather state), and marketing mix models (estimation frameworks without identification strategy—producing systematically biased coefficients). None of them possess weather intelligence or causal rigor. They are separated from WeatherVane not by degree but by kind.

[Chart: competitive positioning. Axes: weather intelligence (None to Deep) and causal rigor (Correlational to Identified). Quadrants: Low Weather / High Causal, High Weather / High Causal, Low Weather / Low Causal, High Weather / Low Causal.]
  • Pure Intuition: “It looks cold, boost something”
  • Basic Regressions: y = mx + b, best of luck
  • Ad platform AI: Optimizes CTR, not causation
  • Meta Robyn: Ridge regression, no IV
  • PyMC Marketing: Bayesian priors, no identification
  • Google Meridian: Geo experiments (costly, infrequent)
  • The Weather Company: Weather data without demand intelligence
  • WeatherAds.io: Threshold awareness, not weather intelligence
  • WeatherVane LWDM: Weather IVs + continuous causal ID
Why not Pure Intuition?
Checking the weather app and adjusting spend feels productive—but a controller with ~0 bits of variety against a 25+ bit system is not partially optimizing. It is making arbitrary moves that occasionally align with the correct action by coincidence. The demand surface has 270 million interacting parameters across weather, sales, ad, inventory, and structural domains. Noticing that it is hot outside does not give you a foothold on that surface—it gives you the illusion of one.
Why not Basic Regressions (OLS)?
Running sales ~ temperature + spend in R is the first thing every analyst tries. The problem: ad spend is endogenous (you spend more when you expect high demand), so the coefficient on spend is biased upward, often by 5–10×. Without an instrument, you’re measuring your own feedback loop. Regressions also assume linearity: at 108°F, demand doesn’t just plateau; it inverts. OLS can’t see that.
Why not WeatherAds / The Weather Company?
WeatherAds.io made weather-triggered advertising a product category. But triggers do not answer "when" to act—they answer "when a number crosses a line," which is not the same thing. Whether 85°F is the right moment to increase spend depends on the saturation curve, the auction equilibrium, the temporal derivative, the lag structure, and hundreds of other variables the trigger cannot see. A trigger fires identically in states that require opposite actions. That is not timing—it is a coin flip with a weather-themed interface.
Why not Robyn / Meridian / PyMC Marketing?
Robyn, Meridian, and PyMC Marketing represent significant engineering investments. But engineering effort does not substitute for identification strategy. Without instruments, every coefficient in these models is confounded by the endogeneity they are trying to measure. The outputs are not "approximately right"—they are systematically biased, often by 5–10× (Gordon et al., 2019). A biased estimate is worse than no estimate, because it creates false confidence.

The table below breaks this down feature by feature. The key insight: weather-triggered systems (WeatherAds, The Weather Company) detect when a number crosses a threshold—but the demand-relevant state depends on interactions, lags, derivatives, and auction dynamics that thresholds cannot encode. Knowing “temperature > 85” is not knowing when conditions have changed in a decision-relevant way. Marketing mix models (Robyn, Meridian, PyMC) have estimation machinery but no identification strategy, which means their estimates are systematically biased—not approximately right with a missing feature.

FEATURE COMPARISON
How WeatherVane compares
| Feature | Pure Intuition | Basic Regressions | WeatherAds / TWC | Robyn / Meridian / PyMC | LWDM (WeatherVane) |
|---|---|---|---|---|---|
| Approach | You checked the news today | y = α + βx + ε, pray the residuals are normal | Threshold reflexes, contextual ad placement | Ridge / Bayesian estimation (no identification) | Causal IV + synthetic control |
| Weather integration | Looked out the window this morning | Maybe a temperature column if the intern remembered | Threshold triggers (~1.6 bits of weather state) | None or manual covariates | 270M learned params across all signal domains |
| Causal identification | "Seems like we sell more candles when it’s cold" | None (confounds everything with everything) | None | None (priors regularize, not identify; geo experiments are costly one-offs) | Weather shocks as IVs |
| Cross-brand learning | Asked a friend at a conference | None | None | None | Empirical Bayes pooling |
| Nonlinear effects | Not a concept that has come up | No (linear by definition; it’s in the name) | No | Functional form only (curve shape biased without identification) | Full (interactions, lags, phase transitions) |
| Geo granularity | Whatever was in the news headlines | National / regional | Zip-level thresholds (same 1.6-bit logic per zip, no cross-market learning) | National / regional | DMA / zip-level causal |
| Real-time optimization | Check back Monday | Quarterly re-run if someone asks | Reactive threshold firing (not optimization) | No | MPC closed-loop |
| Confidence intervals | Confidence, yes. Intervals, no. | Nominal (incorrect coverage; biased estimate ± false precision) | No | Nominal (posterior conditional on misidentified model) | Yes + kill-switches |

1.4 Why Now

Weather anomalies are accelerating—and so is the opportunity for brands that harness them. The last decade has produced more billion-dollar weather events, more volatile consumer behavior, and wider demand-forecast errors than any comparable period. At the same time, the tooling to actually solve this problem has finally matured. The result is an expanding opportunity: the value of weather-aware demand modeling is growing while the cost of building a solution has collapsed. Fildes, Ma, & Kolassa (2022), in the definitive survey of retail forecasting research and practice, identify weather as an important but under-exploited external signal—widely acknowledged to matter, yet rarely incorporated with methodological rigor.

[Chart: Billion-Dollar Weather Events (US). Series: events per year and total cost ($B). Annotation: 2020–2025: more events, higher costs. Source: NOAA NCEI / Climate Central.]
The opportunity in weather: Demand Precision Is the New Edge
  • Billion-dollar weather disasters per year, US (NOAA NCEI): 2015: 8/yr → 2025: 28/yr (↑ 3.5×)
  • Search volume spikes, weather-sensitive categories (Google Trends, 2020–2024): 2015: 1.2× avg → 2025: 3–6× avg (↑ 4×)
  • Retail inventory distortion, US (IHL Group): 2015: $0.8T/yr → 2025: $1.1T+/yr (↑ est.)
  • Demand forecast error during weather anomalies (NRF / Planalytics): 2015: 8–12% → 2025: 20–35% (↑ 3×)
Every year without a weather-aware demand model, the gap widens. The brands that move first capture the arbitrage; the rest absorb the losses.
  • Weather APIs: Sub-km, sub-hour forecasts now free (NOAA HRRR, Open-Meteo)
  • Ad Platform APIs: Real-time geo spend data from Google, Meta, TikTok
  • Causal ML: DoWhy, EconML now engineering-ready
  • Weather anomalies are escalating: NOAA recorded 28 billion-dollar weather events in 2023 alone—more than any year on record. Consumer demand patterns are increasingly volatile and unpredictable during extreme weather, creating both risk and opportunity for brands that can respond in real time.
  • Weather is an untapped signal in demand models: Standard MMM and ad platform algorithms treat weather as noise. During a 2024 heatwave, weather-aware models reduced forecast error to 8–12% versus the 25–40% seen by standard approaches. The upside of incorporating this signal compounds: better inventory timing, optimized spend allocation, and captured revenue windows.
  • The tooling window just opened: Hyperlocal weather APIs (NOAA HRRR, Open-Meteo), real-time ad spend data (Google/Meta conversion APIs), and production-grade causal ML (DoWhy, EconML) all matured between 2022 and 2025. For the first time, it is possible to build a causal demand model at scale without research-lab infrastructure.

1.5 The Academic Evidence Base

The premise that weather causally affects consumer demand is not an assumption—it is one of the most empirically validated findings in applied economics. Dell, Jones, & Olken (2014) provide the canonical survey of weather-as-instrument research across economics, demonstrating that weather variation satisfies the exclusion restriction required for causal identification in dozens of domains from agriculture to retail to labor supply. Weather is exogenous to human decisions, varies at fine spatiotemporal scales, and is measurable with high precision—properties that make it uniquely powerful as a source of identifying variation.

Busse et al. (2015) study over 40 million vehicle transactions across the United States and show that weather at the time of purchase causally affects which car consumers buy—convertible sales spike in sunshine, 4WD sales spike after snowfall—even when the weather is transient and climatically irrelevant to the buyer’s location. This finding is striking because automobiles are high-consideration purchases with long research cycles, yet weather-induced psychological salience still moves the needle. For low-consideration, weather-sensitive categories like sunscreen and beverages, the effects are larger and more immediate.

Most recently, Roth Tran (2023) constructs a machine-learning weather index from daily store-level retail sales data and finds that weather explains 2–5% of daily sales variance even after controlling for seasonality, day-of-week, and store fixed effects. At the tail—during extreme weather events—the effect can exceed 20% of daily revenue for weather-sensitive categories. These findings form the empirical foundation for the LWDM: the signal is real, causal, and large enough to be economically significant at the portfolio level.

2. The Solution

Here is exactly what you get.

Every morning, WeatherVane delivers a set of decisions: which markets to increase, which to pull back, by how much, on which channels—with a causal estimate, a confidence interval, and the conditions under which the recommendation reverses. Weather moves $65 billion in annual demand, and the model captures the full surface of that signal—not the <0.001% a rule system covers.

2.1 Weather as Natural Experiment

The econometric solution to endogeneity is instrumental variables—finding a source of variation that shifts demand but is uncorrelated with the error term. Philip G. Wright used weather as the first instrument in the history of econometrics (1928), estimating demand elasticities for agricultural commodities using regional rainfall. Nearly a century later, Dell, Jones & Olken (2014) survey 83 papers in top-5 economics journals that use weather as exogenous variation for causal identification. Weather is arguably the most widely validated instrument in all of economics because it satisfies the core requirements by construction: advertisers cannot cause the weather, daily weather realizations are not anticipated by budget cycles set weeks in advance, and weather measurably shifts consumer demand across dozens of product categories.
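A minimal simulation of the Wright-style setup shows what an instrument buys. It is a textbook illustration under assumed parameters, not the LWDM's estimator: a linear demand curve with slope −1, a supply curve shifted by a rainfall shock, and unobserved shocks on both sides. The variable names and coefficients are hypothetical.

```python
import random

def cov(x, y):
    """Sample covariance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

rng = random.Random(1)
true_demand_slope = -1.0   # assumed: quantity demanded falls 1 unit per unit of price
supply_slope = 1.0         # assumed supply response to price

rain, price, quantity = [], [], []
for _ in range(100_000):
    z = rng.gauss(0, 1)    # rainfall shock: shifts supply, not demand
    u = rng.gauss(0, 1)    # unobserved demand shock
    v = rng.gauss(0, 1)    # unobserved supply shock
    # Demand:  q = 10 + true_demand_slope * p + u
    # Supply:  q =  2 + supply_slope * p + z + v
    # Market clearing gives the equilibrium price:
    p = (10 - 2 + u - z - v) / (supply_slope - true_demand_slope)
    q = 10 + true_demand_slope * p + u
    rain.append(z)
    price.append(p)
    quantity.append(q)

# Naive regression of quantity on price mixes the two curves together.
ols_estimate = cov(price, quantity) / cov(price, price)
# IV (Wald) estimator: use only the price variation induced by rainfall.
iv_estimate = cov(rain, quantity) / cov(rain, price)

print(f"OLS slope: {ols_estimate:.2f}")
print(f"IV slope:  {iv_estimate:.2f}")
```

In this setup OLS converges to about −0.33, while the instrumented estimate converges to the true −1.0. The instrument works only because rainfall is independent of the demand shock `u`, which is the exogeneity property the paragraph above attributes to weather.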

The LWDM uses weather shocks—deviations from seasonal norms—as continuous natural experiments. Every cold snap, heat wave, and rainstorm that departs from expectations provides exogenous demand variation that helps separate weather-driven sales from ad-driven sales. This is not weather-triggered ad targeting; it is weather-driven causal identification—a fundamentally different capability.
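A "weather shock" in this sense can be sketched as a standardized deviation from a market's own norm. The function name and the temperature histories below are hypothetical; a production system would presumably use a longer climatology and more signals than temperature alone.

```python
from statistics import mean, stdev

def weather_shock(today: float, history: list[float]) -> float:
    """Deviation of today's value from the market's own norm, in standard deviations."""
    return (today - mean(history)) / stdev(history)

# Hypothetical same-week daily highs (°F) from prior years, per market.
seattle_history = [57.0, 71.0, 64.0, 70.0, 58.0, 66.0, 62.0, 69.0, 59.0, 64.0]
phoenix_history = [102.0, 110.0, 106.0, 109.0, 103.0, 107.0, 105.0, 111.0, 101.0, 106.0]

seattle_shock = weather_shock(78.0, seattle_history)
phoenix_shock = weather_shock(108.0, phoenix_history)

print(f"Seattle at 78°F:  {seattle_shock:+.1f} sigma")
print(f"Phoenix at 108°F: {phoenix_shock:+.1f} sigma")
```

With these assumed histories, 78°F in Seattle is a far larger shock than 108°F in Phoenix, even though Phoenix is 30 degrees hotter in absolute terms. The shock, not the raw reading, is what provides the unexpected variation used for identification.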

2.2 Continuous Identification vs. One-Shot Experiments

Google’s Meridian recommends geo-holdout experiments for what it calls causal calibration. Each experiment costs real revenue: a brand withholding ads in half the U.S. for 21 days forfeits roughly $289K in incremental revenue on a $10M monthly base (SegmentStream, 2024), takes 4–8 weeks, and produces a single noisy point estimate for a single channel. Most brands run one or two per year. Weather-shock identification inverts this tradeoff: it provides thousands of natural experiments per year across every geography, at zero cost, for every channel simultaneously. The signal accumulates passively with every day of data (Lewis & Rao, 2015).

1,000:1 natural experiments per year vs. geo-holdout tests. Weather shocks provide continuous identification at zero cost.

2.3 The Partial Pooling Advantage

Robyn, Meridian, and PyMC Marketing all build one model per advertiser, each of which is individually misidentified due to the absence of instruments. Even setting aside the identification problem, James & Stein (1961) proved something that startled the statistics community: when estimating three or more quantities, a shrinkage estimator that pools them always has lower total expected squared error than estimating each one separately. Always. Efron & Morris (1975) demonstrated it using baseball batting averages, with a 71% error reduction. The LWDM implements adaptive empirical Bayes shrinkage: new brands with 30 days of data inherit informative priors from hundreds of similar tenants; mature brands with two years of data retain their own estimates with minimal shrinkage. A brand joining the platform gets better estimates on day one than it would after six months alone. The math is settled (James & Stein, 1961; Efron & Morris, 1975). The question is not whether to pool; it is how aggressively (Gelman & Hill, 2006).
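The James-Stein result is easy to check numerically. The sketch below uses the classic positive-part estimator shrinking toward zero with known unit noise variance; this is the textbook form, not the adaptive empirical Bayes scheme the LWDM is described as using, and the ten "true effects" are randomly generated placeholders.

```python
import random

def james_stein(observations, noise_var=1.0):
    """Positive-part James-Stein estimate (shrinks toward 0; needs 3+ quantities)."""
    p = len(observations)
    norm_sq = sum(x * x for x in observations)
    shrink = max(0.0, 1.0 - (p - 2) * noise_var / norm_sq)
    return [shrink * x for x in observations]

def total_sq_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

rng = random.Random(42)
n_brands, n_trials = 10, 2000
true_effects = [rng.gauss(0, 1) for _ in range(n_brands)]   # hypothetical brand effects

err_separate = 0.0
err_pooled = 0.0
for _ in range(n_trials):
    # One noisy measurement per brand, as if each were estimated in isolation.
    observed = [t + rng.gauss(0, 1) for t in true_effects]
    err_separate += total_sq_error(observed, true_effects)
    err_pooled += total_sq_error(james_stein(observed), true_effects)

err_separate /= n_trials
err_pooled /= n_trials
print(f"mean total squared error, separate estimates: {err_separate:.2f}")
print(f"mean total squared error, James-Stein:        {err_pooled:.2f}")
```

The guarantee concerns total error across all quantities, not each one individually: a single brand's estimate can get worse under shrinkage even as the portfolio improves. That is one reason the amount of pooling matters.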

Sources: Gordon et al. (2019), Marketing Science; Blake, Nosko & Tadelis (2015), Econometrica; Dell, Jones & Olken (2014), J. Econ. Lit.; James & Stein (1961); Efron & Morris (1975), JASA; Gelman & Hill (2006); Shapiro, Hitsch & Tuchman (2021), Econometrica.

$65B annual weather-driven demand shifts in the US (eMarketer / Statista vertical breakdowns, ~15% of total digital ad spend)

2.4 What the Model Actually Does

Every row in the recommendations dashboard shown after the Summary encodes a decision that no rule system can make. The model sees saturation curves, temporal derivatives, geo-relative deviations, and cross-channel substitution effects simultaneously, for every market.

  • Saturation curve. Naive: “Hot day → boost sunscreen.” Actual: At 108°F in Phoenix, your competitors double their sunscreen spend. The model cuts 18%, because demand already inverted and every dollar past the peak is wasted.
  • Temporal lag. Naive: “UV is high → boost sunscreen.” Actual: UV falling at −2 pts/day means consumers already bought. After-sun demand peaks 24h after the UV spike. Without tracking d/dt(UV), you spend into yesterday’s demand.
  • Relative weather. Naive: “108°F is hotter than 78°F.” Actual: 78°F in Seattle is +2.8σ above its DMA norm. Phoenix at 108°F is only +0.5σ. A rule that treats 108 > 78 as meaningful moves money in the wrong direction.
  • Rate of change. Naive: “UV is 9 → spend more.” Actual: UV falling 2 consecutive days = consumers already stocked up. Without this, you bid against held inventory, paying for impressions that cannot convert.
  • Channel × geo. Naive: “Spread budget evenly.” Actual: Google CPC $4.12 in Austin, ROAS 0.8× vs Meta at 2.1×. An even split forfeits 40% of the marginal return the optimizer captures in real time.

Above 105°F, people stop going outside. The demand curve doesn’t plateau — it inverts.

Weather-driven demand is not linear, not monotonic, and not intuitive. A hot, humid day drives different behavior than a hot, dry day. Three consecutive hot days create fatigue effects that a single hot day does not. And some of the highest-value opportunities are counterintuitive—like sunscreen demand spiking during warm rain (diffuse UV burns through cloud cover). The model captures all of this. Rules capture none of it.
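The difference between a threshold rule and a non-monotonic response curve can be shown with a toy example. The quadratic inverted-U below, peaking at 105°F, is an assumed shape chosen only to mirror the inversion described above; the real dose-response curve is learned from data and will differ. The rule is the "temp > 85°F → boost" style trigger discussed in this document.

```python
def threshold_rule(temp_f: float) -> str:
    """A typical weather trigger: one cutoff, one action."""
    return "boost" if temp_f > 85 else "hold"

def demand_index(temp_f: float, peak: float = 105.0, width: float = 20.0) -> float:
    """Hypothetical inverted-U response: rises toward `peak`, falls beyond it."""
    return max(0.0, 1.0 - ((temp_f - peak) / width) ** 2)

def marginal_demand(temp_f: float, step: float = 1.0) -> float:
    """Change in demand from one more degree at this temperature."""
    return demand_index(temp_f + step) - demand_index(temp_f)

for temp in (95.0, 108.0):
    direction = "rising" if marginal_demand(temp) > 0 else "falling"
    print(f"{temp:.0f}°F: rule says {threshold_rule(temp)!r}, demand is {direction}")
```

The rule returns the same action at 95°F and 108°F, while the assumed curve has opposite slopes at those two points. No choice of threshold fixes this, because a single cutoff cannot represent a response that changes direction.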

2.5 Think Your Rules Can Capture This? Try.

Build up to 6 rules from 10 templates—temperature thresholds, UV boosts, geo overrides—then run a 7-day simulation across 8 markets. WeatherVane’s model runs the same scenario with continuous optimization. See where your hand-crafted strategy diverges from what a model with full weather awareness can capture.

Rule Builder (1 of 6 rules set). Example rule, Temperature Threshold: if temp > 85°F, boost by 60%.
Construct up to 6 rules and compare your allocation strategy against WeatherVane’s model. No finite set of multiplicative rules can match a model that evaluates the full demand surface.

2.6 Evidence Bundles: How You Know You Can Trust This

Every recommendation ships with a full evidence bundle: the backtest, the confidence interval, the break conditions, and the confounder audit. Here is what a single recommendation looks like:

Example: SunCo Sunscreen, Phoenix DMA, week of July 14
  • Recommendation: +22% Google spend
  • Causal estimate: Δ Revenue: +$18.2K
  • Confidence interval: [+$12.1K, +$24.8K] (90% CI)
  • Break condition: Reverses if temp < 98°F for 3+ days
  • Confounders checked: 8/8 passed (no promo, no stockout)
  • Backtest MAPE: 7.2% on 12-month holdout
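One way to picture an evidence bundle is as a small typed record that travels with each recommendation. The class below is a hypothetical sketch: the field names, the `is_actionable` criterion, and the reading of "3+ days" as three consecutive days are assumptions, not WeatherVane's schema. The values are the ones from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceBundle:
    market: str
    action: str
    lift_usd: float
    ci_low_usd: float
    ci_high_usd: float
    ci_level: float
    break_temp_f: float        # recommendation reverses below this temperature...
    break_days: int            # ...for this many consecutive days (assumed reading)
    confounders_passed: int
    confounders_total: int
    backtest_mape: float

    def is_actionable(self) -> bool:
        """Assumed criterion: every confounder check passed and the CI excludes zero."""
        return (self.confounders_passed == self.confounders_total
                and self.ci_low_usd > 0)

    def should_reverse(self, daily_highs_f: list[float]) -> bool:
        """True once the most recent `break_days` days are all below the break temperature."""
        recent = daily_highs_f[-self.break_days:]
        return (len(recent) == self.break_days
                and all(t < self.break_temp_f for t in recent))

bundle = EvidenceBundle(
    market="Phoenix DMA", action="+22% Google spend",
    lift_usd=18_200, ci_low_usd=12_100, ci_high_usd=24_800, ci_level=0.90,
    break_temp_f=98.0, break_days=3,
    confounders_passed=8, confounders_total=8, backtest_mape=0.072,
)

print(bundle.is_actionable())
print(bundle.should_reverse([101.0, 97.0, 96.5, 95.0]))
```

Encoding the break condition as data rather than prose means the same record that justifies an action can also trigger its reversal.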

2.7 Closed-Loop Control

The model doesn’t fire-and-forget. It observes outcomes, updates beliefs, and re-optimizes every cycle. If Tuesday’s forecast was wrong, Wednesday’s allocation adjusts automatically. This is Model Predictive Control (MPC)—the same framework that guides autonomous vehicles and chemical process plants. Details in Section 5.
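The observe, update, re-optimize cycle can be sketched as a receding-horizon loop. Everything specific here is an assumption made for illustration: a square-root response to spend, daily weather-driven demand multipliers, and a simple running correction for forecast bias. The actual controller is described in Section 5 and is not reproduced here.

```python
def plan(budget, multipliers):
    """Split `budget` across days to maximize sum(m * sqrt(spend)).

    For this assumed concave response, the optimum allocates spend in
    proportion to the square of each day's demand multiplier.
    """
    weights = [m * m for m in multipliers]
    total = sum(weights)
    return [budget * w / total for w in weights]

def run_closed_loop(budget, forecast, actual, learn_rate=0.5):
    """Each day: re-plan over the remaining horizon, execute only the first step, update."""
    remaining, bias, spends = budget, 1.0, []
    for day in range(len(forecast)):
        # Today's conditions are observed; future days use bias-corrected forecasts.
        horizon = [actual[day]] + [bias * f for f in forecast[day + 1:]]
        spend_today = plan(remaining, horizon)[0]
        spends.append(spend_today)
        remaining -= spend_today
        # Observe the outcome and update the belief about forecast bias.
        bias += learn_rate * (actual[day] / forecast[day] - bias)
    return spends

weekly_budget = 50_000.0
forecast = [1.0, 1.2, 1.5, 0.9, 0.8, 1.1, 1.3]   # hypothetical demand multipliers
actual = [1.0, 1.0, 1.2, 0.9, 0.7, 1.0, 1.1]     # what the week turned out to be

open_loop = plan(weekly_budget, forecast)         # plan once, never revisit
closed_loop = run_closed_loop(weekly_budget, forecast, actual)

for day, (a, b) in enumerate(zip(open_loop, closed_loop), start=1):
    print(f"day {day}: open-loop ${a:,.0f}  closed-loop ${b:,.0f}")
```

When forecasts are exactly right, the closed loop reproduces the one-shot plan. The two diverge only when observations contradict the forecast, which is the situation the paragraph above describes.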

The gap between rules and reality is not a gap you can close by adding more rules. It is a gap of architecture.


3. The Science

In 1928, Philip Wright solved the oldest problem in economics using rainfall and butter prices. Nearly a century later, 83 papers in top-5 economics journals have used the same trick. WeatherVane applies it to your ad budget.

Weather is the oldest and most validated causal instrument in econometrics—and the key to solving the attribution problem. What follows is the full scientific argument: why weather works as an instrument, what makes the problem fundamentally hard, and how the LWDM navigates both.

3.1 Why This Problem Is Fundamentally Hard

The math proves that no set of human-written rules can capture the full complexity of weather–demand interactions. Weather’s effect on demand is like chess: you cannot skip ahead; you have to evaluate each position. Adding more rules is like adding epicycles to a geocentric solar system—each buys a marginal improvement in a framework that is architecturally wrong.

Three independent lines of research converge on the same conclusion about weather-demand systems. Poincaré (1890) proved that even the gravitational three-body problem—a system with perfect physical equations—is generically unsolvable in closed form: the interactions generate trajectories that cannot be compressed into any formula shorter than a full step-by-step simulation. Wolfram (2002) generalized this as computational irreducibility: certain systems admit no shortcut. You cannot skip ahead; you must run the computation. And Ashby (1956) proved the control-theoretic consequence: any controller that governs such a system must possess at least as much internal variety as the system itself. A thermostat cannot control the weather. A 6-rule marketing playbook cannot navigate a demand surface with hundreds of millions of interacting parameters across weather, sales, ad, inventory, and structural domains.

Weather-driven demand is exactly such a system. We know the upstream physics perfectly: Navier-Stokes, radiative transfer, Coriolis effects, moisture thermodynamics. Yet weather itself cannot be predicted beyond ~10 days because the interactions generate sensitivity to initial conditions that doubles prediction error every 2–3 days (Lorenz, 1963). Consumer demand sits downstream of this already-irreducible system, further modulated by psychology, inventory dynamics, competitive response, channel saturation, and geographic heterogeneity. If the upstream system is already irreducible, the downstream system is irreducible a fortiori. A 6-rule system captures <0.001% of the parameter space. The missing 99.999% is not rounding error; it is the difference between guessing and knowing.

A different way to see this: a rule is a hand-coded lossy compression scheme. It takes a high-dimensional, continuously varying weather state and throws away almost everything, keeping a tiny code—hot, cold, rainy—that maps to an action. The question isn’t whether that compression is simple (simplicity is a virtue). The question is whether it preserves the right information. In rate-distortion theory (Shannon, 1948; Tishby et al., 2000), the optimal compression depends on what you’re trying to predict. Rules compress for legibility—they keep what fits in a Slack message. A model compresses for decision quality—it keeps what changes the optimal allocation of the next dollar.

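[Figure: Rules versus model as compression. Rules: high-dimensional weather state → 3-bit bottleneck (hot / cold / rain) → crude action (boost / cut / hold). Model: high-dimensional weather state → structured, decision-relevant representation → nuanced action (continuous allocation).]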
Naive: “Rules compress weather into 3 categories. A marketer can hold this in working memory.”
Actual: The model compresses weather into whatever structure maximizes decision value. A marketer can’t hold this—but the math doesn’t care.
270M learnable parameters across weather, sales, ad, and structural interaction space. A 3-rule system covers <0.001%. See dimensionality specification.
Naive: Write rules for 8 categories — “if hot, boost sunscreen; if cold, boost outerwear”
Actual: 270M cross-domain interactions make rules mathematically hopeless. You need a model with matching internal complexity.
THE MODEL — WHAT REQUISITE VARIETY LOOKS LIKE
The Rules-Based Approach
SUNCO™ Weather-Triggered Media Rules
  Rule 1: TEMPERATURE TRIGGER
    IF daily_high > 85°F
    THEN increase_spend("Sunscreen", +20%)
    CHANNEL: Google Shopping, Meta Feed

  Rule 2: UV INDEX TRIGGER
    IF uv_index >= 8
    THEN boost_creative("SPF50+ Pro Line")
    FREQUENCY_CAP: 3x/user/day

  Rule 3: RAIN SUPPRESSION
    IF precipitation_probability > 60%
    THEN pause_campaign("Outdoor_Summer")
    RESUME: next day if clear
In plain English:

“If the temperature goes above 85°F, increase sunscreen spend by 20% on Google Shopping and Meta. If the UV index hits 8 or higher, switch the creative to SPF 50+, capped at three impressions per person per day. If there’s more than a 60% chance of rain, pause the outdoor campaign and turn it back on tomorrow if it clears.”
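For comparison with the modeling approach, the whole playbook fits in a few lines of code. The sketch below is illustrative only: the `WeatherState` fields and the action strings are hypothetical names chosen for this example, not a WeatherVane or ad-platform API.

```python
from dataclasses import dataclass


@dataclass
class WeatherState:
    daily_high_f: float        # forecast daily high, in °F
    uv_index: float
    precip_probability: float  # 0.0 to 1.0


def sunco_rules(w: WeatherState) -> list[str]:
    """Return the actions that the three playbook rules would fire."""
    actions = []
    if w.daily_high_f > 85:
        actions.append("increase_spend:Sunscreen:+20%")
    if w.uv_index >= 8:
        actions.append("boost_creative:SPF50+ Pro Line")
    if w.precip_probability > 0.60:
        actions.append("pause_campaign:Outdoor_Summer")
    return actions


hot_clear = sunco_rules(WeatherState(92, 9, 0.10))
mild_rainy = sunco_rules(WeatherState(70, 3, 0.80))
print(hot_clear)
print(mild_rainy)
```

Every weather state that falls on the same side of the three thresholds produces the same output, which is the compression the section describes.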

The Modeling Approach
  E[δπ_{it} | do(a_{it}), ℱ_{t−1}]
    = ∫_𝒲 ∫_𝓓 ∑_{c=1}^{C} ∑_{τ=0}^{L} β_{ci}(g) · f_c(w_{t−τ}; θ_c) · Sat_c(a_{i,t−τ}; λ_c, δ_c) × σ(γ_c X_{it}) ⊗ Ψ(w_t, w_{t−1}, …, w_{t−L}) · 𝒲(w) · 𝓓(d)
    + ∑_{j≠i} η_{ij}(g) · ⟨∇_w E[d_{jt}], ∂w_t/∂w_{t−1}⟩ ⊗ R_{ij}^{Γ}(g_{μν})
    + ∫_0^T K_{Matérn}(t, s; ν, ℓ) · [ ∑_c ρ_c(s) · ∇_w f_c(w_s) ⊗ ∇_a Sat_c(a_{is}) ] ds
    + ∑_{j=1}^{J} ∫_{∂Ω_j} ⟨∇_w φ(w, d), dμ_ν^{sm}⟩ ⊗ Γ^k_{ij}(g) · exp(−½ Δg Λ^{−1} Δg)
    + Φ(Inv_{it}, P^{comp}_{jt}, Cal_t) · det(I + ε · Ric[g_{μν}])^{−½}
    + ε̃_{it}(g) ~ GP(0, κ((Δg, Δt); ν, ℓ))  +  …  +  O(ε²)
In plain English:

“The expected counterfactual incremental profit under a do-calculus intervention, conditioned on the natural filtration of the demand-weather σ-algebra, is computed via posterior predictive integration over the Riemannian manifold of heteroskedasticity-robust, endogeneity-corrected causal effect surfaces (Chernozhukov et al., 2018), where each fiber of the response bundle is equipped with a Matérn-52 covariance kernel calibrated to the empirical Bayes hyperprior derived from the cross-tenant shrinkage operator (James & Stein, 1961; Efron & Morris, 1975), subject to the constraint that the adstock saturation transform satisfies the Bellman optimality condition on the MPC planning horizon with discount factor γ ∈ (0.94, 0.99) under the Radon-Nikodým measure change induced by the geo-hierarchical random-effects structure, whose sufficient statistics are computed via a contraction on the Ricci flow of the demand-weather coupling tensor evaluated at the …”

The rules fit on a napkin. The demand system doesn’t.

Figure. Progressive complexity: click through stages to see how adding each weather variable (temperature, humidity, wind, temporal lags, cross-category effects) makes the demand surface progressively more tangled until no set of threshold rules can approximate it. This is computational irreducibility made visible.
THE IRREDUCIBILITY GAP
15–30% of optimal weather-demand responses require the opposite action from what intuition suggests
<0.001% of the parameter space covered by even the most sophisticated hand-written rules
270M cross-domain interaction parameters that must be estimated, not guessed
THE ADVERTISING CRISIS — WHY WEATHER MATTERS NOW
PLATFORM SIDE
$201B: Meta revenue 2025 (+22% YoY)
$400B+: Google revenue 2025
ADVERTISER SIDE
+222%: CAC increase since 2013 (ProfitWell)
-$29: loss per new ecommerce customer (Martech/ShipBob)
75%: of iOS users opted out of tracking (Flurry Analytics)

In a signal-starved world, weather is the last universal, ungated, causally relevant demand signal.

Universal
Affects every category, every geography. No opt-in required.
Privacy-safe
Zero PII. No cookies, no device IDs. Immune to ATT and GDPR.
Causally identified
Weather shocks are natural experiments. No confounding with bid dynamics.
Ungated
Not owned by Google, Meta, or Amazon. Public data.

3.2 The Complexity Gap: Dimensionality Mismatch

A thermostat can control a room. But a thermostat cannot control a weather system. The reason is not engineering—it is information. The room has one variable (temperature). The weather has billions. A controller must have at least as much internal complexity as the system it controls (Ashby, 1956). This is not a guideline—it is a mathematical constraint. A 6-rule system simply cannot capture the behavior of a system with millions of interacting variables.

Now look at the dimensionality of weather-driven demand:

Specification — Dimensionality Analysis
The demand function for a single market-category pair:

  y_{m,c,t} = f(W_{m,t−L:t}, S_{m,c,t}, A_{m,c,t}, Z_{m,c,t}) + ε_{m,c,t}

  W = weather state    (10 base variables: temp, UV, humidity, wind, precip, ...)
  S = sales & demand   (revenue, units, conversion, cart, returns, inventory)
  A = ad & marketing   (spend × 5 channels, impressions, CPM, CTR, ROAS,
                         creative fatigue, bid landscape, frequency)
  Z = structure        (geo features, seasonality, category, channel mix,
                         adstock decay, saturation curves, competitive signals)

  LWDM ingests all of these out of the box—weather, sales, ad, inventory,
  location, and time signals—no manual feature engineering required.

Signal dimensions across all domains:

  Markets (m):         ~1,000   (US DMAs + international; v4.0: 100K+ global)
  Categories (c):      ~50      (sunscreen, outerwear, beverages, home comfort, ...)

  Weather signals:     10 base × 20 temporal transforms              = 200 dims
    Temporal transforms per variable (×20):
      Raw lags:          t, t−1, ..., t−6          =  7  (MPC horizon)
      Moving averages:   3d, 7d, 14d, 30d          =  4  (smoothing windows)
      Derivatives:       Δ(1d), Δ(7d)               =  2  (rate of change)
      Cumulative:        degree-days, running sums  =  2  (accumulated exposure)
      Exp. smoothing:    α=0.1, 0.3, 0.7            =  3  (adaptive decay)
      Volatility:        σ(7d), σ(30d)               =  2  (weather variance)
    Plus non-obvious temporal dynamics:
      Regime detection:  structural breaks, phase transitions
      Threshold memory:  duration above/below critical levels
      Recovery dynamics: normalization rate after extremes
      Spatial-temporal:  weather moves geographically (lead-lag across markets)
      Cyclical harmonics: diurnal, weekly, monthly, annual
      Event-time:        days since last rain, days until holiday, paycheck cycles

  Sales & demand:      6 signals × 10 temporal transforms            =  60 dims
    (revenue, units, conversion, cart size, returns, inventory—each with
     lags, velocity, acceleration, seasonal baselines, trend decomposition)

  Ad & marketing:      8 signals × 15 features (per-channel detail)  = 120 dims
    (spend, impressions, CPM/CPC, CTR, ROAS, creative fatigue, bid landscape,
     frequency—per channel, with adstock decay and saturation structure)

  Structure:           7 signals × ~6 transforms                     =  40 dims
    (geo demographics, seasonality harmonics, category effects, channel mix,
     adstock decay curves, saturation shapes, competitive signals)
                                                                     ─────────
  Base features per market-category:                                   420 dims

  Cross-domain interactions—where the combinatorial explosion lives:
    Weather × Weather:    C(200,2) reduced to ~135 structured pairs
    Weather × Ad:         200 × 120                       = 24,000 pairs
    Weather × Sales:      200 × 60                        = 12,000 pairs
    Weather × Structure:  200 × 40                        =  8,000 pairs
    Ad × Sales:           120 × 60                        =  7,200 pairs
    Ad × Structure:       120 × 40                        =  4,800 pairs
    Sales × Structure:     60 × 40                        =  2,400 pairs
    Triple interactions:  weather × ad × sales, ...        = 100K+ terms
                                                          ──────────────
    Raw cross-domain pairs:                               ~58,500+
    After low-rank / sparsity structure:                  ~5,000–10,000 effective

  Effective feature dims per market-category: 420 + 5,000 ≈  5,400
  Raw parameter count: 1,000 × 50 × 5,400       ≈  270M weights
  With cross-market partial pooling:          ~10⁷ effective parameters
  At v4.0 global scale (100K+ markets):       ~10⁸–10⁹ effective parameters

Capacity of "if temp > 90, boost sunscreen":
  One threshold + one action + one channel    =  ~log₂(3) ≈ 1.6 bits
  Gap: 2^(25 − 1.6) ≈ 10,000,000× more states than the rule can see

The takeaway: A rule system with roughly 10 conditions encodes on the order of 10 bits of control capacity. The demand system—with cross-domain interactions across weather, sales, ad, inventory, and structural signals—requires on the order of 10⁷ effective parameters (~25 bits of state space). You cannot close this gap by writing more rules. You need a model with sufficient internal complexity to match the system.
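The arithmetic in the specification can be reproduced in a few lines. This is only a tally of the counts listed above; the variable names are chosen for this sketch, and the 40-dimension structure figure and the 5,400 effective-dimension figure are taken as rounded in the specification.

```python
# Temporal transforms per weather variable, as itemized in the specification
transforms = {"raw_lags": 7, "moving_averages": 4, "derivatives": 2,
              "cumulative": 2, "exp_smoothing": 3, "volatility": 2}
per_variable = sum(transforms.values())

dims = {
    "weather": 10 * per_variable,  # 10 base variables x 20 transforms
    "sales": 6 * 10,
    "ad": 8 * 15,
    "structure": 40,               # the specification rounds 7 x ~6 to 40
}
base_dims = sum(dims.values())

cross_pairs = (
    135                                     # weather x weather, structured pairs
    + dims["weather"] * dims["ad"]
    + dims["weather"] * dims["sales"]
    + dims["weather"] * dims["structure"]
    + dims["ad"] * dims["sales"]
    + dims["ad"] * dims["structure"]
    + dims["sales"] * dims["structure"]
)

effective_dims = 5_400                      # 420 base + ~5,000 effective, rounded
raw_params = 1_000 * 50 * effective_dims    # markets x categories x dims

print(base_dims, cross_pairs, raw_params)
```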

A single rule therefore encodes roughly 1.6 bits of information, while the demand system (with cross-domain interactions across weather, sales, ad, inventory, and structural signals) requires on the order of 10⁷ effective parameters to represent faithfully. This is not a judgment call; it is an information-theoretic mismatch. Ashby’s law guarantees that the uncontrolled variety (the gap between 1.6 bits and ~25 bits) is irreducible. Each bit of information the controller acquires about the system can decrease the entropy of its future state by at most one bit, so a 1.6-bit controller can reduce demand uncertainty by at most 1.6 bits out of the roughly 25 bits (10⁷ states) of the real system.

[Figure: Rule-based controller versus Large Weather Demand Model. Rule-based controller: 3 decisions per weather event (if temp > 90°F → boost sunscreen; if UV > 8 → boost outdoor; else → hold); everything else is uncontrolled variety (untapped demand, unrealized potential). Large Weather Demand Model: 10⁷ learned response weights across the full weather × geo × category × channel × time tensor. Ashby’s Law: only variety can absorb variety. The controller must match the system.]
Three Analogies: Thermostat, Autopilot, Chess Engine (Intuition)

The thermostat. The thermostat appears to “work” for the lobby—but only if you ignore that it is simultaneously overheating the server room, freezing the corner offices, and wasting energy fighting its own thermal coupling effects. At the system level, it is not partially succeeding. It is failing in a way that is invisible from the lobby. Rules have the same property: “boost sunscreen when hot” looks correct from the sunscreen category’s perspective, while destroying margin across every coupled category and market.

The autopilot. Cruise control maintains a fixed speed regardless of context—into curves, through school zones, toward stopped traffic. It is not solving a simpler version of the driving problem. It is ignoring the driving problem and adjusting one variable. Rules do the same: they adjust spend based on one variable while ignoring the hundreds of variables that determine whether that adjustment helps or harms.

The chess engine. A rule-based system facing a 25+ bit demand surface is not a strong amateur playing chess. It is a player who can only see 3 squares on a board with millions. The gap between rules and a causal model is not the gap between amateur and grandmaster—it is the gap between guessing and computing. Most teams are guessing with 3 rules and calling the result a strategy.

Why Automation Alone Doesn’t Help

Automating rules just makes them fail faster. Two hundred rules is still ~320 bits against a system that needs tens of millions of parameters. Speed without variety is noise at scale. The deeper problem: rules encode correlations, not causes. “High UV correlates with sunscreen sales” is true, but useless without the marginal effect, the interaction structure, the temporal dynamics, and the cross-market spillovers.

WeatherVane does this

The LWDM maintains ~10⁷ learned weights across the full weather × sales × ad × inventory × market × category × channel × time tensor. Every recommendation draws on the complete demand surface, not a threshold. The model outputs not just “increase spend” but the expected marginal return on the next dollar, the confidence interval, and the break-conditions under which the recommendation would change.


3.2.1 Ashby’s Law of Requisite Variety

In 1956, W. Ross Ashby proved a theorem that should be on the wall of every marketing ops team: only variety can absorb variety. A controller that governs a complex system must possess at least as much internal complexity as the system it’s trying to control. This isn’t a suggestion—it’s a mathematical law, as fundamental to control theory as conservation of energy is to physics.

The intuition is simple: if a system can be in 1,000 states and your controller can only distinguish 3 of them, the remaining 997 states are uncontrolled. You aren’t making bad decisions in those states—you aren’t making decisions at all. Your controller literally cannot see the difference between a state where you should spend $50K and one where you should spend $0.
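The counting form of Ashby's bound makes this concrete: a controller with R distinct responses facing D disturbance states cannot hold the outcome to fewer than D / R distinct states. The sketch below applies that bound to the 1,000-state, 3-response example; the function name is hypothetical.

```python
import math


def min_outcome_variety(disturbance_states: int, controller_responses: int) -> int:
    """Ashby's bound in counting form: outcomes >= disturbances / responses."""
    return math.ceil(disturbance_states / controller_responses)


# 1,000 system states, a controller that can distinguish only 3 of them
residual_states = min_outcome_variety(1000, 3)
residual_bits = math.log2(1000) - math.log2(3)

print(residual_states, round(residual_bits, 2))
```

Even in the best case, the outcome still varies over hundreds of states that the controller has no way to separate.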

This law shows up everywhere real control matters:

🏢 Thermostat vs. Building
  Controller: Single thermostat (~3 bits)
  System: 50 zones × 4 seasons × occupancy (~15 bits)
  Result: Server room overheats. Corner offices freeze. Energy bill spikes. Lobby happens to be 72°F by coincidence of placement, not control.
  Fix: Modern BMS: per-zone sensors, weather forecast, occupancy schedule.

🚗 Cruise Control vs. Highway
  Controller: Hold 65 mph (~1 bit)
  System: Curvature, traffic, weather, hills, pedestrians (~20+ bits)
  Result: Maintains speed into a curve. Rear-ends traffic.
  Fix: Autonomous driving: LIDAR + radar + cameras + maps = matched variety.

📊 Rule Playbook vs. Demand
  Controller: 6 weather rules (~10 bits, at ~1.6 bits per rule)
  System: 270M parameters across weather, sales, ad, inventory, structure (~25 bits)
  Result: Boosts sunscreen in Phoenix at 108°F. Demand already inverted.
  Fix: LWDM: 10⁷ learned weights matching the full cross-domain demand surface.

The math is clean: each rule encodes roughly log₂(3) ≈ 1.6 bits (one threshold, one action, one channel). Ten rules give you ~16 bits. The demand system needs ~10⁷ effective parameters (~25 bits of state space). The gap between 16 bits and 25 bits is exponential: 2⁹ = 512x more states than your controller can distinguish. Ashby’s law guarantees that this uncontrolled variety is irreducible—you cannot close it by running rules faster, only by building a controller with more internal structure.
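The bit counts above are quick to check. This sketch uses the rounded figures from the text (16 and 25 bits) for the gap and also prints the exact base-2 logarithm of 10⁷, which is a little over 23 and is rounded up to roughly 25 in the text.

```python
import math

bits_per_rule = math.log2(3)          # one threshold, one action, one channel
ten_rule_bits = 10 * bits_per_rule    # the text rounds this to ~16 bits
system_bits_exact = math.log2(1e7)    # exact log2 of 10^7 states

gap_rounded = 2 ** (25 - 16)          # ten rules versus the rounded 25 bits
single_rule_gap = 2 ** (25 - 1.6)     # one rule versus the rounded 25 bits

print(round(bits_per_rule, 3), round(ten_rule_bits, 1), round(system_bits_exact, 2))
print(gap_rounded, f"{single_rule_gap:.2e}")
```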

This is why the LWDM exists. Not because rules are “too simple” as a matter of taste, but because Ashby proved in 1956 that they are mathematically incapable of controlling a system this complex. The controller must match the system. The LWDM does.

THE EVIDENCE — INFORMATION COMPOUNDS

3.3 Multi-Signal Information Gain

Each additional signal dimension doesn’t just add information—it multiplies it through interaction effects. Temperature alone tells you demand direction. Temperature × humidity reveals comfort thresholds. Add a local event and you see compound surges. Add economic indicators and you separate weather-driven demand from income-driven demand. The information gain is superlinear because signals interact: the whole exceeds the sum of parts.
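One simple way to see the superlinear growth is to count pairwise interaction terms, which grow as n(n − 1)/2 in the number of signals n. The sketch below counts pairs only; it is an illustration of the combinatorics, not the capacity formula used by the interactive chart.

```python
from math import comb


def pairwise_terms(n_signals: int) -> int:
    """Number of distinct two-signal interaction terms."""
    return comb(n_signals, 2)


for n in (2, 15, 25, 31):
    print(n, pairwise_terms(n))

# Interaction terms added by going from 15 to 25 signals
new_terms = pairwise_terms(25) - pairwise_terms(15)
print(new_terms)
```

Ten extra signals add far more pairwise terms than the first fifteen contributed in total, before any triple interactions are counted.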

Toggle signal groups on and off. Expand a group to control individual signals. Watch the curve explode as cross-domain interactions compound. Try turning on Weather + Sales + Ad groups together. The jump from 15 to 25 signals triggers hundreds of new interaction terms, exactly the dynamics a rule-based system cannot capture.

Rules baseline (dashed red line): each signal gets at most 3 threshold bins (e.g., “low / medium / high temperature”), applied independently, with cognitive decay—the 10th rule a human writes is less precise than the 1st. This is generous: most real rule sets use fewer bins and ignore most signals entirely.

[Interactive chart: information capacity (bits) versus number of signals, comparing LWDM capacity with the rules baseline. Signal groups: Weather Signals (9 atmospheric variables); Temporal Dynamics (lags, derivatives, EMAs; multiplies every signal); Sales & Demand (6: revenue, conversion, inventory signals); Ad & Marketing (8: spend, impressions, ROAS, creative signals); Structure & Dynamics (7: geo, category, channel, adstock, saturation). Default view: 2 of 31 signals active, 4 effective dimensions, one pairwise interaction (Temp × UV); LWDM capacity 3.3 bits versus 2.8 bits for rules.]

3.4 Live Simulator: Weather → Demand Response

The simulator below exposes the LWDM’s full 6-dimensional weather input space: temperature, UV, humidity, wind speed, precipitation, and consecutive-day pattern duration. Seven product categories respond with demand functions that are computationally irreducible—no finite set of rules can replicate them. Cross-category cannibalization, temporal lag effects, and higher-order derivatives (jerk, snap) create behaviors that defy intuition. The anomaly detector highlights moments where the model’s prediction contradicts what a simple rule would suggest.

Click "Rain + Sunscreen Spike" then switch pattern to "Thermal Whiplash." Sunscreen demand stays elevated despite rain (tropical UV diffusion). Switch to Whiplash: 3rd/4th derivative effects ripple through all categories. The complexity gauge spikes. Notice how Allergy/Pharma surges from barometric jerk alone.
Figure 2. Live 6-dimensional demand response surface. 7 product categories, cross-category cannibalization, temporal lag effects, and higher-order weather derivatives (d³/d⁴) interact to produce computationally irreducible demand predictions. Try the scenario presets to see counterintuitive behaviors: rain boosting sunscreen, outerwear spiking at 75°F, barometric jerk triggering allergy demand.
Fat Tails: The Value of Extreme Events (Deep dive)

Weather shocks follow fat-tailed distributions. A 3-sigma event in a normal distribution happens 0.3% of the time. In fat-tailed weather distributions, it happens much more often — and the demand effect can be 10× or 50× the average. Portland hitting 116°F in June 2021 created a sunscreen demand spike exceeding the entire previous summer.

The LWDM’s weather shock detection uses distributional thresholds calibrated to fat-tailed distributions, not Gaussian assumptions. It detects opportunities that rules, trained on the body of the distribution, will miss by construction.
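To get a feel for the size of the difference, the sketch below compares the two-sided 3-sigma tail probability of a normal distribution with that of a Student-t distribution with 3 degrees of freedom. The t(3) distribution is an assumption chosen here as a textbook fat-tailed example with a closed-form CDF; it is not the distribution the LWDM is calibrated to.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))


def student_t3_cdf(t: float) -> float:
    """Closed-form CDF of Student's t with 3 degrees of freedom."""
    x = t / math.sqrt(3)
    return 0.5 + (math.atan(x) + x / (1 + x * x)) / math.pi


# Student-t(3) has standard deviation sqrt(3), so "3 sigma" is 3 * sqrt(3)
threshold = 3 * math.sqrt(3)
normal_tail = normal_two_sided_tail(3)
fat_tail = 2 * (1 - student_t3_cdf(threshold))
ratio = fat_tail / normal_tail

print(f"normal: {normal_tail:.4%}  t(3): {fat_tail:.4%}  ratio: {ratio:.1f}x")
```

Under this assumption a 3-sigma event is roughly five times as frequent as the Gaussian figure suggests, and the multiple grows quickly further out in the tail.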

3.5 Phase Transitions and Critical Thresholds

At 34°F, the parking lot is wet. At 33°F, the parking lot is wet. At 32°F, everything changes. One degree, and the physics reorganize completely. Above that line, precipitation is rain and outdoor activity is possible. Below it, precipitation is snow, roads close, and an entirely different set of products becomes relevant: de-icers, winter coats, hot beverages.

Demand does this too. The response surface does not gently slope through these critical thresholds. It jumps. Physicists call these phase transitions.

These phase transitions are ubiquitous in weather-demand systems:

  • UV index 7 → 8: the threshold where dermatologists recommend sunscreen. Consumer awareness creates a step-function in demand that a linear model will underestimate at 8 and overestimate at 7.
  • Temperature × humidity: the “heat index” threshold where outdoor activity drops sharply. This is not a temperature effect or a humidity effect—it is an interaction that only exists above both thresholds simultaneously.
  • Precipitation probability 40% → 60%: the tipping point where consumers cancel outdoor plans. The first 40% of rain probability is almost free; the next 20% destroys outdoor-adjacent demand.
  • First frost of the season: an annual phase transition that triggers heating equipment purchases, winter wardrobe demand, and home comfort categories simultaneously across a geographic band.

In complexity science, systems near phase transitions exhibit critical slowing down: they become slow to recover from perturbations, and small inputs produce large, nonlinear outputs. These are exactly the moments where marketing dollars are most productive, and where rules, which model demand as linear, miss the opportunity entirely. The LWDM uses GAMs and threshold detection specifically to capture these discontinuities.
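A minimal sketch of threshold detection, assuming synthetic data with a known jump at UV index 8: scan candidate breakpoints, fit a line plus a step at each, and keep the breakpoint with the smallest squared error. This is a simple stand-in for illustration, not the GAM-based method the LWDM uses, and all numbers in it are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: linear trend in UV plus a step of 40 units at UV >= 8
uv = np.tile(np.arange(1, 12), 20).astype(float)
demand = 100 + 2 * uv + 40 * (uv >= 8) + rng.normal(0, 1, uv.size)


def fit_with_break(k: int):
    """Least-squares fit of demand ~ 1 + uv + 1[uv >= k]; returns (SSE, coefficients)."""
    X = np.column_stack([np.ones_like(uv), uv, (uv >= k).astype(float)])
    coef, *_ = np.linalg.lstsq(X, demand, rcond=None)
    resid = demand - X @ coef
    return float(resid @ resid), coef


# Try every candidate breakpoint and keep the one with the lowest error
best_k = min(range(2, 12), key=lambda k: fit_with_break(k)[0])
jump = fit_with_break(best_k)[1][2]

print(best_k, round(float(jump), 1))
```

A purely linear fit on the same data would spread the 40-unit jump across the whole UV range, which is the over- and under-estimation pattern described in the UV bullet above.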

The same weather event triggers divergent—often opposite—demand responses across product categories. A 10°F temperature increase sends sunscreen demand soaring while collapsing outerwear sales. A single rule (“if hot, boost spend”) is worse than useless: it allocates budget toward categories already surging organically while starving categories where the marginal dollar is most productive. Drag the temperature slider below to see how five categories respond simultaneously to the same thermal shift.

Figure. Category Response Wheel. Drag the temperature to see demand elasticities diverge: sunscreen peaks at 95°F while outerwear peaks at 25°F. A single-variable rule cannot optimize across these opposing response curves simultaneously.
[Figure: Demand intensity versus temperature (°F), SunCo SPF case study. The actual demand response is nonlinear with phase transitions (mild: low signal; hot: steep response; extreme: saturates), with markers at 32°F (freeze), 75°F (onset), and 90°F (rule fires). The rule “if temp > 90, boost” is overlaid, and the chart marks the revenue the rule misses.]
CAUSAL IDENTIFICATION

Busse et al. (2015) noticed something odd: people buy more convertibles on sunny days, even though they won’t take delivery for weeks. The weather didn’t change the car’s value. It changed the buyer’s feelings. That is the difference between correlation and cause. Every weather-demand model has to decide which one it is measuring. The LWDM decides by exploiting weather shocks (sudden, localized, non-manipulable weather events) as natural experiments. The weather randomizes itself.

Correlation is not a business strategy. Every dollar allocated on a correlation is a dollar that might be wasted.

KEY NOTATION
D(g,t,c) = demand · W(g,t) = weather state · S(c,t) = ad spend · Z(g,t) = controls · θ = adstock decay · λi = shrinkage weight

3.6 The Confounding Structure

A sunscreen brand sees sales rise on hot days and assumes temperature drives demand. They are wrong. What is actually happening is more interesting—and more exploitable.

Demand for any product-market-time combination is simultaneously influenced by a web of confounding variables, most of which are correlated with both weather and with each other:

  • Ad spend (endogenous). Smart advertisers already increase spend when they expect high demand. This creates reverse causality: a naive regression of demand on weather + spend produces biased estimates because spend is endogenous, simultaneously caused by expected demand and causing observed demand.
  • Promotions and discounts. A brand may run a “summer sale” timed to hot weather. If we attribute the resulting demand lift to weather, we overestimate the weather effect. If we attribute it to the promotion, we underestimate it. The interaction must be modeled, not ignored.
  • Campaign creative and targeting. A sunscreen ad featuring a beach scene converts differently in Phoenix (where the audience lives near deserts, not beaches) than in Miami. The creative × geography × weather interaction is a confound that rules cannot represent.
  • Inventory and stockouts. Observed sales are censored by inventory. If a product sells out, demand appears to plateau—but the true demand surface continues rising. Failing to correct for stockouts biases the estimated weather elasticity downward.
  • Competitor actions. When all sunscreen brands boost ads during a heatwave, CPMs (cost per thousand impressions) rise and ROAS drops. The market-level equilibrium depends on competitor behavior, which is unobserved but correlated with weather.
  • Platform algorithm changes. Google and Meta continuously update their bidding algorithms, audience models, and auction mechanics. These changes affect ad effectiveness independent of weather and must be absorbed by the model.
  • Your own policy (reflexivity). Once a system starts reallocating spend based on weather, it changes the auction environment. If every sunscreen brand has a “if hot, boost spend” rule, hot days have inflated CPMs and compressed ROAS. The data you train on is not passively sampled from nature—it is generated by your historical policy and the platform’s policy, and those policies adapt. A static rule is especially vulnerable because it is predictable and therefore exploitable.
  • Seasonality. The most dangerous confounder because it is correlated with everything: weather, consumer behavior, ad spend, promotions, and inventory. A model that does not rigorously separate seasonal patterns from weather effects will attribute Christmas demand to cold weather.
Why “Do Nothing” Is Sometimes the Highest-ROI Action (Auction dynamics)

When a weather opportunity is already fully priced into the auction—all competitors bid up the same sunny-day impressions—the marginal dollar of spend can have negative ROI. A model-based system can detect when the crowd has already priced in the weather signal and recommend restraint.

Rules almost never say “do nothing” with confidence, because that is not what rules are for. A system that includes inaction in its action space is strictly more capable than one that does not. The weather-conditioned causal model estimates the marginal return of the next dollar, net of auction equilibrium effects. When that marginal return is below threshold, the optimal action is to hold, and to hold confidently, with a quantified opportunity cost of acting.

The directed acyclic graph (DAG) below shows the causal structure. The key insight is that weather is the only purely exogenous variable in the system. It cannot be manipulated by advertisers, competitors, or platform algorithms. This is what makes it useful for identification—not because weather matters more than other factors, but because it is the one factor whose causal direction is unambiguous.

[Figure: Causal DAG. Nodes: Weather (W, exogenous); Demand (D, outcome); Ad Spend (S, endogenous); Seasonality (Z1); Promotions (Z2); Inventory (Z3); Competitors (Z4); Creative & targeting (Z5); Platform algorithms (Z6). Legend: green arrows are causal effects we estimate (identified causal paths); orange arrows are confounding paths we must control for; the blue dashed arrow is weather’s exogeneity, the instrument (our secret weapon). Key insight: weather is the only variable with arrows going out but none coming in from marketing.]
Read the DAG: arrows show causal direction. Weather has no incoming arrows from Ad Spend or Demand — it is exogenous by construction.

The green causal paths (W → D, S → D) are what we want to estimate. The orange confounding paths are what we must control for. The dashed blue path (W → S) represents weather’s role as an instrument: because weather is exogenous, any correlation between weather and ad spend is either direct (reactive advertisers adjusting to weather) or spurious (via seasonality). By conditioning on seasonality and using weather shocks (deviations from seasonal norms), we isolate the causal effect.

A critical nuance: weather as demand shifter, not classical instrument. A traditional instrumental variable for ad spend must satisfy the exclusion restriction: it affects demand only through ad spend. Weather violates this—Dell, Jones, & Olken (2014) document that weather affects economic outcomes through multiple channels (consumer mood, physical need, transportation patterns, energy costs), and Busse et al. (2015) show that weather directly alters purchase decisions independent of advertising. Mellon (2025) catalogs 194 potential exclusion-restriction violations for weather instruments across the social sciences. The LWDM addresses this head-on by treating weather as a demand shifter that enters the model through two explicit pathways: (1) a direct demand effect (γ in the demand function), capturing weather’s influence on consumer need and behavior, and (2) an interaction effect (weather × ad spend), capturing how weather moderates advertising effectiveness. By modeling the direct channel explicitly rather than assuming it away, we preserve weather’s instrumental value for the residual variation that identifies ad effectiveness—the portion of weather-induced demand variation not explained by the direct channel.
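The two pathways can be illustrated with a small simulation: demand depends on weather directly (coefficient γ) and through a weather × ad spend interaction. The symbols β and δ, the linear functional form, and the randomized ad spend are assumptions made for this sketch so that ordinary least squares is valid; they are not the LWDM's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

w = rng.normal(size=n)          # deseasonalized weather shock, assumed exogenous
s = rng.uniform(0, 1, size=n)   # ad spend, randomized here for simplicity

gamma, beta, delta = 3.0, 2.0, 1.5   # direct weather, ad, and interaction effects
d = 50 + gamma * w + beta * s + delta * w * s + rng.normal(0, 1, n)

# Regress demand on an intercept, weather, spend, and their interaction
X = np.column_stack([np.ones(n), w, s, w * s])
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
gamma_hat, beta_hat, delta_hat = coef[1], coef[2], coef[3]


def marginal_ad_effect(weather_shock: float) -> float:
    """Estimated effect of one more unit of spend at a given weather shock."""
    return float(beta_hat + delta_hat * weather_shock)


print(np.round(coef, 2), round(marginal_ad_effect(1.0), 2))
```

Dropping the `w` column from `X` would push part of the direct weather effect into the interaction estimate, which is the failure mode that modeling the direct channel explicitly avoids.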

ADDRESSING EXCLUSION RESTRICTION CONCERNS

Mellon (2025) catalogs 194 potential pathways through which weather may violate the exclusion restriction. For advertising demand specifically, the most relevant channels are:

  1. Consumer mood and affect: weather influences purchase intent independently of ad exposure (Busse et al., 2015).
  2. Physical need: temperature directly creates demand for comfort goods (sunscreen, beverages, outerwear).
  3. Transportation and mobility: weather affects foot traffic and store visits, altering the channel mix.
  4. Seasonal confounding: weather correlates with holidays, school schedules, and seasonal promotions.
  5. Energy costs: extreme weather shifts household budgets away from discretionary spending.

The LWDM addresses these through: (a) controlling for weather-correlated confounders (outdoor activity proxies, mood indices, seasonal patterns) as covariates in Z; (b) using high-frequency variation (daily shocks within a month, removing seasonal confounds); (c) bounding residual bias using the Conley, Hansen & Rossi (2012) plausible exogeneity framework, which allows the exclusion restriction violation to be a nondegenerate random variable and estimates bounded inference even under partial violations—their applied examples show that “inference is informative even with a substantial relaxation of the exclusion restriction.” We report these bounds alongside point estimates so users can assess sensitivity to remaining exclusion restriction concerns.
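A stripped-down sketch of the bounding idea in (c), assuming a single instrument, a single endogenous regressor, and a direct effect γ known only to lie in a range: for each candidate γ₀, subtract γ₀ · Z from the outcome and re-run the IV estimate, then report the range of estimates. The simulated data, the support [0, 1] for γ, and the function name are hypothetical, and the sketch shows point-estimate bounds only, without the sampling-uncertainty step of the full procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

z = rng.normal(size=n)                  # weather shock (instrument)
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)    # ad spend, endogenous through u

true_beta, true_gamma = 1.0, 0.5        # ad effect; direct weather effect
y = true_beta * x + true_gamma * z + 2 * u + rng.normal(size=n)


def iv_beta(gamma0: float) -> float:
    """Just-identified IV estimate after removing an assumed direct effect gamma0 * z."""
    zc = z - z.mean()
    return float(zc @ (y - gamma0 * z) / (zc @ x))


naive_iv = iv_beta(0.0)                 # assumes the exclusion restriction holds

# Assumed support for the direct effect: gamma in [0, 1]
grid = np.linspace(0.0, 1.0, 21)
estimates = [iv_beta(g) for g in grid]
lower, upper = min(estimates), max(estimates)

print(round(naive_iv, 2), round(lower, 2), round(upper, 2))
```

The naive estimate is biased upward because the direct weather effect is attributed to ad spend, while the reported interval covers the true effect as long as the assumed support contains the true γ.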

The weather signal is important not because it is the largest driver of demand—it often is not—but because it is the only driver whose causal direction is guaranteed. That guarantee lets us identify effects that are otherwise hopelessly confounded.

This is why the LWDM is not “weather-based demand prediction.” It is a causal model that estimates demand response net of all observed confounders, using weather’s exogeneity as the identification lever.

Econometrics tells you what to estimate (the causal effect of weather on demand, net of confounding). Machine learning handles the messy prediction work (modeling all the confounders that stand in the way). Neither alone is enough.

Naive: “Use neural networks for everything—surely they will find the causal signal.”
Actual: Neural networks are extraordinary at prediction and terrible at causal inference. You need both tools, each doing what it does best.

The trick is to split the data in half: use one half to estimate the nuisance (everything that is not the causal effect you care about), the other to estimate the causal effect itself, then swap the halves and average. This prevents the model from overfitting its own noise, a technique called cross-fitting (Chernozhukov et al., 2018, peer-reviewed). The result: you can use neural networks or gradient-boosted trees for the messy prediction work while keeping the causal parameter, the weather-modulated advertising elasticity, at inferential quality with valid confidence intervals.
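
To make the cross-fitting idea concrete, here is a minimal sketch of the partialling-out estimator on simulated data. It is an illustration under stated assumptions, not the LWDM implementation: the nuisance learner is plain least squares (a stand-in for the neural networks or gradient-boosted trees mentioned above), and the function names, the simulated data, and the true effect of 0.5 are all hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares nuisance learner (assumed stand-in for any ML regressor)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def cross_fit_effect(y, d, Z, n_folds=2, seed=0):
    """Partialling-out estimate of theta in y = theta * d + g(Z) + noise."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance models are fit on the other fold(s) only, never on `test`.
        y_res[test] = y[test] - ols_fit_predict(Z[train], y[train], Z[test])
        d_res[test] = d[test] - ols_fit_predict(Z[train], d[train], Z[test])
    # Regress outcome residuals on treatment residuals.
    theta = float(d_res @ y_res / (d_res @ d_res))
    # Heteroskedasticity-robust standard error of that residual regression.
    eps = y_res - theta * d_res
    se = float(np.sqrt(np.sum((d_res * eps) ** 2)) / (d_res @ d_res))
    return theta, se

# Simulated example: spend d is confounded by controls Z; true theta is 0.5.
rng = np.random.default_rng(42)
n = 4000
Z = rng.normal(size=(n, 3))
d = Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = 0.5 * d + Z @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)
theta_hat, se_hat = cross_fit_effect(y, d, Z)
```

A naive regression of y on d alone would absorb the confounding through Z; the residual-on-residual step removes it, and the out-of-fold predictions keep the nuisance fit from leaking into the causal estimate.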

Causal Forests (Wager & Athey, 2018, peer-reviewed) extend this to discover that a weather shock’s impact differs across markets and categories without imposing parametric assumptions. Deep IV (Hartford et al., 2017) relaxes linearity: a 5°C temperature increase has a very different effect on sunscreen ad spend than a 15°C increase, and the model captures this.

3.7 Weather Shocks as Natural Experiments

A weather shock is a deviation from the expected weather pattern in a geographic market: an unexpected heatwave in the Pacific Northwest, a freak cold snap in the Southeast, a UV spike after weeks of cloud cover. These events are exogenous (they are not caused by advertising decisions, consumer behavior, or competitive dynamics), which makes them valid instruments for causal identification. Dell, Jones, & Olken (2014) provide the definitive survey of this strategy across economics: weather variation has been used as an instrument for causal estimation in at least 83 papers in top-5 economics journals and over 300 in well-ranked field journals (Mellon, 2024), spanning agriculture, labor supply, conflict, health, and consumer behavior. The lineage extends to the very origin of the method: Wright (1928) used weather as the first instrumental variable in econometric history, estimating demand elasticities for agricultural commodities via regional rainfall. The LWDM applies this same identification logic to advertising allocation.

1928: the year Philip Wright invented instrumental variables, using weather. Source: Wright (1928), The Tariff on Animal and Vegetable Oils, Appendix B.

Philip Wright solved this problem in 1928 using rainfall and butter prices. We are solving it again, with better weather data and worse attention spans.

83: top-5 economics papers using weather as a causal instrument. Source: Dell, Jones & Olken (2014), J. Econ. Lit.

The core insight: a weather shock is a deviation from expected weather in a specific market—an unexpected heatwave, a freak cold snap, a UV spike after weeks of cloud. These events randomize themselves, providing exogenous variation that separates weather-driven demand from ad-driven demand. The model estimates D(g,t,c) = f(W, S, Z) + ε—demand as a function of weather, spend, and controls, with weather shocks as the identification lever.
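
As a rough illustration of "deviation from expected weather", the sketch below standardizes each observation against the historical values for the same calendar day in the same market. The function name, the toy temperature history, and the 2.0 threshold for flagging a shock are assumptions made for this example only.

```python
from statistics import mean, stdev

def weather_shocks(history, observed):
    """Standardized deviation of observed weather from its expected value.

    history:  {day_of_year: [past values for that calendar day]}
    observed: {day_of_year: value observed this year}
    """
    shocks = {}
    for day, value in observed.items():
        past = history[day]
        expected = mean(past)      # expected weather for this market and day
        spread = stdev(past)       # typical year-to-year variability
        shocks[day] = (value - expected) / spread
    return shocks

# Hypothetical daily high temperatures (°C) for two calendar days.
history = {
    180: [24.0, 26.0, 25.0, 23.0, 27.0],
    181: [25.0, 25.5, 24.5, 26.0, 24.0],
}
observed = {180: 25.5, 181: 33.0}

shocks = weather_shocks(history, observed)
# Assumed rule for this sketch: |z| >= 2 counts as a shock.
flagged = [day for day, z in shocks.items() if abs(z) >= 2.0]
```

Day 180 sits close to its historical mean, so it contributes little identifying variation; day 181 is far outside the usual range and is the kind of event the model treats as a natural experiment.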

Full demand response specification →

3.8 Synthetic Control Estimation

Abadie originally developed synthetic control methods to study the economic impact of terrorism in the Basque Country (2003). The method constructs a “synthetic” version of the treated unit from weighted combinations of untreated units, enabling causal inference from observational panel data without randomization.

When a weather shock occurs in market g, we construct a synthetic control from the donor pool of unaffected markets. Following Abadie, Diamond, and Hainmueller (2010), we find non-negative weights w1, ..., wJ that sum to one and minimize the pre-shock prediction error of the treated market, then estimate the treatment effect as the post-shock divergence between the treated market and its synthetic control.

Before/after: “Sales rose 20% after the heatwave.” Was it the heatwave, a promotion, or a stockout?
Synthetic control: Sales rose 20% vs. matched markets without the heatwave. Weather-attributable lift: 14%.

This approach is nonparametric (no functional form required) and uses exact permutation-based inference. Weather shocks create natural staggered treatment timing, which we handle via the Callaway & Sant’Anna (2021) estimator, which decomposes treatment effects into clean 2×2 comparisons between newly-treated and not-yet-treated cohorts, avoiding the contamination that plagues conventional difference-in-differences with variation in treatment timing.
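
The weight-fitting step can be sketched as a small constrained least-squares problem. The example below is a simplified illustration, not the production estimator: it solves for simplex-constrained donor weights with projected gradient descent (a stand-in chosen to keep the example dependency-light), uses simulated donor markets, and injects a hypothetical post-shock lift of 14 index points. It does not implement the permutation inference or the staggered-timing estimator described above.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def synthetic_control_weights(donors_pre, treated_pre, n_iter=20000):
    """Minimize ||treated_pre - donors_pre @ w||^2 over the simplex."""
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L = 2 * (largest singular value)^2.
    step = 1.0 / (2.0 * np.linalg.norm(donors_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated indexed demand for 4 donor markets (hypothetical data).
rng = np.random.default_rng(0)
T_pre, T_post = 60, 10
donors = 100 + rng.normal(scale=5.0, size=(T_pre + T_post, 4)).cumsum(axis=0)
treated = donors @ np.array([0.6, 0.4, 0.0, 0.0])
treated[T_pre:] += 14.0  # simulated weather-attributable lift after the shock

w = synthetic_control_weights(donors[:T_pre], treated[:T_pre])
# Treatment effect: average post-shock gap between treated and synthetic.
effect = float(np.mean(treated[T_pre:] - donors[T_pre:] @ w))
```

Because the weights are fit on pre-shock data only, any post-shock gap is attributed to the shock rather than to a pre-existing difference between markets.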

[Figure: synthetic control around a weather shock. Pre-shock, the treated market (shocked) and the synthetic control (unshocked donor blend) are matched; post-shock they diverge, and the gap is the treatment effect τ. Axes: demand (indexed) vs. time (days).]

Full identification methodology: synthetic control specification, staggered treatment timing, DML & Causal Forests →

3.9 Identifiability Testing

Before any model touches a dollar of ad spend, we run a battery of simulation-based identifiability tests. We generate synthetic data under known parameters, fit the model, and check whether the estimated parameters recover the ground truth. This catches models that are “fitting noise”—finding apparent effects where none exist.

  • Parameter recovery: Generate data with known elasticities, fit the model, check |θ̂ − θ| / θ.
  • False positive control: Generate data with zero treatment effect, verify the model does not detect a spurious effect (type I error rate ≤ 0.05).
  • Power analysis: For each market-category pair, compute the minimum detectable effect size given the available data.
  • Regime testing: Run identifiability under multiple simulation regimes (high noise, correlated weather-spend, missing data, concept drift) to stress-test robustness.
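
The first two checks in the list above can be sketched in a few lines. This is a toy version under stated assumptions: a single weather covariate, a linear demand model fit by OLS, Gaussian noise, and a |t| > 1.96 detection rule. The function names and simulation settings are hypothetical and much simpler than a full model fit.

```python
import random
from statistics import mean

def fit_slope(x, y):
    """OLS slope and its standard error for y = a + b * x + noise."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return b, se

def simulate(theta, n, rng, noise=1.0):
    """Synthetic demand with a known weather elasticity theta."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 + theta * wi + rng.gauss(0.0, noise) for wi in w]
    return w, y

rng = random.Random(7)

# Parameter recovery: relative error |theta_hat - theta| / theta.
theta_true = 0.8
w, y = simulate(theta_true, n=2000, rng=rng)
theta_hat, _ = fit_slope(w, y)
recovery_error = abs(theta_hat - theta_true) / theta_true

# False positive control: share of zero-effect datasets flagged as significant.
n_sims = 400
false_positives = 0
for _ in range(n_sims):
    w0, y0 = simulate(0.0, n=200, rng=rng)
    b, se = fit_slope(w0, y0)
    if abs(b / se) > 1.96:
        false_positives += 1
false_positive_rate = false_positives / n_sims
```

A well-behaved estimator should show a small recovery error and a false positive rate near the nominal 5%; a rate far above that would indicate the model is finding effects in pure noise.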

3.10 Empirical Validation

All results below are from synthetic backtests using generated demand data with known ground-truth weather elasticities. Production validation with real tenant data is in progress with our initial design partners. We report these pre-launch benchmarks to demonstrate methodological rigor and set expectations; they should not be interpreted as production performance claims. These are synthetic backtests—the scientific equivalent of rehearsal. The live show starts with our design partners in Q2 2026.

SYNTHETIC BACKTEST RESULTS (PRE-LAUNCH)

| Metric | In-Sample | Held-Out | Notes |
| Parameter recovery (MAE) | <8% | <15% | Mean abs. error on weather elasticity coefficients vs. ground truth |
| Prediction accuracy (MAPE) | 6.2% | 11.4% | Mean abs. percentage error on 7-day demand forecasts |
| Causal estimate R² | 0.83 | 0.71 | Variance explained in weather-attributable demand component |
| Conformal coverage (90%) | 91.2% | 88.7% | Fraction of true outcomes within 90% prediction intervals |
| Partial pooling variance reduction | 40–60% | | vs. per-tenant OLS for tenants with <90 days of data |
| Allocation lift vs. naive | +18% | +12% | Incremental revenue from LWDM allocation vs. uniform spend |

Diagnostics reported per tenant: For each tenant onboarded, the LWDM reports a first-stage F-statistic (target >10, following Stock & Yogo, 2005) to confirm instrument strength, a pre-trend balance test (placebo check on pre-shock periods) to validate the parallel trends assumption, and Rosenbaum sensitivity bounds for unmeasured confounding. Tenants with weak instruments (<10 F-stat) receive wider confidence intervals and a warning flag on their recommendations.

Hyperparameter selection: Adstock decay (θ), Hill shape (α), and shrinkage bandwidth are selected via time-series cross-validation (expanding window) with MAPE as the selection criterion. Shape and decay parameters are optimized via Bayesian search (Optuna). v1.0 training completes in approximately 15 minutes for 1,000 markets × 50 categories on a single 8-core CPU. Inference latency is <100ms per market recommendation.
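
A minimal sketch of expanding-window cross-validation with MAPE as the selection criterion, applied to the adstock decay only. Several pieces are assumptions for illustration: a plain grid stands in for the Bayesian search (Optuna), the response model is a one-parameter scale fit, and the spend series, window sizes, and function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def expanding_window_splits(n_obs, initial_train, horizon):
    """Yield (train_end, test_end) index pairs; training always starts at 0."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += horizon

def adstock(spend, theta):
    """Geometric carryover: A(t) = x(t) + theta * A(t-1)."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + theta * carry
        out.append(carry)
    return out

def fit_scale(x, y):
    """Least-squares scale b for y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def cv_score(spend, sales, theta, initial_train=30, horizon=7):
    """Average out-of-sample MAPE across all expanding-window folds."""
    stock = adstock(spend, theta)
    scores = []
    for train_end, test_end in expanding_window_splits(len(sales), initial_train, horizon):
        b = fit_scale(stock[:train_end], sales[:train_end])  # fit on the past only
        preds = [b * s for s in stock[train_end:test_end]]
        scores.append(mape(sales[train_end:test_end], preds))
    return sum(scores) / len(scores)

# Synthetic series generated with decay 0.6; the search should recover it.
spend = [10.0 + (t * 7) % 13 for t in range(90)]
sales = [3.0 * a for a in adstock(spend, 0.6)]
grid = [0.0, 0.2, 0.4, 0.6, 0.8]
best_theta = min(grid, key=lambda th: cv_score(spend, sales, th))
```

The expanding window never trains on data that comes after the test period, which is the property that makes the criterion honest for time series.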

Caveat: These results are from synthetic data where the data-generating process matches the model’s assumptions. Real-world performance will differ due to model misspecification, unobserved confounders, and non-stationarity. We are committed to publishing production validation results as they become available from our design partner engagements. Gordon et al. (2019) document significant discrepancies between observational and experimental ad measurement; we recommend periodic validation against geo-holdout experiments and provide built-in holdout group management for this purpose.

4. The Network

James and Stein proved something in 1961 that startled the statistics community: estimating three or more quantities separately is always worse than estimating them together. Always.

4.1 Hierarchical Partial Pooling

Most advertisers on the platform have limited history: perhaps 6–12 months of daily data across a handful of markets. Estimating market-category-specific demand elasticities from this alone is noisy. We address this with empirical Bayes shrinkage: each tenant’s parameter estimates are shrunk toward the network-wide posterior, with the degree of shrinkage determined by the tenant’s data quality.

Specification — Partial Pooling (Empirical Bayes)
Per-tenant estimate:
  θ̂ᵢ = λᵢ · θ̄ + (1 - λᵢ) · θ̂ᵢ_OLS

Shrinkage factor:
  λᵢ = σ²ᵢ / (σ²ᵢ + τ²)

where:
  θ̂ᵢ_OLS   = tenant i's OLS estimate (noisy, high variance for small tenants)
  θ̄         = network-wide mean (pooled across all tenants)
  σ²ᵢ       = sampling variance of tenant i's estimate
  τ²        = between-tenant variance (estimated from the network)

When σ²ᵢ is large (noisy data), λᵢ → 1 and the estimate shrinks toward θ̄.
When σ²ᵢ is small (precise data), λᵢ → 0 and the tenant keeps its own estimate.

This is equivalent to a random-effects model:
  θᵢ ~ N(θ̄, τ²)     (prior)
  θ̂ᵢ ~ N(θᵢ, σ²ᵢ)   (likelihood)

40–60% variance reduction for new tenants via hierarchical partial pooling. Source: James & Stein (1961); Efron & Morris (1975).

Naive: Estimate each brand separately; more data per brand means better estimates.
Actual: For three or more quantities, shrinking toward a pooled estimate has lower total expected squared error than estimating each one separately, even for unrelated quantities (James & Stein, 1961).

The “Stein paradox”: for three or more unrelated quantities, a joint shrinkage estimator beats separate estimation in total squared error. This was so counterintuitive that it sparked a decade of debate in statistics. Efron & Morris demonstrated it using baseball batting averages.
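
The shrinkage specification above translates almost line for line into code. The sketch below is illustrative: the method-of-moments estimate of τ² and the unweighted network mean are simplifying assumptions (the source does not say how the network prior is estimated), and the five tenant estimates are made-up numbers.

```python
def shrink(theta_ols, sigma2, theta_bar, tau2):
    """Return (shrunk estimate, lambda) per the partial-pooling specification."""
    lam = sigma2 / (sigma2 + tau2)
    return lam * theta_bar + (1.0 - lam) * theta_ols, lam

def network_prior(estimates, variances):
    """Assumed method-of-moments estimate of the network mean and tau^2."""
    n = len(estimates)
    theta_bar = sum(estimates) / n
    total_var = sum((e - theta_bar) ** 2 for e in estimates) / (n - 1)
    # Observed spread = between-tenant variance + average sampling variance.
    tau2 = max(total_var - sum(variances) / n, 0.0)
    return theta_bar, tau2

# Hypothetical tenants: the first two are small (noisy), the rest are precise.
estimates = [0.90, 0.20, 0.55, 0.60, 0.50]
variances = [0.09, 0.09, 0.01, 0.01, 0.01]

theta_bar, tau2 = network_prior(estimates, variances)
shrunk = [shrink(e, v, theta_bar, tau2) for e, v in zip(estimates, variances)]
```

With these numbers the noisy tenant at 0.90 gets a shrinkage factor of roughly 0.81 and lands near 0.61, while a precise tenant gets roughly 0.33 and moves much less, matching the limiting behavior described for λᵢ.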

4.1.1 Interactive: Network Effects Explorer

Drag the slider to add tenants to the network. Watch how the network prior (green line) tightens as more data flows in, and how small-tenant estimates (small circles) converge toward the pooled mean. This is the compounding network effect: every new tenant makes the model better for everyone.

Figure 4. Hierarchical partial pooling in action. Drag the slider to add tenants (2–50). As the network grows, the prior (green curve) tightens, and small-tenant estimates (small circles) are pulled toward the network mean. This is the compounding data flywheel: each new tenant improves estimates for every existing tenant.

4.2 Network Effects Through Partial Pooling

The partial pooling architecture (Section 4.1) creates a compounding network effect. Every new tenant on the platform improves the model for every existing tenant. This works because empirical Bayes shrinkage uses the cross-tenant distribution to estimate the prior. More tenants → better prior → lower variance estimates for everyone, especially small advertisers with limited data.

This is not a marginal improvement. A tenant with 90 days of data and 5 markets has wide confidence intervals—unreliable recommendations. Shrink toward a network-wide prior estimated from 500+ tenants and that variance collapses. The small tenant gets the statistical power of the entire network.

The large tenant contributes more to the prior than it receives, but still benefits: the model learns that sunscreen and outdoor furniture respond to UV similarly, even if a given tenant only sells one of those categories. Cross-category knowledge transfer is free.

This is the mechanism through which the LWDM becomes a foundation model for demand, not just a per-tenant regression. The model knows things about demand that no single tenant can observe alone.

Every new tenant makes every existing tenant’s model better. This is not a feature — it is a structural property of hierarchical Bayesian estimation.

Deep dives: Ergodicity, Viable System Model mapping, James-Stein formal proof →

Network effects in practice

A new tenant with 90 days of data and 5 markets benefits from empirical Bayes shrinkage toward the network prior. In synthetic benchmarks, this narrows the weather elasticity confidence interval from ±0.35 to ±0.12, a 66% reduction in interval width (about an 88% reduction in variance, since variance scales with the square of the width). That enables actionable recommendations where rules-based systems would flag insufficient data. Large tenants contribute more to the prior than they receive, but still benefit from cross-category knowledge transfer. See Section 4 for the math.


5. The Architecture

The LWDM is not a single model. It is a stack of complementary models, each contributing a different capability—running on a single node, no Spark, no warehouse, no distributed compute overhead.

5.1 Data Pipeline

Ingest (Weather · Ads · Shopify) → Transform (Polars · Feature Eng.) → Store (DuckDB · Parquet) → Model (LWDM Stack) → Optimize (cvxpy · Constraints) → Deliver (Plan & Proof)
End-to-end data pipeline · Single node · No Spark · No warehouse

Ingestion

Ad platform APIs (Google, Meta), Shopify, weather APIs. Columnar-native from the start: Polars for transforms, Parquet for storage.

Feature Store

Weather feature builder: rolling windows, delta computations, interaction terms. DuckDB for analytical queries. 100+ weather & calendar features per market. v2.0 adds upstream embeddings from weather foundation models (GraphCast, GenCast, NeuralGCM) as high-dimensional inputs encoding spatial gradients and multi-variable interactions that scalar covariates cannot capture.
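
To show what rolling windows, delta computations, and interaction terms look like for one market, here is a dependency-free sketch. The production pipeline is described as using Polars and DuckDB; this plain-Python version, its function names, feature names, and sample values are assumptions for illustration only.

```python
def rolling_mean(values, window):
    """Trailing mean over up to `window` most recent values (inclusive)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def build_weather_features(temps, humidity, window=3):
    """One feature row per day: level, rolling mean, 1-day delta, interaction."""
    temp_roll = rolling_mean(temps, window)
    rows = []
    for i, (t, h) in enumerate(zip(temps, humidity)):
        rows.append({
            "temp": t,
            "temp_roll_mean": temp_roll[i],
            "temp_delta_1d": t - temps[i - 1] if i > 0 else 0.0,
            "temp_x_humidity": t * h,
        })
    return rows

# Hypothetical four-day window for a single market.
temps = [20.0, 22.0, 27.0, 31.0]
humidity = [0.40, 0.35, 0.30, 0.18]
features = build_weather_features(temps, humidity)
```

The delta feature captures the change from yesterday (a sudden 4-degree jump on the last day), which is often more informative for demand than the level alone.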

Orchestration

Prefect DAGs. Daily ingestion, weekly model retraining, continuous forecast monitoring. Postgres control plane.

Serving

FastAPI + Python 3.11. Plan & Proof mode: read-only by default. Every recommendation includes backtest, uncertainty bounds, and break conditions.

5.1.1 Weather Foundation Model Integration

Since 2022, ML-based weather models have surpassed traditional NWP—GraphCast produces 10-day forecasts in under a minute, GenCast outperforms ECMWF on 97.2% of targets. WeatherVane ingests these ensemble forecasts daily. Each morning, we recompute the demand surface and deliver updated recommendations before markets open. Weather is no longer a noisy covariate. It is a high-fidelity planning signal.

Full technical details: all 6 weather foundation models →

  • 2022: FourCastNet. First ML ≥ NWP
  • 2023: GraphCast, Pangu-Weather, ClimaX. Deterministic parity
  • 2024: GenCast, NeuralGCM, FuXi-S2S. Probabilistic × hybrid physics
  • 2025: Aurora, Aardvark, Stormer. Foundation models

AI weather forecasting: from physics simulations (hours on supercomputers) to foundation models (minutes on a single GPU)

5.2 Model Stack (Estimation Layer)

WeatherVane is not a single model. It is a stack of complementary models, each contributing a different capability:

  • v1.0: Weather-aware Media Mix Models (Ridge regression with adstock and saturation transforms). The estimation layer workhorse. Estimates channel-level marginal returns with weather covariates.
  • v1.0: GAMs via pyGAM for nonlinear weather response curves. Captures the diminishing marginal effect of UV index on sunscreen demand, the threshold effect of precipitation on outdoor activity, the interaction between temperature and humidity.
  • v1.0: Bayesian MMM (optional, per-tenant). Full posterior inference over adstock decay rates, saturation parameters, and weather elasticities. Used when the tenant has enough history to support MCMC convergence.
  • v1.0: Hierarchical partial pooling (empirical Bayes shrinkage). Shares statistical strength across the tenant network. Small advertisers with sparse data benefit from the full network’s learned parameters.
  • v1.0: Convex optimization (cvxpy) for budget allocation. Weather-aware constraints, saturation fairness, rollback safety.

The model learns which weather variables matter for each tenant and category—automatically. v2.0 replaces the linear core with a transformer-based architecture (Section 7) where self-attention discovers cross-category interactions and cross-attention learns nonlinear weather–demand couplings that ridge regression cannot represent.

5 model families in the v1.0 estimation layer (ridge regression, GAMs, Bayesian MMM, gradient boosting, neural networks), each contributing a different capability

v2.0 Architecture: Deep Learning & Foundation Model (Planned)

The v2.0 architecture uses multi-head attention over known future inputs (weather forecasts, calendar events) so that a sunscreen brand in Phoenix and a hot cocoa brand in Minneapolis share the same architecture but attend to completely different signals. A single model trains across all tenants simultaneously. Per-tenant embeddings capture idiosyncratic behavior; shared parameters encode universal weather-demand dynamics. This approach draws on Temporal Fusion Transformers (Lim et al., 2021) for variable selection and DeepAR (Salinas et al., 2020) for global training.

Uncertainty quantification uses adaptive conformal inference (Gibbs & Candès, 2021). The coverage guarantee holds distribution-free—no stationarity assumption required. When weather regimes shift, the prediction intervals widen automatically.
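
The core of adaptive conformal inference is a one-line update: after each observation, the working miscoverage level moves up if the interval covered and down if it missed. The sketch below illustrates that update on simulated residuals with an abrupt regime shift. The step size, warm-up length, absolute-residual score, and simulated data are assumptions for this example, not the LWDM configuration.

```python
import math
import random

def empirical_quantile(scores, level):
    """Smallest score with at least `level` of the scores at or below it."""
    if level >= 1.0:
        return float("inf")   # working alpha <= 0: interval covers everything
    if level <= 0.0:
        return 0.0            # working alpha >= 1: degenerate interval
    ordered = sorted(scores)
    k = min(len(ordered), max(1, math.ceil(level * len(ordered))))
    return ordered[k - 1]

def adaptive_conformal(y_true, y_pred, target_alpha=0.1, gamma=0.01, warmup=50):
    """Run the online update and return realized coverage after warm-up."""
    alpha_t = target_alpha
    scores = [abs(y - p) for y, p in zip(y_true[:warmup], y_pred[:warmup])]
    misses = []
    for y, p in zip(y_true[warmup:], y_pred[warmup:]):
        width = empirical_quantile(scores, 1.0 - alpha_t)
        miss = 1 if abs(y - p) > width else 0
        misses.append(miss)
        # Update: a miss lowers alpha_t (wider intervals), a hit raises it.
        alpha_t += gamma * (target_alpha - miss)
        scores.append(abs(y - p))
    return 1.0 - sum(misses) / len(misses)

# Simulated regime shift: forecast error triples partway through the series.
rng = random.Random(1)
y_pred = [100.0] * 2050
y_true = [100.0 + rng.gauss(0.0, 5.0 if t < 1000 else 15.0) for t in range(2050)]
coverage = adaptive_conformal(y_true, y_pred)
```

Even though the noise level jumps mid-series, long-run coverage stays close to the 90% target because each run of misses widens the next intervals.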

5.2.1 Network Architecture v2.0

The diagram below shows the target v2.0 information flow: from raw inputs (weather variables, sales, spend, inventory, geography) through multiple learned representation layers to actionable outputs (ad efficiency estimates, revenue lift predictions, inventory guidance, confidence scores, and break conditions). v1.0 uses a simpler Ridge + GAM pipeline (see Section 5.2 above).

How to read this: Left column = raw signal inputs. Middle layers = learned representations that combine weather, marketing, and inventory signals. Right column = actionable outputs. Line thickness indicates connection strength.

[Figure: v2.0 network diagram. Layers, left to right: Signals → Shock detection → Effect estimation → Partial pooling → Optimization → Actions. Input nodes: Temperature, UV index, Precipitation, Wind / Pressure, Weather deltas, Sales history, Ad & mktg spend, Inventory, Seasonality, Time signals, Geography. Output nodes: Ad efficiency, Revenue lift, Inventory guidance, Confidence score, Break conditions.]
Thousands of learned weights · 1,000+ geos · 100+ variables · Weekly recommendations · Daily alerts
INPUTS: Weather state (temp, UV, humidity, wind, pressure, cloud, deltas, interactions), sales history, ad spend per channel, inventory, seasonality/calendar, geography
OUTPUTS: Per-channel ad efficiency, revenue lift estimates, inventory guidance (stock risk), model confidence scores, break conditions (when to stop)

5.3 Model Stack: How It Actually Works

The neural network above shows the information flow at inference time. Here is the actual model stack showing which algorithms run at each stage and how they connect:

  1. INPUTS: Weather (15d) + Ad Spend (5ch) + Sales + Calendar + Inventory (Polars DataFrames)
  2. ESTIMATION LAYER: Regularized estimation; causal identification from upstream weather shocks (Ridge + GAM + IV regression)
  3. DEMAND SURFACE: Nonlinear weather×spend response with interaction effects (Bayesian MMM + Hill saturation + Adstock)
  4. NETWORK LAYER: Partial pooling across tenant network → shrinkage prior (Empirical Bayes hierarchical model)
  5. OPTIMIZER: Constrained budget allocation with uncertainty (cvxpy convex optimization + MPC)
  6. OUTPUT: Recommendations + evidence bundles + break-conditions (REST API + Python SDK)

MPC feedback loop

5.4 Closed-Loop Control: Model Predictive Control

There is a subtlety that even the best rule-based systems miss: the demand system is not an open-loop process. It is a closed-loop feedback system. When the model changes a spend allocation, that change alters demand. Altered demand changes the model’s next forecast. The next forecast changes the next allocation. And so on.

In control theory, this is the difference between open-loop control (set a plan and execute it blindly) and closed-loop control (continuously observe the state and adjust). Conant & Ashby’s Good Regulator Theorem (1970) proves that every good regulator of a system must be, or contain, a model of that system. Rules violate this result by construction, since they encode no internal model at all.

Conant & Ashby (1970) proved this theorem for arbitrary dynamical systems. The proof is information-theoretic: a regulator that lacks a model of the system’s state space cannot generate control signals with sufficient variety to track the system. It is the control-theory equivalent of the No Free Lunch theorem.

A rule like “boost sunscreen when UV > 8” is open-loop: it fires regardless of what happened after the last boost. The LWDM is closed-loop: it observes the effect of its own recommendations on demand, updates its beliefs, and re-optimizes. This is formally equivalent to Model Predictive Control (MPC), the same framework used in process engineering, autonomous vehicles, and robotics:

  • Forecast: Predict demand over the planning horizon (7 days) using the current LWDM parameters and weather forecast.
  • Optimize: Solve for the optimal allocation subject to budget constraints, saturation limits, and rollback safety.
  • Execute: Deliver recommendations (or push to ad platforms in auto mode).
  • Observe: Ingest actual demand data. Detect divergence between predicted and observed outcomes.
  • Re-plan: Update the forecast, re-solve the optimization, and alert the user only if the recommended action changes materially.
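
The five steps above can be sketched as a small receding-horizon loop. Everything specific here is an assumption for illustration: a single channel, a square-root response curve (revenue = β·√spend), a ±20% per-cycle change limit, a noise-free simulated market, and a simple exponential belief update standing in for model re-estimation.

```python
import math

def forecast(beta_hat, spend):
    """Predicted revenue at a spend level under the current belief."""
    return beta_hat * math.sqrt(spend)

def optimize(beta_hat, budget, last_spend, max_change=0.20):
    """Maximize beta * sqrt(s) - s, then apply the rate limit and the budget."""
    target = (beta_hat / 2.0) ** 2          # unconstrained optimum
    lo = last_spend * (1.0 - max_change)
    hi = last_spend * (1.0 + max_change)
    return min(budget, max(lo, min(hi, target)))

def run_mpc(true_beta, beta_hat, spend, budget, n_cycles, learn_rate=0.5):
    history = []
    for _ in range(n_cycles):
        spend = optimize(beta_hat, budget, spend)            # Forecast + Optimize
        predicted = forecast(beta_hat, spend)
        observed = true_beta * math.sqrt(spend)              # Execute + Observe (simulated)
        implied_beta = observed / math.sqrt(spend)
        beta_hat += learn_rate * (implied_beta - beta_hat)   # Re-plan with updated belief
        history.append((spend, predicted, observed))
    return beta_hat, history

# The model starts out too optimistic (belief 30 vs. a true response of 20).
beta_final, history = run_mpc(
    true_beta=20.0, beta_hat=30.0, spend=100.0, budget=500.0, n_cycles=20
)
```

In the first cycle the optimistic belief pushes spend up to the rate limit and the forecast overshoots the observed revenue. Each later cycle corrects the belief, and spend settles back toward the level that is optimal under the true response, which an open-loop rule would never discover.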

Every good regulator of a system must be a model of that system. Rules contain no model. That is why they are not regulators — they are guesses.

Open-loop: “Boost sunscreen +20% when UV > 8.” Fires regardless of last week’s results.
Closed-loop MPC: Observes last week’s +4% lift (below forecast). Diagnoses saturation. Redirects spend.

Outcome-driven, not spend-driven. WeatherVane does not control ad spend directly. Spend is a lever, not a goal. What the optimizer actually targets is the business outcome: ROAS, incremental revenue, demand lift. If a weather shock reduces baseline demand, the system does not blindly increase spend to hit a budget target. It diagnoses whether additional spend can close the gap—or whether the gap is structural (e.g., a severe storm where no amount of advertising will drive foot traffic). This makes the system self-correcting at the level of business objectives.

Rules cannot implement this closed-loop approach because they have no internal model of the system. They do not predict future states, do not account for the consequences of their own actions, and do not learn from outcomes. They fire, forget, and fire again. In v3.0–v4.0, the MPC layer extends to RL-based budget optimization via contextual bandits, where the weather forecast embedding serves as the context vector and the system learns optimal weather-contingent allocations from historical observational data—critical because retailers cannot A/B test weather itself.

Full architecture specifications: neural network diagram, MPC formalism, Perceptual Control Theory →

WeatherVane’s MPC loop

Every Plan & Proof cycle is one iteration of MPC: weather forecast arrives → model updates the demand surface → optimizer re-solves allocation subject to budget constraints → recommendations ship with evidence bundle, break-conditions, and confidence intervals → actual outcomes are ingested → model updates beliefs → next cycle.

Theoretical motivation: Active Inference

This loop is structurally analogous to active inference under the free energy principle (Friston, 2010): the system minimizes prediction error between expected and observed demand outcomes, rather than maximizing a static reward function. Friston’s framework generalizes Conant & Ashby’s Good Regulator Theorem. We draw on this analogy because it motivates the Bayesian architecture (partial pooling, uncertainty quantification) rather than a frequentist point-estimate approach. The analogy is structural rather than a claim of formal equivalence—we use MPC as the engineering implementation and active inference as the theoretical motivation.

CLOSED-LOOP MPC CYCLE
Weather Forecast → LWDM Prediction → Optimizer → Recommendations → Demand Outcomes → Model Update → (back to Weather Forecast)

5.5 Adstock & Saturation Transforms

Advertising has carryover effects (today’s impression influences tomorrow’s purchase) and diminishing returns (the 10,000th impression is less effective than the 1,000th). The LWDM models these with standard transforms from the marketing science literature, extended with weather-conditioned parameters:

Formal specification: geometric adstock & Hill saturation
Geometric adstock (carryover):
  A(t) = x(t) + θ · A(t-1)       where θ ∈ [0, 1) is the decay rate

Hill saturation (diminishing returns):
  S(x) = xᵅ / (Kᵅ + xᵅ)         where K = half-saturation point, α = shape

Weather-conditioned specification (v1.0):
  β_c(W) = β_c,0 + β_c,W · W(g,t)
  K_c(W) = K_c,0 · exp(η_c · W(g,t))
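
A direct transcription of the three formulas above, as a sketch for a single channel. The parameter values (θ = 0.5, α = 1.5, K₀ = 80, η = −0.1) and the weather index W = 2 are arbitrary illustrative choices, not fitted values.

```python
import math

def geometric_adstock(spend, theta):
    """A(t) = x(t) + theta * A(t-1), with A(-1) = 0."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + theta * carry
        out.append(carry)
    return out

def hill(x, K, alpha):
    """S(x) = x^alpha / (K^alpha + x^alpha); equals 0.5 at x = K."""
    if x <= 0:
        return 0.0
    return x ** alpha / (K ** alpha + x ** alpha)

def weather_conditioned_K(K0, eta, w):
    """K_c(W) = K_c,0 * exp(eta_c * W)."""
    return K0 * math.exp(eta * w)

spend = [100.0, 0.0, 0.0, 50.0]
stock = geometric_adstock(spend, theta=0.5)   # carryover from day 0 echoes forward

# With eta < 0, favorable weather lowers the half-saturation point,
# so the same adstocked spend sits higher on the response curve.
K_hot = weather_conditioned_K(K0=80.0, eta=-0.1, w=2.0)
response = [hill(a, K_hot, alpha=1.5) for a in stock]
```

Days 1 and 2 have zero spend but non-zero adstock (50 and 25), which is the carryover effect; the Hill transform then compresses large values toward 1, which is the diminishing-returns effect.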

Figure 5. Adstock & saturation. Top: raw daily spend. Middle: carryover echoes. Bottom: revenue response after saturation.

6. The Safety

Most SaaS vendors ship a feature and write a blog post. We ship a feature and then explain, in public, every way it can fail.

WeatherVane handles real marketing budgets. A misallocated dollar is a dollar lost. Every layer assumes the layer below it can fail, and every recommended action has a human-reviewable evidence trail. The system is conservative by default—all actions require explicit approval unless you choose to enable higher autonomy levels after demonstrated performance.

The system can be wrong. Here is exactly how it detects that, what it does next, and what you see.

6.1 Plan & Proof: Human-in-the-Loop by Default

WeatherVane operates in read-only “Plan & Proof” mode by default. Recommendations are surfaced with full evidence bundles. No budget is moved without explicit human approval. This follows the principle articulated by Amodei et al. (2016): AI systems that affect real-world resources should default to advisory mode and require progressive trust calibration before autonomy is granted.

Each recommendation ships with:

  • Backtest results: Historical performance of the same recommendation type in the same market, over 12+ periods.
  • Confidence interval: Uncertainty bounds on the expected revenue lift. The interval tightens as the forecast approaches.
  • Break conditions: Explicit conditions under which the recommendation should be reversed (e.g., “cloud cover exceeds 60%”, “temperature drops below 85°F”).
  • Confounders ruled out: Which alternative explanations have been tested and excluded (holidays, promotions, stockouts, competitor actions).
  • Causal method: The identification strategy used for this specific estimate (weather shock + synthetic control, difference-in-differences, or regression discontinuity).

Example evidence bundle for: “Increase Phoenix sunscreen +22%”

  • Backtest: 12 periods, +14% avg lift
  • 95% CI: +$2.1K to +$4.8K
  • Break if: Cloud >60% or temp <85°F
  • Ruled out: Holiday, promo, stockout
  • Method: Weather shock + synth control

Evidence complete: ready for human review

6.2 Budget Circuit Breakers

4 independent circuit breakers on every recommendation. Inspired by NYSE Rule 80B & SRE error budgets.

Because a system that manages your money should be at least as paranoid as you are.

Inspired by financial circuit breakers (NYSE Rule 80B) and site reliability engineering (SRE error budgets), WeatherVane implements hard limits that halt automated allocation when risk thresholds are exceeded:

Spend Velocity Limit

Maximum daily spend change capped at ±20% of baseline. Prevents cascade failures where a confident-but-wrong signal burns budget before detection.

Drawdown Halt

If cumulative ROAS drops below the tenant’s break-even threshold for 3 consecutive days, all automated recommendations pause and alert the human operator.

Forecast Divergence

When the weather forecast ensemble disagrees by >2σ across models, recommendations are held. We do not act on uncertain forecasts.

Anomaly Detection

Z-score monitoring on all input data streams. If ingested data deviates >4σ from historical norms (likely data corruption), the pipeline halts and flags for review.
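
The four breakers can be read as four independent predicates over the current state. The sketch below encodes the thresholds stated above (±20%, 3 consecutive days, 2σ, 4σ). The field names and the way ensemble disagreement is measured (max minus min against a reference σ) are assumptions, since the text does not define them precisely.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class BreakerInputs:
    baseline_spend: float
    proposed_spend: float
    cumulative_roas: list      # one value per day, most recent last
    break_even_roas: float
    ensemble_forecasts: list   # one forecast per weather model
    forecast_sigma: float      # assumed reference spread for the 2-sigma test
    latest_input: float
    input_history: list

def tripped_breakers(x):
    """Return the names of all breakers that would halt automation."""
    tripped = []
    # Spend velocity: daily change capped at +/-20% of baseline.
    if abs(x.proposed_spend - x.baseline_spend) > 0.20 * x.baseline_spend:
        tripped.append("spend_velocity")
    # Drawdown halt: below break-even for 3 consecutive days.
    last3 = x.cumulative_roas[-3:]
    if len(last3) == 3 and all(r < x.break_even_roas for r in last3):
        tripped.append("drawdown_halt")
    # Forecast divergence: ensemble members disagree by more than 2 sigma.
    spread = max(x.ensemble_forecasts) - min(x.ensemble_forecasts)
    if spread > 2.0 * x.forecast_sigma:
        tripped.append("forecast_divergence")
    # Anomaly detection: latest input more than 4 sigma from history.
    mu, sd = mean(x.input_history), pstdev(x.input_history)
    if sd > 0 and abs(x.latest_input - mu) / sd > 4.0:
        tripped.append("anomaly")
    return tripped

state = BreakerInputs(
    baseline_spend=1000.0, proposed_spend=1300.0,
    cumulative_roas=[1.4, 0.9, 0.8, 0.7], break_even_roas=1.0,
    ensemble_forecasts=[30.0, 30.5, 31.0], forecast_sigma=1.0,
    latest_input=100.5, input_history=[100.0, 102.0, 98.0, 101.0, 99.0],
)
result = tripped_breakers(state)
```

Evaluating the breakers independently, rather than short-circuiting on the first failure, lets the alert tell the operator every reason automation was halted.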

6.3 Shadow Mode and Progressive Rollout

Before a tenant goes live, the allocator runs in shadow mode: it generates recommendations against live data without executing them, then compares its suggestions to actual outcomes over a holdout period. The NIST AI Risk Management Framework (2023) recommends precisely this approach: validate decision quality in simulation before granting real-world authority.

The progressive rollout sequence is:

  • 👁️ Shadow: Recommendations generated, not executed. Human reviews weekly.
  • 📩 Suggest: Evidence bundles surfaced. Human approves each action. (🔒 14+ days shadow)
  • ⚙️ Auto + guardrails: Executes within circuit breaker limits. Daily digest. (🔒 30+ days positive ROAS)
  • 🚀 Full auto: Full automation. Circuit breakers active. Weekly summary. (🔒 30+ days Level 2)

6.4 Tenant Data Isolation and Privacy

Your data never touches another tenant’s model. Raw sales, spend, and inventory stay in your logical partition. Only learned parameters (elasticity estimates, variance components) are pooled—with calibrated noise injection (designed with differential-privacy principles; formal certification in progress) so that individual tenant behavior cannot be reconstructed from the pooled prior. Each tenant’s data is encrypted at rest with a unique key. Even WeatherVane’s own engineers cannot access individual tenant data without an audited break-glass procedure.

6.5 Model Monitoring and Drift Detection

The LWDM runs continuous monitoring for three classes of drift, following the taxonomy of Lu et al. (2018) on concept drift in data streams:

  • Data drift: Distribution shift in input features (weather patterns, spend levels, sales volumes). Detected via Kolmogorov–Smirnov tests on rolling windows.
  • Concept drift: Change in the weather–demand relationship itself (e.g., a new competitor enters the market, changing the response surface). Detected via CUSUM and Page–Hinkley tests on prediction residuals.
  • Calibration drift: Model confidence intervals that no longer cover observed outcomes at the specified rate. Detected via rolling coverage checks (are 90% intervals covering ~90%?). Because demand data is non-exchangeable—exhibiting temporal dependence, seasonality, and weather-driven distribution shifts—we use adaptive conformal inference (Gibbs & Candès, 2021) to adjust prediction intervals online, maintaining valid coverage even as the data-generating process changes. Barber et al. (2023) extend conformal prediction beyond the classical exchangeability assumption, providing the theoretical foundation for coverage guarantees in our non-stationary setting.
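
As one concrete example from the list above, here is a minimal Page–Hinkley detector for an upward shift in the mean of prediction residuals. The tolerance δ = 0.05, the alarm threshold of 5.0, and the synthetic residual stream are illustrative assumptions; a real deployment would tune both and also monitor downward shifts.

```python
def page_hinkley(residuals, delta=0.05, threshold=5.0):
    """Return the first index where an upward mean shift is flagged, else None."""
    running_mean = 0.0
    cumulative = 0.0
    minimum = 0.0
    for t, r in enumerate(residuals, start=1):
        running_mean += (r - running_mean) / t
        # Accumulate deviations from the running mean, minus a tolerance.
        cumulative += r - running_mean - delta
        minimum = min(minimum, cumulative)
        # Alarm when the statistic has risen far above its historical low.
        if cumulative - minimum > threshold:
            return t - 1
    return None

# Stable residuals alternate around zero; the drifted stream adds +2.0
# from index 100 onward (e.g., the response surface has changed).
stable = [0.5 if t % 2 == 0 else -0.5 for t in range(200)]
drifted = [r + (2.0 if t >= 100 else 0.0) for t, r in enumerate(stable)]

no_alarm = page_hinkley(stable)
alarm_at = page_hinkley(drifted)
```

The detector stays silent on the stable stream and fires within a few observations of the shift, which is the trigger for the escalation sequence described next.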

When drift is detected, the system escalates: first to increased monitoring frequency, then to model retraining, and finally to recommendation pausing if the drift is severe enough to compromise causal identification.
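As a rough illustration of the detect-then-escalate flow, the sketch below runs a Page–Hinkley statistic over prediction residuals and maps it to the three escalation steps. The class and function names, the tolerance `delta`, the alarm threshold, and the 1×/2×/4× escalation multipliers are hypothetical values chosen for the example, not WeatherVane's production settings.

```python
class PageHinkley:
    """Page–Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level for the statistic
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.cumulative_min = 0.0

    def update(self, x):
        """Consume one residual and return the current PH statistic."""
        self.n += 1
        self.mean += (x - self.mean) / self.n           # running mean
        self.cumulative += x - self.mean - self.delta   # cumulative deviation
        self.cumulative_min = min(self.cumulative_min, self.cumulative)
        return self.cumulative - self.cumulative_min


def escalation_level(statistic, threshold):
    """Map the PH statistic to an escalation step (multipliers are assumed)."""
    if statistic >= 4 * threshold:
        return "pause_recommendations"
    if statistic >= 2 * threshold:
        return "retrain"
    if statistic >= threshold:
        return "increase_monitoring"
    return "normal"
```

Feeding the detector absolute residuals means a stable model keeps the statistic at zero, while a sustained jump in error accumulates until it crosses each escalation tier in turn.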

6.6 Audit Trail and Reproducibility

Every recommendation is fully reproducible. The system logs:

  • The exact model version, parameters, and training data hash used to generate the recommendation.
  • The weather forecast snapshot (timestamped, versioned) that triggered the signal.
  • The optimization solution (objective value, binding constraints, dual variables) that produced the allocation.
  • The evidence bundle (backtest results, confidence intervals, break conditions) delivered to the user.
  • The outcome (if available): actual demand, actual spend, actual ROAS, and the post-hoc causal estimate for comparison.

This audit trail enables post-hoc analysis of every decision the system has ever made. It is the foundation for continuous improvement: we learn not just from the data, but from our own recommendations and their outcomes. The design follows the NIST AI Risk Management Framework (2023) guidance on traceability and accountability in AI systems.

Full reproducibility: given the same model version and inputs, any recommendation can be regenerated exactly.

6.7 Adversarial Robustness

Because weather is exogenous, the LWDM is naturally resistant to the adversarial manipulation that plagues bid-based systems. However, we implement additional safeguards:

  • Input validation: Weather data is cross-referenced across multiple sources (NOAA, Open-Meteo, ECMWF). A single-source anomaly does not trigger action.
  • Outlier containment: Extreme weather values are winsorized at the 99.5th percentile to prevent single data points from dominating the allocation.
  • Spend smoothing: Recommended spend changes are smoothed over a minimum 48-hour window. Flash-crash-style budget swings are structurally impossible.
  • Rate limiting: API access is rate-limited and authenticated. Allocation endpoints require tenant-scoped API keys with role-based permissions (read-only, suggest, execute).
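The first three safeguards are simple enough to sketch directly. The functions below are illustrative only: the names, the disagreement tolerance, the two-sided clamp (the text specifies only the 99.5th percentile), and the hourly granularity of the 48-hour window are assumptions.

```python
from statistics import median


def consensus_reading(readings, tolerance):
    """Median across sources, plus the names of sources that disagree with it."""
    center = median(readings.values())
    outliers = sorted(name for name, value in readings.items()
                      if abs(value - center) > tolerance)
    return center, outliers


def winsorize(values, upper_pct=0.995):
    """Clamp values to the [1 - upper_pct, upper_pct] empirical percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int((1.0 - upper_pct) * n)]
    high = ordered[min(n - 1, int(upper_pct * n))]
    return [min(max(v, low), high) for v in values]


def smooth_spend(hourly_targets, window=48):
    """Trailing mean over up to `window` hourly spend targets."""
    smoothed = []
    running = 0.0
    for i, target in enumerate(hourly_targets):
        running += target
        if i >= window:
            running -= hourly_targets[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

With a 48-hour trailing mean, a target that doubles in one hour moves the executed spend by only about 2% in that hour, which is the sense in which abrupt budget swings are damped.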

6.8 Data Protection & Privacy

WeatherVane is designed around privacy by default. The system’s core innovation—using weather as a demand signal—is inherently privacy-safe: weather data is public, requires zero PII, and is immune to cookie deprecation and App Tracking Transparency (ATT).

  • No PII collected: WeatherVane does not collect, store, or process any personally identifiable information. All demand signals are derived from aggregate sales data and public weather observations.
  • GDPR & CCPA: Because the system operates on aggregate market-level data (not individual user data), WeatherVane falls outside the scope of individual-rights provisions. No personal data is processed, stored, or transferred. Data Processing Agreements (DPAs) are available upon request for enterprise customers.
  • Data residency: Tenant sales data is encrypted at rest (AES-256) and in transit (TLS 1.3). Data is stored in the tenant’s chosen region. No cross-tenant data sharing occurs: partial pooling operates on model parameters, not raw data. This parameter-only sharing, in the spirit of federated learning (McMahan et al., 2017), provides a natural privacy boundary: an intercepted update exposes noise-injected model parameters, not the tenant’s underlying sales data.
  • Privacy by design: The system applies differential privacy principles to pooled model parameters. The v4.0 privacy architecture evolves toward federated learning (McMahan et al., 2017), where tenants share only gradient updates—not raw data—and secure aggregation ensures no participant can reconstruct another’s sales figures. Formal certifications (SOC 2, ISO 27001) are in progress. Current security posture details are available upon request.
  • Data retention: Tenant data is retained for the duration of the service agreement plus a configurable retention window (default: 90 days post-termination). Tenants can request immediate deletion at any time.

6.9 Boundary Conditions: Where WeatherVane Will Not Help

A system that cannot name its own failure modes is a system that has not looked.

No system works everywhere. The strongest claim is a bounded claim. Here are the conditions under which WeatherVane adds limited or no value—and how the system detects each one.

Weather-Insensitive Categories

If demand is dominated by non-weather drivers (brand loyalty, price, promotions) and weather’s incremental explanatory power is negligible, the system adds complexity without adding value. We test for this: if the first-stage F-statistic is below 10, we flag the category as likely weather-insensitive and widen all confidence intervals.
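For a single weather instrument, the first-stage F-statistic reduces to a function of the first-stage R². The sketch below assumes one instrument `z` (for example a temperature anomaly) and one endogenous regressor `x`; which variable plays the role of `x` in the LWDM is not specified here, so treat both names as placeholders.

```python
import math


def first_stage_f(z, x):
    """F-statistic from regressing x on a single instrument z (with intercept).

    With one instrument, F = R^2 * (n - 2) / (1 - R^2).
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    s_zz = sum((a - mean_z) ** 2 for a in z)
    s_xx = sum((b - mean_x) ** 2 for b in x)
    s_zx = sum((a - mean_z) * (b - mean_x) for a, b in zip(z, x))
    r_squared = s_zx ** 2 / (s_zz * s_xx)
    if r_squared >= 1.0:
        return math.inf  # perfect fit
    return r_squared * (n - 2) / (1.0 - r_squared)


def likely_weather_insensitive(z, x, cutoff=10.0):
    """Apply the F < 10 rule of thumb quoted in the text."""
    return first_stage_f(z, x) < cutoff
```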

Inventory-Censored Markets

Observed sales are truncated by stockouts. If a product frequently sells out during weather-driven demand spikes, the demand signal is censored and the estimated weather elasticity is biased downward. Without reliable inventory data, the system underestimates opportunity.

Inflexible Spend

If contractual commitments, brand guidelines, or platform minimum-spend requirements prevent meaningful reallocation, intelligence is decorative. The system measures opportunity cost but cannot capture it.

Promo/Macro-Dominated Periods

During Black Friday, product launches, or macro shocks (recession, supply chain disruption), weather is a minor signal overwhelmed by larger forces. The system detects this via the relative contribution of weather variables to the demand forecast and automatically reduces allocation aggressiveness.

Rapid Non-Weather Drift

When consumer behavior changes faster than retraining (new competitor disrupts pricing, platform algorithm shifts, viral cultural moment), learned relationships become stale. Drift detection (6.5) catches this, but the response is conservative: widen intervals, reduce autonomy, revert to shadow mode. Not all drift is manageable in real time.

Measurement-Starved Categories

When incremental lift is too small relative to baseline noise to estimate reliably within acceptable time horizons, the system should—and does—widen intervals and lean on guardrails rather than guessing confidently. Lewis & Rao (2015) call this the “unfavorable economics” of measurement. Some signals are real but unverifiable at the available data scale.

6.10 Proof Obligations: How You Prove Us Wrong

The strongest version of WeatherVane’s claim is not “we always beat rules”—“always” is a magnet for counterexamples. The strongest version is conditional and therefore testable: in categories where weather causally shifts demand and moderates ad effectiveness, and where spend is meaningfully reallocatable, a weather-conditioned causal system should outperform static rules on incremental profit after accounting for uncertainty. That claim has boundary conditions. It can be falsified.

We pre-register four evaluation components. Each is designed to be run by the customer, on their own data, without our involvement.

Decision Lift

Does the policy improve incremental profit vs. baselines (rules, status quo, naive smoothing) in geo-randomized holdout tests, net of platform learning effects? The customer’s own revenue data, not ours.
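One way a customer could run this check on their own numbers is an exact permutation test on the treated-versus-holdout difference in mean incremental profit. The sketch below is an assumption about how such a check might look (one-sided, difference in means, full enumeration), and it is only practical for small numbers of markets.

```python
from itertools import combinations


def permutation_lift_test(treated, holdout):
    """Return (observed lift, one-sided exact permutation p-value).

    Lift is mean(treated) - mean(holdout); the p-value is the share of all
    possible treated/holdout relabelings with a lift at least as large.
    """
    pooled = list(treated) + list(holdout)
    k, n = len(treated), len(pooled)
    total = sum(pooled)
    observed = sum(treated) / k - sum(holdout) / (n - k)
    at_least_as_large = 0
    relabelings = 0
    for indices in combinations(range(n), k):
        treated_sum = sum(pooled[i] for i in indices)
        lift = treated_sum / k - (total - treated_sum) / (n - k)
        relabelings += 1
        if lift >= observed - 1e-12:  # small tolerance for float ties
            at_least_as_large += 1
    return observed, at_least_as_large / relabelings
```

With five treated and five holdout markets there are 252 relabelings, so the smallest attainable p-value is 1/252; more markets buy finer resolution.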

Calibration

Do 90% confidence intervals contain the true outcome ~90% of the time? Do confidence scores correlate with realized lift rather than with model self-esteem?

Ablation

Does removing weather features degrade performance specifically in weather-sensitive categories during anomalies, but not in insensitive categories? If removing weather doesn’t hurt, the weather signal isn’t earning its place.

Safety Audit

How often does the system trigger circuit breakers? How often are recommendations reversed? What is the distribution of drawdowns versus gains? We publish both.

If you can audit these results—wins and losses—then we are not blowing smoke. We are doing science in public, which is rare enough to be a product feature.

Want to review our security posture or request a DPA?

Request Security Documentation

6.11 Pre-Launch Validation

Important context: The results below are from synthetic backtests using generated demand data with known ground-truth weather elasticities, not from production deployments. In these simulation studies, the LWDM’s causal estimates recover the true parameters with a mean absolute error of <8% on in-sample markets and <15% on held-out markets. The partial pooling architecture reduces estimation variance by 40–60% for small tenants (fewer than 90 days of data) compared to per-tenant OLS.
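To see why partial pooling lowers variance for small tenants, consider a textbook normal-normal empirical Bayes shrinkage. This is a generic sketch, not the LWDM estimator: the method-of-moments prior and the function name are assumptions, and it does not reproduce the 40–60% figure above.

```python
from statistics import mean, variance


def empirical_bayes_shrink(estimates, sampling_vars):
    """Shrink per-tenant estimates toward a pooled prior.

    Returns (prior_mean, prior_var, rows) where each row is
    (posterior_mean, posterior_var, weight_on_prior).
    """
    prior_mean = mean(estimates)
    # Method of moments: between-tenant spread minus average sampling noise.
    prior_var = max(0.0, variance(estimates) - mean(sampling_vars))
    rows = []
    for estimate, s2 in zip(estimates, sampling_vars):
        denom = s2 + prior_var
        weight_on_prior = s2 / denom if denom > 0 else 0.0
        posterior_mean = weight_on_prior * prior_mean + (1.0 - weight_on_prior) * estimate
        posterior_var = (1.0 - weight_on_prior) * s2
        rows.append((posterior_mean, posterior_var, weight_on_prior))
    return prior_mean, prior_var, rows
```

A tenant with a noisy estimate (large sampling variance) gets a weight on the prior close to 1 and a posterior variance far below its own sampling variance; a tenant with plenty of data is left nearly untouched.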

These benchmarks validate the methodology against known ground truth, but they are not a substitute for real-world performance data. Production validation with live tenant data is the next milestone, and results from our initial design partners will be published as they become available.

7. The Vision

It starts with one brand understanding weather. It ends with the economy having a demand nervous system.

WeatherVane is not a point product. It is a research program that begins with weather-driven demand and grows into economic infrastructure. The trajectory is aggressive: prove causal ID (v1) → ship the full deep-learning stack (v2) → scale across verticals (v3) → become the demand intelligence layer of the economy (v4).

Addressable market by version (WeatherVane estimates)

  • v1.0: $15B · Weather-sensitive e-com
  • v2.0: $200B · Full deep-learning stack
  • v3.0: $1.2T · Cross-vertical scale
  • v4.0: $1.5T+ · Demand Signal Network
SHIPPING NOW
v1.0

WeatherVane 1.0 — Causal Weather × Demand

The identification layer. Prove that weather-driven causal estimation works, that partial pooling creates a network prior, and that the system earns trust through evidence—not black-box predictions.

  • Estimation layer: Panel ridge regression + GAMs + Bayesian MMM. Empirical Bayes partial pooling (Gelman & Hill, 2006) shrinks tenant-specific parameters toward a population prior learned from the full tenant network—the first data network effect.
  • Causal ID: Weather shocks as natural experiments—exogenous by construction, no instrument needed. Synthetic control estimation (Abadie et al., 2010). Permutation inference for finite-sample validity.
  • Pipeline: Polars + DuckDB + Parquet. Single-node efficiency. No Spark, no warehouse, no distributed compute overhead.
  • Delivery: Plan & Proof mode. Evidence bundles with backtest results, confidence intervals, break conditions, and confounder audit.
  • Safety: Shadow mode, budget circuit breakers, progressive rollout (Level 0–3), full audit trail.
  • Optimization: Convex budget allocation via cvxpy. Saturation-aware. Weather-conditioned constraints.
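The saturation-aware allocation in the last bullet can be illustrated without a solver. The text names cvxpy; to stay dependency-free, this sketch instead solves the same kind of concave problem by bisection on the Lagrange multiplier. The exponential-saturation response curve, the parameter names, and the idea of scaling a market's ceiling by a weather multiplier are assumptions for the example.

```python
import math


def allocate_budget(budget, ceilings, scales, iterations=200):
    """Split `budget` across markets to maximize total response.

    Market i is assumed to respond as ceilings[i] * (1 - exp(-spend / scales[i])).
    At the optimum every funded market has the same marginal return `lam`,
    which gives spend_i = max(0, scales[i] * ln(ceilings[i] / (scales[i] * lam))).
    """
    def spend_at(lam):
        return [max(0.0, k * math.log(a / (k * lam)))
                for a, k in zip(ceilings, scales)]

    low = 1e-12                                            # very cheap marginal return
    high = max(a / k for a, k in zip(ceilings, scales))    # all spends are zero here
    for _ in range(iterations):
        mid = math.sqrt(low * high)                        # bisect on a log scale
        if sum(spend_at(mid)) > budget:
            low = mid                                      # overspending: raise the bar
        else:
            high = mid
    return spend_at(high)
```

Doubling one market's ceiling (say, because a heat anomaly raises attainable demand) pulls budget toward it until marginal returns equalize again, which is the behavior a weather-conditioned constraint set is meant to produce.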

Addressable market: Weather-sensitive e-commerce categories (sunscreen, outerwear, beverages, outdoor recreation, home comfort). ~$15B in annual U.S. ad spend directly affected by weather.

Why sunscreen? A DTC sun-care brand is the ideal v1.0 tenant. UV index and temperature have a direct, measurable, and causal effect on sunscreen demand—the signal-to-noise ratio is among the highest of any weather-sensitive category. This gives the model its cleanest identification and fastest path to provable ROI.

[Dashboard mockup: sunco.weathervane.ai/recommendations. Simulated data for illustration.]

SunCo Sunscreen · 5 markets · Today's Recommendations
Feb 10, 2026 · 5 markets · 6 recommendations · Recommendations ready

  • Weekly ad budget: $50K (5 markets)
  • Active recs: 6 (all actionable)
  • Projected lift: +$18.2K (vs naive baseline)
  • Interactions modeled: 14 (UV × lag × geo × channel)

Projected lift by market

  • Phoenix, AZ: +$1.8K (pull back), ±22% CI
  • Denver, CO: +$3.6K (event boost), ±11% CI
  • Seattle, WA: +$5.2K (σ-anomaly), ±18% CI
  • Miami, FL: +$1.5K (UV taper), ±14% CI
  • Austin, TX: +$6.1K (channel shift), ±16% CI

Recommendations (Product · Market · Action · Signal · Conf.)

  • SPF 50 Beach Lotion (non-obvious) · Phoenix, AZ · Action: −18% Google (demand/°F inverts above 105°F; ΔT +14° vs 7d avg · saturation zone) · Signal: 108°F · UV 12 · Conf.: 94%
  • After-Sun Aloe Gel (non-obvious) · Phoenix, AZ · Action: +52% all channels (24h UV lag → peak burn-care window; EMA₅(UV) ↑ trending · CDD₇ = 246°F·days) · Signal: 108°F · UV 12 · lag(UV, 24h) · Conf.: 92%
  • Sport SPF 70 Spray · Denver, CO · Action: +41% event-aware (temp × Trail Race = 2.3× interaction; humidity 18% (demand amplifier)) · Signal: 94°F · UV 11 · Trail Race · Conf.: 93%
  • Zinc Face Shield (non-obvious) · Seattle, WA · Action: +67% surge bid (σ +2.8 vs DMA norm → peak elasticity; MA₇(temp) deviation +19° from local baseline) · Signal: 78°F · UV 7 · Conf.: 89%
  • Daily Glow SPF 30 (non-obvious) · Miami, FL · Action: −31% taper (d/dt(UV) < 0 for 3 days → spend-off signal; inventory velocity ↑ lag · pre-purchased) · Signal: 95°F · UV 10 · ΔUV −2/day · Conf.: 86%
  • SPF 50 Beach Lotion (non-obvious) · Austin, TX · Action: Shift G→Meta (ROAS_G 0.8× vs ROAS_M 2.1× at margin; channel × geo saturation · CPC $4.12) · Signal: 91°F · Google CPC saturated · Conf.: 78%

Portfolio decision · 6 recommendations across 5 markets
Pull back Phoenix saturated spend, surge Seattle σ-anomaly window, reallocate Austin Google→Meta. Projected weekly lift: +$18.2K (90% CI: $12.4K–$24.1K) vs weather-naive baseline.
IN DEVELOPMENT · Target: Q4 2026
v2.0

WeatherVane 2.0 — The Full Deep-Learning Stack

One architecture upgrade, not four artificial versions. Foundation models, cross-attention, GNNs, multi-task learning, and causal representation learning ship together—because they’re all one transformer backbone with different heads. The linear core becomes a spatiotemporal demand transformer. Zero-shot forecasting for new tenants on day one.

  • Time-series foundation models: Fine-tune Chronos, Moirai, or TimesFM on weather-conditioned demand. Pre-trained on 27–400B time-points, these generalize zero-shot to new tenants without retraining.
  • Cross-attention weather×demand: One sequence (weather over time) attends to another (demand over time), learning nonlinear, lagged, multi-variable interactions that ridge regression represents as a single coefficient. This is what transformers were designed for.
  • Causal representation learning: Causal models generalize under distribution shift (Schölkopf et al., 2021)—exactly when weather shocks create novel demand patterns. The model discovers “temperature causes sunscreen demand” rather than merely correlating with it.
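The cross-attention idea in the second bullet can be shown at toy scale. In the sketch below, demand-side query vectors attend over weather-side key/value vectors with scaled dot-product attention. It omits the learned projection matrices, multiple heads, and positional encoding that a real transformer layer would have, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one sequence onto another.

    queries: demand-side vectors, one per demand time step
    keys/values: weather-side vectors, one pair per weather time step
    Returns (outputs, attention_weights).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[c] for w, v in zip(row, values))
                        for c in range(len(values[0]))])
    return outputs, weights
```

Each row of `attention_weights` shows how strongly one demand step draws on each weather step, so a lagged effect shows up as weight concentrated on earlier weather positions.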

Full v2.0 architecture specification →

Why ship all of this at once? Foundation models, cross-attention, multi-task learning, and GNNs are not separate research programs—they are components of one transformer architecture. Spreading them across 4 artificial versions wastes 3 years. Robyn and Meridian cannot absorb 400B time-points of cross-domain temporal knowledge. We can. The zero-shot capability eliminates cold-start entirely.

Addressable market: All commerce categories with measurable exogenous demand drivers. ~$200B in annual demand-driven resource allocation across U.S. retail and digital commerce.

TARGET: 2027
v3.0

WeatherVane 3.0 — Cross-Vertical Scale

The tech shipped in v2. Now apply it across verticals. Mixture of Experts routes heterogeneous tenants—retail, hospitality, energy, logistics, agriculture, insurance—to specialized sub-networks. Scaling laws predict how accuracy improves with network size.

  • Mixture of Experts routing: Sparse gating routes a sunscreen brand to the “hot climate outdoor” expert and a ski resort to “cold climate recreation”—no manual taxonomy. Expert specialization emerges from data.
  • Cross-vertical transfer: Weather–sunscreen in e-commerce transfers to weather–guest-arrivals in hospitality because both share the underlying outdoor-activity causal mechanism. Six verticals at launch: retail, hospitality, energy, logistics, insurance, agriculture.
  • Scaling laws for demand: More tenants and more history genuinely help—especially in the tails, where extreme weather carries the highest economic value.
  • Multi-signal expansion: Events, public health (CDC ILINet), economic indicators, cultural moments, supply chain data, and LLM-powered creative intelligence. Creative recommendations become weather-adaptive: not just “increase spend” but “switch to outdoor-lifestyle creative in markets above 85°F.” Each signal class gets its own identification approach.
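The sparse gating mentioned in the first bullet can be sketched as top-k routing: keep the k highest-scoring experts, renormalize their scores with a softmax, and mix only those experts' outputs. In a real Mixture of Experts layer the gate logits come from a learned router; here they are passed in directly, and all names are hypothetical.

```python
import math


def top_k_gate(gate_logits, k=2):
    """Return {expert_index: weight} for the k largest logits (weights sum to 1)."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in ranked)
    exps = {i: math.exp(gate_logits[i] - peak) for i in ranked}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, gate_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs; others are never called."""
    gates = top_k_gate(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())
```

Because unselected experts are skipped entirely, compute per tenant stays roughly constant as more experts are added.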

Full v3.0 specification →

Each new signal class requires its own identification approach. Weather has guaranteed exogeneity. Events are quasi-exogenous (scheduled independently of ad decisions). Economic indicators require careful lag structures to avoid simultaneity. Cultural moments are the hardest — the model must distinguish demand caused by a trend from demand correlated with a trend. The LWDM treats each signal class as a separate identification problem, validated independently before integration.

[Figure: Discovered interaction structure. A differentiable demand response Φ(signals, geo, t) over seven signal classes: Weather, Events, Cultural, Economic, Supply Chain, Health, Creative. Labeled interactions: Creative × Weather (3.2× lift when creative matches conditions); Events × Weather (compound demand multiplier +65%); Health → Economic (flu season depresses discretionary spend). Dashed curves: interaction effects discovered by the model, not pre-specified by humans.]

Addressable market: ~$1.2T in annual efficiency gains from improved demand intelligence across retail, energy, logistics, hospitality, agriculture, and insurance. Each new vertical reveals cross-industry causal links that compound network value.

At this scale the platform operates as economic infrastructure. Thousands of connected systems — Shopify stores, ad platforms, pricing engines, inventory managers, energy grids, logistics routers — feed data in and receive real-time demand intelligence out. The system doesn’t produce recommendations for humans to review. It acts: reallocating budgets, adjusting bids, repositioning inventory, updating prices, rerouting shipments — all within human-set constraints, all continuously, all informed by the full cross-industry demand graph.

The model at this point doesn’t just estimate known effects — it discovers causal structure humans never specified. A heatwave in Phoenix increases demand for sunscreen in Phoenix, but it also shifts discretionary spending in Tucson, reroutes cold-chain logistics from Dallas, and moves insurance pricing in Scottsdale. These cross-industry demand cascades are invisible to any single-vertical model. They emerge from the graph.

The product is not the model. It is the graph. Replicate the architecture and you still start with zero economic connections.
[Figure: Cross-economy shock propagation. Time horizons: Immediate (< 4 hours), Short-term (1–3 days), Medium-term (1–4 weeks).]

Shock: Heatwave, Southwest US, +8°F above normal, 7 days

  • E-Commerce: +40% sunscreen, +28% pool gear
  • Energy: +22% AC load, grid stress alert
  • Hospitality: +35% pool bookings, resort shift
  • Logistics: cold-chain alert, delivery delays
  • Insurance: claims model updating live
  • Agriculture: yield −12%, irrigation +30%

Cascade links: Travel demand → local retail; Energy surge → logistics delays; Crop stress → insurance claims.

One exogenous shock, one model: demand effects cascade across the entire economy.
NORTH STAR · Horizon: 2028+
v4.0

WeatherVane 4.0 — The Demand Signal Network

The network effects play. Federated learning enables thousands of tenants to collectively train a model better than any could build alone, without sharing raw data. The result is a Hayekian information aggregator—a computational system that solves the knowledge coordination problem Hayek (1945) described, algorithmically.

  • Federated learning (McMahan et al., 2017): Each tenant trains locally on private data, sharing only gradient updates. Secure aggregation ensures no tenant can reconstruct another’s sales. Differential privacy (ε-budgeted noise) provides contractual mathematical guarantees. Federated partial pooling: centralized learning benefits with local privacy guarantees.
  • Data network effects: At 200 tenants × 100 markets × 365 days, the network generates 7.3M natural experiments per year. Each tenant adds a different category in a different geography, creating combinatorially richer training data. More tenants → better model → higher retention → more tenants. At scale, one network with N tenants produces strictly better forecasts than two networks with N/2 each (subadditive cost).
  • Autonomous execution: Real-time bidding integration, autonomous budget allocation, inventory pre-positioning, dynamic pricing—all weather-conditioned, all operating within merchant-set constraints. The product is no longer a dashboard. It is the demand intelligence layer that automated systems across the economy connect to.
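A single federated round in the spirit of McMahan et al. (2017) might look like the sketch below: each tenant's update is norm-clipped, the clipped updates are averaged, and optional Gaussian noise is added to the average. Secure aggregation is omitted, and the noise scale shown is not calibrated to any formal ε budget; the function names and parameters are assumptions for illustration.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [u * scale for u in update]


def federated_round(global_weights, tenant_updates, max_norm=1.0,
                    noise_multiplier=0.0, rng=None):
    """Apply the average of clipped tenant updates to the global weights.

    Raw tenant data never appears here: only per-tenant update vectors do.
    """
    rng = rng or random.Random(0)
    n = len(tenant_updates)
    clipped = [clip_update(u, max_norm) for u in tenant_updates]
    new_weights = []
    for j, weight in enumerate(global_weights):
        average = sum(u[j] for u in clipped) / n
        noise = 0.0
        if noise_multiplier > 0:
            # Noise on the average; the scale is illustrative, not a DP calibration.
            noise = rng.gauss(0.0, noise_multiplier * max_norm / n)
        new_weights.append(weight + average + noise)
    return new_weights
```

Clipping bounds how much any one tenant can move the shared model, which is also what makes a noise scale meaningful in differentially private variants.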

Hayek’s knowledge problem, solved computationally. Just as prices aggregate dispersed information across millions of actors (Hayek, 1945), the Demand Signal Network aggregates dispersed weather–demand knowledge into a unified representation. No single tenant sees the full cascade: beverage demand in Austin leading outdoor gear demand in Denver by 48 hours. The network sees all of it.

Addressable market: $1.5T+ in annual efficiency gains. A competitor entering at this stage must simultaneously acquire thousands of tenants to match the learning signal. The moat is not the software. The moat is the network itself—accumulated learning that cannot be bought, cannot be replicated quickly, and grows more valuable with every new participant.

DEMAND SIGNAL NETWORK: AT-SCALE VISION

14,200+ connected (target) · 95%+ accuracy (projected) · 48K actions/day (target)

  • Government & Public: emergency mgmt, agriculture, cities, transit
  • Global Enterprise: Fortune 500, multinationals
  • SMB & Local: Shopify, restaurants, regional
  • Financial Markets: commodity, insurance, hedge
  • Infrastructure: energy grids, water, transit
  • Agriculture & Food: farms, distributors, grocery
  • Health & Pharma: hospitals, pharma, wellness

Every node feeds the economic graph. Every node benefits.

AUTONOMOUS ACTIONS: LAST 60 SECONDS

  • 14:23:04 · County EMA: pre-staged cooling centers across metro (heat mortality model triggered, 48h advance positioning)
  • 14:23:05 · Grid operator: shifted 340MW load forecast for region (demand cascade: retail cooling + residential AC + EV charging)
  • 14:23:07 · National retailer: rerouted 12 cold-chain shipments (perishable demand surge 3d out, inventory pre-positioning)
  • 14:23:08 · SMB network: 1,400 stores updated pricing & creative (SW region, auto-optimized within merchant-set constraints)
  • 14:23:09 · Commodity desk: adjusted corn futures position −0.4σ (yield revision −12% TX/OK, irrigation demand +30%)
  • 14:23:10 · Metro transit: added 40 bus routes for cooling access (demand model: ridership to cooling centers +280% in 48h)
  • 14:23:11 · Discovery: concert tickets → ER visits → pharma demand (three-hop causal chain discovered, lag +18h, r=0.79)

See the Product Evolve

Each version builds on the last. The interface grows as the model deepens.

[Product mockup: app.weathervane.ai/v2.0. Simulated data for illustration.]

Summit Craft Co. · 14 markets · Multi-Signal Intelligence
Austin, TX · Weather + Events + Calendar · 7-day outlook · Live

  • Mon: 98° / 74°
  • Tue: 101° / 76°
  • Wed: 95° / 72°
  • Thu: 99° / 75°
  • Fri: 103° / 78°
  • Sat: 104° / 79°
  • Sun: 88° / 68°

Events: SXSW Festival, Austin Marathon, Craft Beer Week
Daily demand multipliers (Mon–Sun, weather effect + event effect): 1.6x, 1.8x, 1.3x, 1.5x, 1.6x, 1.9x, 1.1x
Saturday: 98°F + Marathon + Craft Beer Week = 1.95x demand

v2.0 · Deep Learning Stack · Spatiotemporal transformer, zero-shot forecasting
[Product mockup: app.weathervane.ai/v3.0. Simulated data for illustration.]

Multi-Category · 34 markets · Category Portfolio (Market · Category · Action · Signal · Conf.)

  • Southwest US · Sun Portfolio · Rebalance: sunscreen→hats · Saturation detected · 89%
  • Pacific NW · Cozy Bundle · Cross-sell: blankets + cocoa · Cold snap + weekend · 87%

v3.0 · Cross-Vertical Scale · Government, logistics, agriculture, insurance
app.weathervane.ai/v4.0
v4.0 · Demand Signal Network · Cross-economy shock propagation, autonomous actions
PRICING
  • Outcome-based SaaS: You pay a percentage of the incremental revenue WeatherVane generates, measured via geo-holdout tests that you can independently verify—randomly selected markets receive no recommendations, giving you a clean control group in your own analytics.
  • Your data, your verification: We know the concern: a vendor measuring its own impact has a conflict of interest. That’s why holdout markets are randomly assigned, results are computed from your own revenue data, and you can run the same lift calculation independently. We also offer a flat monthly SaaS fee for teams that prefer predictable costs over outcome-based pricing.
  • Target customer: DTC and mid-market ecommerce brands spending $50K–$5M/month on digital ads.
  • Start with weather, expand from there: Begin with weather-driven budget reallocation, then expand to full demand intelligence as you see results.
  • Pure software: No human-in-the-loop per recommendation. The model runs autonomously with Plan & Proof oversight.
  • Incentive alignment: You only pay on proven incremental revenue. Past performance is not indicative of future results.
DESIGN PARTNER PROGRAM

WeatherVane is pre-launch. We do not yet have production performance data or customer testimonials to share. Instead, we are building with a small cohort of design partners who help shape the product in exchange for early access and preferential pricing.

  • What you get: Free weather-demand analysis for your top category, 90-day shadow mode with no commitment, and direct access to the founding team.
  • What we get: Real data to validate our models, product feedback from practitioners, and (with your permission) anonymized case study metrics we can share publicly.
  • Cohort size: 5–10 brands in weather-sensitive categories (outdoor, beverage, apparel, beauty, home comfort). Apply now for Q2 2026 onboarding.
  • Transparency commitment: We will publish anonymized validation results—wins and losses—as design partners complete their initial 90-day evaluation period.
Private Beta — Limited Spots

Weather is changing demand right now.

Our design partners are already seeing the signal. Join the private beta to connect your ad accounts and get a live weather-demand forecast within 48 hours.

No commitment required. We’ll set up a 30-minute call to understand your use case, then deliver a free weather-demand analysis for your top category.

hello@thagorus.com · Private beta · San Francisco

References & Citations (75+ citations)

The LWDM draws on a deep body of work across causal inference, control theory, information theory, machine learning, complexity science, marketing science, and AI safety. Key references are organized by domain.

Causal Inference & Econometrics

  • Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies. Journal of the American Statistical Association, 105(490), 493–505.
  • Angrist, J. D. & Krueger, A. B. (2001). Instrumental variables and the search for identification. Journal of Economic Perspectives, 15(4), 69–85.
  • Athey, S. & Imbens, G. W. (2017). The state of applied econometrics: causality and policy evaluation. Journal of Economic Perspectives, 31(2), 3–32.
  • Athey, S. & Imbens, G. W. (2022). Design-based analysis in difference-in-differences settings with staggered adoption. Journal of Econometrics, 226(1), 62–79.
  • Imbens, G. W. & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
  • Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). Springer. (Design of observational studies; sensitivity analysis.)
  • Chernozhukov, V. et al. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. (DML; orthogonal scores for causal estimation with high-dimensional nuisance parameters.)
  • Dell, M., Jones, B. F., & Olken, B. A. (2014). What do we learn from the weather? The new climate–economy literature. Journal of Economic Literature, 52(3), 740–798. (Canonical survey of weather as instrument for causal identification in economics.)
  • Wager, S. & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242. (Causal forests for heterogeneous treatment effect estimation.)
  • Hartford, J., Lewis, G., Leyton-Brown, K., & Taddy, M. (2017). Deep IV: a flexible approach for counterfactual prediction. ICML 2017, PMLR 70, 1414–1423. (Two-stage neural network IV estimation; captures nonlinear instrument–treatment relationships.)
  • Callaway, B. & Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230. (Honest 2×2 DiD comparisons avoiding negative weighting in staggered adoption.)
  • Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254–277. (Decomposition of TWFE into weighted 2×2 estimators; diagnostic for staggered-treatment bias.)
  • Wright, P. G. (1928). The Tariff on Animal and Vegetable Oils. Macmillan. (Appendix B: first instrumental variables estimation in econometric history, using weather to identify demand elasticities.)
  • Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly exogenous. Review of Economics and Statistics, 94(1), 260–272. (Framework for bounded inference when the exclusion restriction is only approximately satisfied.)

Cybernetics, Control Theory & Information Theory

  • Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall. (The law of requisite variety.)
  • Ashby, W. R. (1960). Design for a Brain (2nd ed.). Chapman & Hall. (Adaptive control, ultrastability.)
  • Conant, R. C. & Ashby, W. R. (1970). Every good regulator of a system must be a model of that system. International Journal of Systems Science, 1(2), 89–97.
  • Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.
  • Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
  • Camacho, E. F. & Bordons, C. (2007). Model Predictive Control (2nd ed.). Springer. (MPC theory and applications.)
  • Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. (Active inference; organisms as entropy-minimizing controllers.)
  • Touchette, H. & Lloyd, S. (2000). Information-theoretic limits of control. Physical Review Letters, 84(6), 1156–1159. (Theorem 1: ΔS ≤ I(M; S); each bit of mutual information reduces entropy by at most one bit.)
  • Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method. arXiv:physics/0004057. (Optimal lossy compression preserves decision-relevant information; the theoretical foundation for why rules compress for legibility while models compress for decision quality.)
  • Beer, S. (1972). Brain of the Firm. Allen Lane / Penguin Press. (Viable System Model; recursive control architecture for organizations.)
  • Boisot, M. & McKelvey, B. (2011). Complexity and organization–environment relations: revisiting Ashby’s law of requisite variety. In The SAGE Handbook of Complexity and Management, 279–298. (Requisite variety is necessary but not sufficient; the controller must also match the complexity structure of the system.)
  • von Foerster, H. (1981). Observing Systems. Intersystems Publications. (Second-order cybernetics; the observer is part of the system being observed.)
  • Powers, W. T. (1973). Behavior: The Control of Perception. Aldine. (Perceptual control theory; organisms control perception, not output.)

Complexity Science & Nonlinear Dynamics

  • Anderson, P. W. (1972). More is different. Science, 177(4047), 393–396. (Emergence and symmetry breaking.)
  • Bak, P. (1996). How Nature Works: The Science of Self-Organized Criticality. Springer. (Phase transitions, power laws.)
  • Goodhart, C. A. E. (1984). Problems of monetary management: the UK experience. Monetary Theory and Practice, 91–121.
  • Holland, J. H. (2014). Complexity: A Very Short Introduction. Oxford University Press.
  • Kauffman, S. A. (1993). The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press.
  • Mandelbrot, B. B. (1963). The variation of certain speculative prices. Journal of Business, 36(4), 394–419. (Fat tails.)
  • Peters, O. (2019). The ergodicity problem in economics. Nature Physics, 15, 1216–1221.
  • Scheffer, M. et al. (2009). Early-warning signals for critical transitions. Nature, 461, 53–59. (Critical slowing down.)
  • Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. Random House. (Fat tails, extremes.)

Machine Learning & Deep Learning

  • Gama, J. et al. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 1–37.
  • Kipf, T. N. & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. ICLR 2017.
  • Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. NeurIPS 2017.
  • Lu, J. et al. (2018). Learning under concept drift: a review. IEEE Transactions on Knowledge and Data Engineering, 31(12), 2346–2363.
  • Vaswani, A. et al. (2017). Attention is all you need. NeurIPS 2017.
  • Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic Learning in a Random World. Springer. (Conformal prediction.)
  • Kaplan, J. et al. (2020). Scaling laws for neural language models. arXiv:2001.08361. (Power-law relationship between loss and model size, dataset size, and compute.)
  • Hoffmann, J. et al. (2022). Training compute-optimal large language models. NeurIPS 2022. (Chinchilla scaling laws; model and data should scale equally for compute-optimal training.)
  • Schölkopf, B. et al. (2021). Toward causal representation learning. Proceedings of the IEEE, 109(5), 612–634. (Models that learn causal structure generalize under distribution shift.)
  • Gregory, R. W. et al. (2021). The role of artificial intelligence and data network effects for creating user value. Academy of Management Review, 46(3), 534–551. (Framework for data network effects in ML platforms.)

Marketing Science & Media Mix Modeling

  • Gelman, A. & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
  • Jin, Y. et al. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Technical Report.
  • Sun, Y., Wang, Y., Jin, Y., Chan, D., & Koehler, J. (2017). Geo-level Bayesian hierarchical media mix modeling. Google Technical Report. (Partial pooling across geographic regions for sparse-data markets.)
  • Wang, Y., Jin, Y., Sun, Y., Chan, D., & Koehler, J. (2017). A hierarchical Bayesian approach to improve media mix models. Google Technical Report. (Cross-brand partial pooling; direct ancestor of cross-tenant hierarchical estimation.)
  • Runge, J., Skokan, A., Zhou, Y., & Pauwels, K. (2024). Robyn: continuous and semi-automated marketing mix modeling. arXiv:2407.06182. (Meta’s open-source MMM; ridge regression + Nevergrad optimization.)
  • Rossi, P. E., Allenby, G. M., & McCulloch, R. (2005). Bayesian Statistics and Marketing. Wiley. (Canonical reference for hierarchical Bayesian methods in marketing science.)
  • Varian, H. R. (2014). Big data: new tricks for econometrics. Journal of Economic Perspectives, 28(2), 3–28.
  • Weatherford, L. R. & Bodily, S. E. (1992). A taxonomy and research overview of perishable-asset revenue management. Operations Research, 40(5), 831–844.
  • Busse, M. R., Pope, D. G., Pope, J. C., & Silva-Risso, J. (2015). The psychological effect of weather on car purchases. Quarterly Journal of Economics, 130(1), 371–414. (Weather-induced salience causally shifts 40M+ vehicle transactions.)
  • Shapiro, B. T., Hitsch, G. J., & Tuchman, A. E. (2021). TV advertising effectiveness and profitability: generalizable results from 288 brands. Econometrica, 89(4), 1855–1879. (Causal advertising measurement at scale; median ad elasticity ~0.014.)
  • Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics, 9(1), 247–274. (CausalImpact; Bayesian counterfactual estimation.)
  • Gordon, B. R., Zettelmeyer, F., Bhargava, N., & Chapsky, D. (2019). A comparison of approaches to advertising measurement: evidence from big field experiments at Facebook. Marketing Science, 38(2), 193–225. (Observational methods vs. RCTs for ad measurement.)
  • Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: a large-scale field experiment. Econometrica, 83(1), 155–174. (eBay experiment: brand search ads had near-zero causal effect on sales.)
  • Lewis, R. A. & Rao, J. M. (2015). The unfavorable economics of measuring the returns to advertising. Quarterly Journal of Economics, 130(4), 1941–1973. (Even massive RCTs produce median ROI confidence intervals over 100 percentage points wide.)

AI Weather Forecasting

  • Kochkov, D. et al. (2024). Neural general circulation models for weather and climate. Nature, 632, 1060–1066. (NeuralGCM; hybrid ML + physics model beats ECMWF ENS 95% of the time on 2–15 day forecasts.)
  • Lam, R. et al. (2023). Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416–1421. (GraphCast; outperforms ECMWF HRES on 89.3% of 2,760 targets; 10-day forecast in under 1 minute.)
  • Lang, S. et al. (2024). AIFS — ECMWF’s data-driven forecasting system. arXiv:2406.01465.
  • Pathak, J. et al. (2022). FourCastNet: a global data-driven high-resolution weather forecasting model. arXiv:2202.11214.
  • Bi, K. et al. (2023). Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970), 533–538. (Pangu-Weather; first AI model to outperform ECMWF HRES.)
  • Price, I. et al. (2024). Probabilistic weather forecasting with machine learning. Nature, 637(8044), 84–90. (GenCast; diffusion-based ensemble forecasting; outperforms ENS on 97.2% of targets, 99.8% beyond 36h lead times.)
  • Bodnar, C. et al. (2025). Aurora: a foundation model for the Earth system. Nature, 641, 1180–1187. (Pretrained on 1M+ hours of diverse Earth system data; fine-tunes to new tasks in minutes.)
  • Chen, L. et al. (2024). FuXi-S2S: a machine learning model that outperforms conventional global subseasonal forecast models. Nature Communications, 15, 6296. (Sub-seasonal to seasonal AI forecasting.)
  • Vaughan, A. et al. (2025). Aardvark Weather: end-to-end data-driven weather prediction. Nature, 641, 1172–1179. (Station observations directly to local forecasts, bypassing gridded analysis.)
  • Nguyen, T. et al. (2024). Scaling transformer neural networks for skillful and reliable medium-range weather forecasting. NeurIPS 2024. (Stormer; demonstrates favorable scaling laws for weather AI—accuracy improves predictably with model size and training data.)

AI Safety & Responsible Deployment

  • Amodei, D. et al. (2016). Concrete problems in AI safety. arXiv:1606.06565.
  • Dwork, C. & Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–407.
  • McMahan, H. B. et al. (2017). Communication-efficient learning of deep networks from decentralized data. AISTATS 2017. (Federated learning.)
  • National Institute of Standards and Technology (2023). AI Risk Management Framework (NIST AI 100-1).

Statistics & Bayesian Methods

  • Efron, B. & Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association, 70(350), 311–319. (Empirical Bayes.)
  • James, W. & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium, 1, 361–379. (The Stein paradox; foundation for shrinkage estimation.)
  • Morris, C. N. (1983). Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association, 78(381), 47–55.
  • Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465–480. (Heavy-tailed prior for sparse estimation; aggressively shrinks noise while preserving true signals.)
  • Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2023). Conformal prediction beyond exchangeability. Annals of Statistics, 51(2), 816–845. (Extends conformal prediction to non-exchangeable settings including time series and distribution shift.)

Demand Forecasting & Time Series

  • Lim, B., Arík, S. Ö., Loeff, N., & Pfister, T. (2021). Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4), 1748–1764. (Attention-based architecture with variable selection and interpretable temporal patterns.)
  • Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181–1191. (Autoregressive RNN for probabilistic demand forecasting.)
  • Ansari, A. F. et al. (2024). Chronos: learning the language of time series. Transactions on Machine Learning Research. (Tokenizes real-valued time series into a fixed vocabulary; T5-based models from 20M to 710M params; zero-shot performance rivals full-shot models on 42 benchmark datasets.)
  • Woo, G. et al. (2024). Unified training of universal time series forecasting transformers. ICML 2024. (Moirai; any-variate, any-frequency foundation model trained on 27B observations across nine domains; competitive zero-shot vs. full-shot models.)
  • Das, A. et al. (2024). A decoder-only foundation model for time-series forecasting. ICML 2024. (TimesFM; pre-trained on 100B+ real-world time-points; strong zero-shot performance across domains and granularities.)
  • Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2023). A time series is worth 64 words: long-term forecasting with transformers. ICLR 2023. (PatchTST; channel-independent patch design achieves 21% MSE reduction over prior transformer methods.)
  • Liu, Y. et al. (2024). iTransformer: inverted transformers are effective for time series forecasting. ICLR 2024 (Spotlight). (Treats each variate as a token; state-of-the-art on high-dimensional multivariate forecasting.)
  • Fildes, R., Ma, S., & Kolassa, S. (2022). Retail forecasting: research and practice. International Journal of Forecasting, 38(4), 1283–1318. (Comprehensive survey identifying weather as an under-exploited signal in retail demand forecasting.)
  • Gibbs, I. & Candès, E. (2021). Adaptive conformal inference under distribution shift. NeurIPS 2021. (Distribution-free prediction intervals that adapt to non-stationarity.)

Weather–Demand Economics

  • Roth Tran, B. (2023). Sellin’ in the rain: weather, climate, and retail sales. Management Science, 69(12), 7423–7447. (ML weather index on daily store-level retail; 2–5% of sales variance explained by weather after controls.)
  • Mellon, J. (2025). Rain, rain, go away: 194 potential exclusion-restriction violations for weather instruments. American Journal of Political Science, 69, 881–898. (Systematic audit of weather-instrument validity; motivates careful specification in the LWDM’s DAG.)

Questions, access requests, or partnership inquiries: hello@thagorus.com