The Diffusion Dashboard: measuring when installation becomes deployment - A White Paper Preview
Purpose
This section sets out a measurement and modelling framework that tracks how artificial intelligence moves from infrastructure build-out to enterprise value and economy-wide outcomes. The aim is decision-grade monitoring rather than narrative. Indicators are defined precisely, linked to public data where possible, and embedded in a model that can separate transient noise from structural change.
Architecture of the dashboard
We organise evidence in four layers that map the path from capability to value. Each layer is observable and updated on a regular cadence.
A. Infrastructure intensity
What to track: quarterly capital expenditure on data centres and AI servers; shipments and installed base of accelerators; usable compute capacity; data-centre megawatts under construction and energised; queue times for power connections; effective inference cost per million tokens or per task; latency distributions.
Measurement notes: use firm filings and audited disclosures for capex; vendor shipment data for accelerators; utility and permitting records for energy availability; standardised cost templates for inference.
B. Tooling and developer usage
What to track: active developer seats for coding assistants; acceptance rates of suggested code; API token consumption and unique application IDs; fine-tune and retrieval volumes; automated test coverage for AI-generated changes.
Measurement notes: normalise seats by full-time equivalent developers; quality-adjust acceptance rates; require providers to disclose stable definitions of “active”.
C. Enterprise productionisation
What to track: share of production workflows that use AI and have approved formal model-risk governance; proportion of models with external or independent audits; time from pilot to production for top use cases; incremental ARPU or margin uplift attributable to AI features; share of support tickets resolved by AI that pass human QA.
Measurement notes: count workflows, not proofs of concept; attribute value to AI only where a counterfactual is documented; all metrics must carry a versioned evaluation protocol.
D. Outcomes
What to track: sector-level labour productivity and total factor productivity with exposure-based splits; quality metrics in customer-facing tasks (error rates, service-level adherence); measured risk reductions in high-stakes processes (false positives, capital charges).
Measurement notes: rebase sector indexes to a common base year for comparability; use exposure scores and employment weights to form high- versus low-exposure cohorts.
The modelling spine
The dashboard is not only a set of charts. It is anchored by a transparent model that links layers A through D and produces interpretable parameters and confidence intervals.
Stacked diffusion curves
For each sector i and layer L ∈ {A, B, C, D}, estimate logistic curves with sector-level random effects:
x_{iL}(t)=\frac{K_{iL}}{1+\exp\!\big[-r_{iL}\,(t-\tau_{iL})\big]}+\varepsilon_{iL}(t)
where K is the saturation level, r the slope, and τ the midpoint. Priors pool information across sectors while allowing heterogeneity.
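As an illustration, the curve for a single sector can be fitted with ordinary nonlinear least squares before moving to the pooled Bayesian estimator set out in the methods appendix. The sketch below assumes Python with NumPy and SciPy and uses synthetic data; all variable names are illustrative.

```python
# Illustrative single-sector logistic fit; the pooled hierarchical version is
# specified in the methods appendix. Data here are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, tau):
    """Logistic diffusion curve: K / (1 + exp(-r * (t - tau)))."""
    return K / (1.0 + np.exp(-r * (t - tau)))

t = np.arange(2020, 2031, 0.25)  # quarterly time index
x = logistic(t, K=0.6, r=1.2, tau=2026) + np.random.default_rng(0).normal(0, 0.02, t.size)

params, cov = curve_fit(logistic, t, x, p0=[0.5, 1.0, 2026])
K_hat, r_hat, tau_hat = params   # saturation, slope, midpoint
```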
Layer linkages and lags
Impose distributed-lag relations so that downstream layers respond to upstream layers with interpretable delays:
B_{it}=\alpha_{B}+\sum_{\ell=0}^{L_{B}}\beta_{B\ell}\,A_{i,t-\ell}+u_{Bit},
C_{it}=\alpha_{C}+\sum_{\ell=0}^{L_{C}}\beta_{C\ell}\,B_{i,t-\ell}+u_{Cit},
D_{it}=\alpha_{D}+\sum_{\ell=0}^{L_{D}}\beta_{D\ell}\,C_{i,t-\ell}+\gamma^{\top}Z_{it}+u_{Dit}.
Controls Z_{it} include energy prices, regulatory milestones, and macro shocks. Identification relies on variation across sectors and time, with sector and year fixed effects and clustered standard errors.
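A minimal sketch of the outcome-layer equation is shown below, assuming Python with pandas and statsmodels and a long panel with hypothetical columns sector, year, D, C, and Z; it constructs the distributed lags of C, adds sector and year fixed effects, and clusters standard errors by sector.

```python
# Sketch of the layer-D equation: distributed lags of C, sector and year fixed
# effects, sector-clustered standard errors. Column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

def estimate_d_equation(panel: pd.DataFrame, n_lags: int = 2):
    """panel: long format with columns sector, year, D, C, Z (controls)."""
    df = panel.sort_values(["sector", "year"]).copy()
    lag_terms = []
    for lag in range(n_lags + 1):
        col = f"C_lag{lag}"
        df[col] = df.groupby("sector")["C"].shift(lag)   # within-sector lags
        lag_terms.append(col)
    df = df.dropna(subset=lag_terms + ["D", "Z"])
    formula = "D ~ " + " + ".join(lag_terms) + " + Z + C(sector) + C(year)"
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["sector"]}
    )
```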
Event studies and thresholds
Augment with an exposure-by-year event study to test pre-trends. Define explicit turning points. Examples: a sector enters deployment when at least 30 percent of its material workflows are in production with signed-off governance and the measured EBIT contribution from AI exceeds 5 percent; the economy shows an AI-related lift when the productivity gap between high- and low-exposure sectors widens by at least 1.5 index points for two consecutive years with statistical significance.
Data pipeline and governance
Sources
Firm filings and earnings materials for capex and guidance; vendor shipment data for accelerators; utility and permitting records for data-centre power; API platforms and developer tool telemetry for usage; enterprise system logs for workflow counts and guardrail coverage; official statistics for sector productivity; published exposure taxonomies mapped to industries with employment weights.
Harmonisation and reproducibility
All series are versioned with code to replicate transformations. Units, denominators, and base years are documented. Definitions of “active user”, “workflow in production”, and “audit” are fixed and referenced in each release. Nothing is estimated from proprietary black-box endpoints without a parallel open measurement.
Quality control
Apply cross-checks across providers; reconcile capex guidance with realised spend; test sensitivity to alternative exposure taxonomies; publish uncertainty intervals for all model outputs. Independent re-estimation is supported through a public repository.
Scenarios to 2030
Base
Inference costs fall by a factor of three by 2027, evaluation and guardrail stacks standardise gradually, procurement standards diffuse, and productionised use cases reach 15 to 25 percent of eligible workflows in leading sectors. Productivity effects appear with a two to three year lag.
Fast
Costs fall by an order of magnitude by 2026, regulators accept a common audit stack, and model-assisted process redesign accelerates. Productionised use reaches 40 percent in leading sectors and measurable productivity differentials widen. EBIT contributions above 5 percent become common in top quartile firms.
Stall
Energy and grid constraints bind; legal liability and data governance remain unsettled; inference costs decline slowly. Pilots proliferate but do not scale. Productivity differentials remain indistinguishable from zero outside narrow pockets.
Each scenario is parameterised by the estimated diffusion slopes and lags, then stress-tested with sensitivity to energy prices, capital availability, and regulatory shock variables.
Reporting cadence and deliverables
Quarterly dashboard with the latest indicators for A through D, sector breakdowns, and an updated nowcast of diffusion parameters and lags.
Semi-annual methods note that documents any changes in definitions, data sources, or model specification, with ablation tests.
Annual review that reconciles forecasts with realised outcomes, reports on structural breaks, and revises the scenario set.
How to interpret movements
Rapid growth in layer A without commensurate movement in layers B and C indicates installation without deployment.
Rising acceptance rates and stable quality-adjusted usage in layer B combined with growing governance coverage and measured EBIT contributions in layer C indicate that firms are crossing the internal adoption threshold.
Persistent and statistically significant widening in exposure-sorted productivity in layer D is required before macro claims are justified.
Policy translation
The white paper preview is designed to inform policy as well as management. If layer A is constrained by energy, the response is transparent planning for compute and power, including accelerated connections and disclosure of energy and water use. If layer C is the bottleneck, regulators should focus on auditability, incident reporting, and liability allocation that lowers the fixed costs of deploying models safely. If layer D is flat despite movement in C, labour market policy should prioritise training and task redesign within exposed occupations, not generic reskilling.
Summary
The Diffusion Dashboard makes adoption measurable. It defines a coherent set of indicators, links them with an explicit econometric model, and sets decision-relevant thresholds. It provides a disciplined way to say when AI is moving from installation to deployment, where the bottlenecks sit, and what levers policy and management should pull.
-
Indicator dictionary for the Diffusion Dashboard. Each indicator includes a code, definition and unit, preferred sources, refresh cadence, coverage, and notes on construction and caveats. Use these codes in figures, methods, and change logs.
Layer A. Infrastructure intensity
A1. Hyperscaler AI capex (US$ bn, T4Q)
Definition: Reported capital expenditure on data centres, servers and networking attributable to AI; trailing four-quarter sum.
Sources: Company filings and earnings materials; segment or management attribution where available.
Refresh: Quarterly.
Coverage: Global hyperscalers; optional regional split where disclosed.
Notes: Document inclusion of capital leases and reclassifications; maintain a reconciliation table from reported to AI-attributed capex.
A2. Accelerator shipments (units, quarterly)
Definition: Shipments of AI accelerators by vendor and model family.
Sources: Vendor disclosures; industry shipment trackers.
Refresh: Quarterly.
Coverage: Global; optional cloud versus enterprise split.
Notes: Align model naming across vendors; retain a “method flag” when estimates are used.
A3. Installed accelerator compute (PFLOPS-equivalent)
Definition: Cumulative compute from deployed accelerators = Σ(shipments × per-unit peak FLOPS × utilisation factor).
Sources: A2 plus vendor specs; utilisation from provider disclosures or conservative assumptions.
Refresh: Quarterly.
Coverage: Global and by region where possible.
Notes: Publish the utilisation factor and sensitivity.
A4. Data-centre capacity under construction and energised (MW)
Definition: Megawatts under construction and connected for AI-suitable halls.
Sources: Developer reports; utility interconnection queues; planning databases.
Refresh: Quarterly.
Coverage: Region and metro.
Notes: Track cancellations and slippage; harmonise nameplate versus deliverable power.
A5. Grid connection lead time (months)
Definition: Median time from application to energisation for data-centre loads above a disclosed threshold.
Sources: Utility queues; regulator releases.
Refresh: Semi-annual.
Coverage: Region.
Notes: Report thresholds and any queue reforms affecting interpretation.
A6. Inference cost per task (US$)
Definition: All-in variable cost to run a specified task template at target latency and quality.
Sources: Provider price cards; internal cost templates; measured latency.
Refresh: Quarterly.
Coverage: Model family and task archetype.
Notes: Fix prompts, context length, and guardrails; publish the task card and hardware profile.
A7. Latency distribution (p50, p95, ms)
Definition: End-to-end time for the task template in A6.
Sources: Synthetic monitors; provider telemetry where available.
Refresh: Monthly.
Coverage: Model family and region.
Notes: Exclude first-token warm-up effects unless declared.
Layer B. Tooling and developer usage
B1. Active coding-assistant seats (FTE-normalised)
Definition: Monthly active users of coding assistants divided by software FTEs.
Sources: Vendor admin telemetry; HRIS headcount.
Refresh: Monthly.
Coverage: Firm and sector panels.
Notes: Define “active” consistently; remove trial spikes.
B2. Acceptance rate of AI suggestions (%)
Definition: Accepted tokens or lines over suggested tokens or lines.
Sources: IDE telemetry.
Refresh: Monthly.
Coverage: Language and repo.
Notes: Quality-adjust using unit and integration tests.
B3. API token consumption (tokens, T30D)
Definition: Total tokens billed over 30 days by application ID.
Sources: Provider dashboards; billing exports.
Refresh: Monthly.
Coverage: App and business unit.
Notes: Track context length and cache hit rates.
B4. Unique production applications (count)
Definition: Distinct application IDs serving external or internal users with SLOs.
Sources: API gateway; service registry.
Refresh: Monthly.
Coverage: Firm; sector aggregates.
Notes: Exclude sandboxes and experiments.
B5. Fine-tune and RAG operations (count/QPS)
Definition: Completed fine-tunes; retrieval queries per second and hit rates.
Sources: MLOps platform; vector index telemetry.
Refresh: Monthly.
Coverage: App and model.
Notes: Log index freshness and guardrail triggers.
B6. Eval pass rate on curated suite (%)
Definition: Share of test cases meeting acceptance thresholds on a fixed evaluation battery.
Sources: Internal eval harness.
Refresh: Monthly.
Coverage: Use case and version.
Notes: Version every eval and data freeze; publish leakage checks.
Layer C. Enterprise productionisation
C1. Workflows in production with approved MRG (%)
Definition: Share of material workflows in production that use AI and have model-risk governance sign-off.
Sources: Risk registry; change-management system.
Refresh: Quarterly.
Coverage: Firm; sector aggregates.
Notes: Define “material workflow” and “production” ex-ante.
C2. Independently audited models (%)
Definition: Share of production models with external or independent audit in the past 12 months.
Sources: Audit logs; vendor attestations.
Refresh: Quarterly.
Coverage: Firm and sector.
Notes: Record audit scope and standards.
C3. Pilot-to-production lead time (days, median)
Definition: Calendar days from pilot approval to production.
Sources: PMO and release systems.
Refresh: Quarterly.
Coverage: Use case type.
Notes: Winsorise extreme values; report IQR.
C4. AI-attributable EBIT share (%)
Definition: Incremental EBIT attributable to AI in a business unit over EBIT, with documented counterfactual.
Sources: Finance; A/B or stepped-wedge rollouts.
Refresh: Semi-annual.
Coverage: Business unit.
Notes: Publish attribution method and holdout design.
C5. AI feature ARPU uplift (US$ per user)
Definition: Incremental revenue per user linked to AI features versus control.
Sources: Product analytics; billing.
Refresh: Quarterly.
Coverage: Product line.
Notes: Separate price from mix effects.
C6. Automated resolution with QA pass (%)
Definition: Share of support or back-office tickets resolved by AI that pass human QA.
Sources: Ticketing system; QA samples.
Refresh: Monthly.
Coverage: Function and severity.
Notes: Track re-open rates and escalations.
C7. Safety and compliance incidents (per 1k requests)
Definition: Confirmed incidents linked to AI components.
Sources: Incident response; compliance logs.
Refresh: Monthly.
Coverage: Firm; sector aggregates.
Notes: Severity tiers and root-cause tags required.
Layer D. Outcomes
D1. Labour-productivity gap (index points)
Definition: Difference between high- and low-exposure sector indexes, rebased to a common year.
Sources: Official statistics; exposure taxonomy with employment weights.
Refresh: Annual or quarterly where available.
Coverage: Country and sector.
Notes: Report exposure metric and weights; include CIs.
D2. Total factor productivity differential (index points)
Definition: As D1 for TFP.
Sources: Official productivity datasets.
Refresh: Annual.
Coverage: Sector.
Notes: Document capital and labour quality adjustments.
D3. Wage dispersion by exposure (pp)
Definition: Difference in wage growth between exposure deciles.
Sources: Labour microdata; exposure mapping to occupations.
Refresh: Annual.
Coverage: Country; sector optional.
Notes: Control for composition effects.
D4. Occupational churn in high-exposure roles (%)
Definition: Exits plus entries as a share of employment in exposure-intense occupations.
Sources: Labour force surveys; admin data.
Refresh: Annual.
Coverage: Country.
Notes: Adjust for reclassification breaks.
D5. Quality and safety metrics in AI-supported services
Definition: Error rates, SLA adherence, adverse events where AI is embedded.
Sources: Sector regulators; firm QA.
Refresh: Quarterly.
Coverage: Sector-specific.
Notes: Publish metric definitions per sector.
Versioning and change control
Assign stable codes (A1…D5).
Maintain a data dictionary file with variable name, unit, source URL, transformation script, and first date loaded.
Every release carries a version tag (e.g., DD-2025-Q3) and a short change log noting any definition or source change.
-
This section sets precise construction rules for every indicator in the Diffusion Dashboard. It specifies units, transformations, formulas, rebasing, attribution methods, and quality controls. Adopt these rules verbatim so each quarterly release is comparable through time.
2.0 Common rules
Time index. Quarterly unless noted. Use calendar quarters and label as YYYY-Qn.
Currencies. Report in nominal USD. Translate non-USD items with the quarter-average WM/Refinitiv spot rate.
Deflators. Do not deflate capex or revenues in the dashboard. Use nominal series for timeliness. Provide deflated runs in the methods appendix if needed.
Revisions. Retain point-in-time vintages. Publish a change log when historical values move.
Winsorisation. For ratios and growth rates, winsorise at 1 and 99 percent. Do not winsorise levels.
Missing values. Interpolate only within a two-quarter gap and flag the value. Never extrapolate beyond the sample end.
Identifiers. Use stable codes A1…D5 and a version tag, for example DD-2025-Q3.
2.1 Layer A. Infrastructure intensity
A1. Hyperscaler AI capex (US$ bn, T4Q)
Computation: sum capex from firm filings. Attribute the AI share using management breakouts or a documented allocation key.
Formula:
\text{A1}_{t}=\sum_{f \in F}\big(\text{Capex}_{f,t}\times s_{f,t}^{AI}\big), \quad \text{T4Q}_{t}=\sum_{q=t-3}^{t}\text{A1}_{q}
Quality control: reconcile with cash flow statements. Track inclusion of capital leases. Maintain a mapping table from reported segments to AI attribution.
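A minimal sketch of the A1 aggregation, assuming pandas and hypothetical columns quarter, capex_usd_bn, and ai_share taken from the reconciliation table; it applies the AI attribution share per firm-quarter and forms the trailing four-quarter sum.

```python
# Sketch of the A1 trailing-four-quarter aggregation; column names are assumed.
import pandas as pd

def a1_t4q(filings: pd.DataFrame) -> pd.Series:
    """filings: one row per firm-quarter with capex_usd_bn and ai_share in [0, 1]."""
    ai_capex = filings["capex_usd_bn"] * filings["ai_share"]     # AI-attributed capex per firm-quarter
    quarterly = ai_capex.groupby(filings["quarter"]).sum()       # sum over firms f in F
    return quarterly.sort_index().rolling(4).sum().rename("A1_T4Q")
```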
A2. Accelerator shipments (units per quarter)
Computation: vendor units shipped by model family. Keep a harmonised model taxonomy.
Quality control: cross-check vendor totals against foundry and OSAT disclosures.
A3. Installed accelerator compute (PFLOPS-equivalent)
Computation: cumulative shipments times per-unit peak FLOPS times a utilisation factor u.
\text{A3}_{t}=\sum_{q \le t}\sum_{m} \text{Units}_{m,q}\cdot \text{FLOPS}^{\text{peak}}_{m}\cdot u_{m,q}
Document the utilisation assumption and provide a sensitivity band, for example u \in [0.35,0.65].
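A minimal sketch of the A3 construction with the sensitivity band, assuming pandas and hypothetical columns quarter, units, and peak_pflops_per_unit; the utilisation points {0.35, 0.50, 0.65} illustrate the published band.

```python
# Sketch of A3 with a utilisation sensitivity band; column names are assumed.
import pandas as pd

def a3_installed_compute(shipments: pd.DataFrame, u=(0.35, 0.50, 0.65)) -> pd.DataFrame:
    """shipments: rows of (quarter, model_family, units, peak_pflops_per_unit)."""
    raw = shipments["units"] * shipments["peak_pflops_per_unit"]   # PFLOPS shipped per row
    per_quarter = raw.groupby(shipments["quarter"]).sum().sort_index()
    cumulative = per_quarter.cumsum()                              # installed base over time
    return pd.DataFrame({f"A3_u{val:.2f}": cumulative * val for val in u})
```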
A4. Data-centre capacity under construction and energised (MW)
Computation: sum project-level MW by status. Separate under construction and energised.
Quality control: reconcile developer claims with utility interconnection queues.
A5. Grid connection lead time (months)
Computation: median T_{\text{energise}}-T_{\text{apply}} for new loads above threshold P_{0}. Publish P_{0} and changes to queue rules.
A6. Inference cost per task (US$)
Define a fixed task card per use case. Include prompt, context length, target quality, latency band, hardware profile, and guardrails.
Computation:
\text{Cost}=\text{Tokens}_{\text{in}}\times p_{\text{in}} + \text{Tokens}_{\text{out}}\times p_{\text{out}} + \text{Guardrail\,calls}\times p_{g} + \text{Vector\,reads}\times p_{r}
Report median and interquartile range across 100 runs. Fix the random seed and temperature.
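A minimal sketch of the per-run cost calculation, assuming Python; token counts come from the task-card run log and the unit prices are read from the provider price card. Field names are illustrative.

```python
# Sketch of the A6 per-task cost formula for one task-card run.
from dataclasses import dataclass

@dataclass
class TaskRun:
    tokens_in: int
    tokens_out: int
    guardrail_calls: int
    vector_reads: int

def cost_per_task(run: TaskRun, p_in: float, p_out: float, p_g: float, p_r: float) -> float:
    """All-in variable cost in USD for a single run of the task template."""
    return (run.tokens_in * p_in + run.tokens_out * p_out
            + run.guardrail_calls * p_g + run.vector_reads * p_r)

# Report the median and IQR of this cost across 100 runs with fixed seed and temperature.
```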
A7. Latency distribution (ms, p50 and p95)
Computation: synthetic monitors hit the task card endpoint every five minutes for seven days per region. Report p50 and p95 excluding cold-start runs unless declared.
2.2 Layer B. Tooling and developer usage
B1. Active coding-assistant seats (FTE-normalised)
\text{B1}_{t}=\frac{\text{Monthly active seats}_{t}}{\text{Software FTE}_{t}}
Quality adjustment: remove seats with fewer than five accepted suggestions in the month.
B2. Acceptance rate of AI suggestions (%)
\text{B2}_{t}=\frac{\text{Accepted tokens}}{\text{Suggested tokens}}\times 100
Quality adjustment: compute a test-weighted rate where failed unit or integration tests down-weight accepted tokens.
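A minimal sketch of the test-weighted rate, assuming pandas, a hypothetical events table with accepted_tokens, suggested_tokens, and tests_passed per suggestion, and an assumed down-weight of 0.25 for suggestions whose changes later failed tests.

```python
# Sketch of a test-weighted acceptance rate; the 0.25 down-weight is an assumption.
import pandas as pd

def quality_adjusted_acceptance(events: pd.DataFrame, fail_weight: float = 0.25) -> float:
    """events: one row per suggestion with accepted_tokens, suggested_tokens, tests_passed (bool)."""
    weights = events["tests_passed"].map({True: 1.0, False: fail_weight})
    adj_accepted = (events["accepted_tokens"] * weights).sum()
    return 100.0 * adj_accepted / events["suggested_tokens"].sum()
```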
B3. API token consumption (tokens, T30D)
Aggregate billed tokens per application ID over trailing 30 days. Publish context-length distribution and cache hit rates.
B4. Unique production applications (count)
Definition: services with service level objectives and change management in scope. Exclude sandboxes.
B5. Fine-tune and RAG operations
Count completed fine-tunes. For retrieval, log queries per second and hit rate on an index with documented freshness.
B6. Eval pass rate on curated suite (%)
Fix the evaluation battery and acceptance thresholds. Version every eval set.
\text{B6}_{t}=\frac{\text{Tests passed}}{\text{Total tests}}\times 100
2.3 Layer C. Enterprise productionisation
C1. Workflows in production with approved model-risk governance (%)
Denominator: material workflows in production. Numerator: those using AI with signed-off model-risk governance.
\text{C1}_{t}=\frac{\#\{\text{Prod workflows with MRG}\}}{\#\{\text{Prod material workflows}\}}\times 100
Attach the approval artefact ID for audit.
C2. Independently audited models (%)
Count production models with an independent audit in the prior 12 months. Record audit scope and standard.
C3. Pilot-to-production lead time (days, median)
Track calendar days from pilot approval to go-live. Report median and interquartile range. Winsorise at the 99th percentile.
C4. AI-attributable EBIT share (%)
Use a documented counterfactual. Prefer randomised controlled trials or stepped-wedge rollouts.
Incremental EBIT:
\Delta \text{EBIT}=\left(\text{Revenue}-\text{COGS}-\text{OpEx}\right)_{\text{AI}}-\left(\text{Revenue}-\text{COGS}-\text{OpEx}\right)_{\text{control}}
Share:
\text{C4}_{t}=\frac{\Delta \text{EBIT}}{\text{EBIT}_{\text{business\,unit}}}\times 100
Disclose holdout design, sample sizes, and pre-trend checks.
C5. AI feature ARPU uplift (US$ per user)
\Delta \text{ARPU}=\text{ARPU}_{\text{treatment}}-\text{ARPU}_{\text{control}}
Use customer-level regressions to separate price from mix effects.
C6. Automated resolution with QA pass (%)
\text{C6}_{t}=\frac{\#\{\text{tickets resolved by AI and QA-passed}\}}{\#\{\text{tickets resolved by AI}\}}\times 100
Track re-open rates and escalations within 14 days.
C7. Safety and compliance incidents (per 1k requests)
\text{Rate}=\frac{\text{Confirmed incidents}}{\text{Requests}}\times 1000
Classify by severity and root cause. Publish incident narratives where possible.
2.4 Layer D. Outcomes
D1. Labour-productivity gap (index points)
Construct sector indexes from official statistics. Rebase to a common base year b.
Rebasing within sector i:
y_{it}^{(b)}=\frac{y_{it}}{y_{ib}}\times 100
Exposure cohorts: split sectors by employment-weighted exposure metric. Gap:
\text{Gap}_{t}=\bar{y}^{\text{high}}_{t}-\bar{y}^{\text{low}}_{t}
Publish 95 percent confidence intervals based on sectoral dispersion.
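A minimal sketch of the D1 construction, assuming pandas and a hypothetical long panel with sector, year, productivity_index, and exposure; the confidence band here is a simple dispersion-based approximation rather than the full model-based interval.

```python
# Sketch of D1: within-sector rebasing, exposure cohorts, and a dispersion-based CI.
import numpy as np
import pandas as pd

def d1_gap(prod: pd.DataFrame, base_year: int, cutoff: float) -> pd.DataFrame:
    """prod: long panel with sector, year, productivity_index, exposure (employment-weighted)."""
    base = prod[prod["year"] == base_year].set_index("sector")["productivity_index"]
    prod = prod.assign(rebased=100 * prod["productivity_index"] / prod["sector"].map(base))
    prod["cohort"] = np.where(prod["exposure"] >= cutoff, "high", "low")
    g = prod.groupby(["year", "cohort"])["rebased"]
    mean, sem = g.mean().unstack(), g.sem().unstack()
    out = pd.DataFrame({"gap": mean["high"] - mean["low"]})
    half_width = 1.96 * np.sqrt(sem["high"] ** 2 + sem["low"] ** 2)
    out["ci_low"], out["ci_high"] = out["gap"] - half_width, out["gap"] + half_width
    return out
```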
D2. TFP differential (index points)
Apply the same rebasing and cohort split to TFP series. Document labour quality and capital input methods.
D3. Wage dispersion by exposure (percentage points)
Difference in wage growth between exposure deciles. Control for composition with Oaxaca–Blinder or reweighting.
D4. Occupational churn in high-exposure roles (%)
\text{Churn}_{t}=\frac{\text{Exits}+\text{Entries}}{\text{Employment}}\times 100
Adjust for classification breaks.
D5. Quality and safety metrics in AI-supported services
Define sector-specific metrics ex-ante. For example, adverse event rate per 10,000 decisions.
2.5 Attribution and identification templates
A. EBIT attribution template (C4)
Unit of analysis: business unit × month.
Design: randomised rollout or stepped wedge.
Model: difference-in-differences with unit and time fixed effects, standard errors clustered by unit.
Report: effect size, standard error, pre-trend test, power analysis.
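A minimal sketch of this template, assuming the linearmodels package and a hypothetical business unit × month panel with columns unit, month, ebit, and treated_post; the estimator is a two-way fixed-effects difference-in-differences with unit-clustered standard errors.

```python
# Sketch of the C4 attribution design: two-way fixed-effects DiD on a
# business unit x month panel with unit-clustered errors. Names are assumed.
import pandas as pd
from linearmodels.panel import PanelOLS

def ebit_did(panel: pd.DataFrame):
    """panel: columns unit, month, ebit, treated_post (1 after AI rollout in treated units)."""
    df = panel.set_index(["unit", "month"])          # entity-time MultiIndex
    model = PanelOLS.from_formula(
        "ebit ~ treated_post + EntityEffects + TimeEffects", data=df
    )
    return model.fit(cov_type="clustered", cluster_entity=True)
```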
B. Productivity DiD and event study (D1)
Treatment: continuous exposure score. Post indicator from 2022 onward.
Model:
y_{it}=\alpha_{i}+\gamma_{t}+\beta\big(\text{Exposure}_{i}\cdot \mathbf{1}\{t\ge 2022\}\big)+\epsilon_{it}
Event study: exposure × year dummies with 2019 baseline.
Report: coefficient paths with clustered confidence intervals, pre-trend diagnostics.
C. Inference cost task card (A6)
Inputs: prompt, context length, target quality, latency band, hardware profile, guardrails.
Outputs: cost per task, p50 and p95 latency, error bars across 100 runs.
Version every change in model or prompt.
2.6 Quality control checklist
Cross-source reconciliation for capex and accelerator shipments.
Reproducible scripts for rebasing and cohort formation.
Sensitivity to alternative exposure metrics and employment weights.
Clear flags for interpolations and revisions.
Independent re-estimation before publication.
-
This appendix documents estimation choices, diagnostics, and decision rules for the Diffusion Dashboard. It is designed for replication and for consistent interpretation across releases.
3.1 Stacked diffusion curves (Layer A to D)
Specification. For sector i and layer L \in \{A,B,C,D\},
x_{iL}(t)=\frac{K_{iL}}{1+\exp\!\big[-r_{iL}\,(t-\tau_{iL})\big]}+\varepsilon_{iL}(t),
where K_{iL} is the saturation level, r_{iL} the slope, \tau_{iL} the midpoint, and \varepsilon_{iL}(t)\sim\mathcal{N}(0,\sigma_{L}^{2}).
Pooling across sectors. Use a hierarchical prior to borrow strength:
(K_{iL}, r_{iL}, \tau_{iL})^{\top} \sim \mathcal{N}\!\big(\mu_{L}, \Sigma_{L}\big),
with weakly informative hyperpriors on \mu_{L} and \Sigma_{L} (e.g., \mu_{L}\sim\mathcal{N}(m_{0},S_{0}); for \Sigma_{L}, an LKJ prior with shape 2 on the correlation matrix and half-normal priors with scales of 0.5 to 2.0 on the standard deviations). Constrain K_{iL}>0 and r_{iL}>0 with half-normal priors. Constrain \tau_{iL} to the observed time span plus a one-period buffer.
Estimation. Fit with Hamiltonian Monte Carlo: four chains of 2,000 iterations each, of which 1,000 are warm-up. Convergence requires \hat{R} < 1.01 and an effective sample size above 400 per parameter. For high-frequency updates a variational fit can be used as a nowcast, with HMC rerun in the quarterly release.
Model selection and fit. Compare the single logistic against a piecewise logistic using leave-one-out cross-validation or WAIC. Report posterior predictive checks: coverage of empirical quantiles, residual autocorrelation, and the implied saturation K_{iL} relative to engineering or regulatory constraints.
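A minimal sketch of the pooled estimator, assuming PyMC; for brevity the full LKJ covariance structure is simplified to independent hierarchical priors on K, r, and τ, and the observed shares are assumed to sit in a sectors-by-periods array.

```python
# Sketch of the pooled logistic diffusion model in PyMC (assumption: layer_share
# holds one observed series per sector, values in [0, 1]). The LKJ covariance
# described in the text is simplified here to independent hyperpriors.
import numpy as np
import pymc as pm

def fit_diffusion(layer_share: np.ndarray, t: np.ndarray):
    """layer_share: (n_sectors, n_periods) observed indicator; t: (n_periods,) time index."""
    n_sectors, _ = layer_share.shape
    with pm.Model():
        # Hyperpriors and sector-level parameters (partial pooling across sectors).
        mu_K = pm.HalfNormal("mu_K", sigma=1.0)
        mu_r = pm.HalfNormal("mu_r", sigma=1.0)
        mu_tau = pm.Normal("mu_tau", mu=t.mean(), sigma=t.std())
        K = pm.HalfNormal("K", sigma=mu_K, shape=n_sectors)
        r = pm.HalfNormal("r", sigma=mu_r, shape=n_sectors)
        tau = pm.Normal("tau", mu=mu_tau, sigma=2.0, shape=n_sectors)
        sigma = pm.HalfNormal("sigma", sigma=0.1)
        # Logistic mean for each sector and period.
        mean = K[:, None] / (1.0 + pm.math.exp(-r[:, None] * (t[None, :] - tau[:, None])))
        pm.Normal("obs", mu=mean, sigma=sigma, observed=layer_share)
        idata = pm.sample(2000, tune=1000, chains=4, target_accept=0.9)
    return idata
```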
3.2 Layer linkages and lags
Distributed lag structure. Link layers with interpretable delays:
B_{it}=\alpha_{B}+\sum_{\ell=0}^{L_{B}}\beta_{B\ell}\,A_{i,t-\ell}+u_{Bit}, \qquad C_{it}=\alpha_{C}+\sum_{\ell=0}^{L_{C}}\beta_{C\ell}\,B_{i,t-\ell}+u_{Cit},
D_{it}=\alpha_{D}+\sum_{\ell=0}^{L_{D}}\beta_{D\ell}\,C_{i,t-\ell}+\gamma^{\top}Z_{it}+u_{Dit}.
Include sector fixed effects and time fixed effects in each equation to absorb level differences and common shocks. Cluster standard errors by sector. Choose L_{B}, L_{C}, L_{D} using information criteria and residual diagnostics. If serial correlation remains after including lags, use Newey-West standard errors with a lag length selected by the Andrews rule.
Endogeneity and simultaneity. If there is evidence that a downstream layer feeds back into an upstream layer within the sampling interval, use an instrumental-variables approach for the affected regressors. Candidate instruments include engineering or policy constraints that shift upstream layers but do not directly affect downstream outcomes within the period, for example connection-queue reforms for A4 or regulated price changes for A6. Test instrument strength and over-identification where applicable.
Cross-sectional dependence. If sectors share common shocks beyond time fixed effects, report Driscoll-Kraay standard errors or use a common correlated effects estimator as a robustness check.
3.3 Event study and exposure thresholds
Dynamic exposure design. For outcomes y_{it} such as labour productivity, estimate
y_{it}=\alpha_{i}+\gamma_{t}+\sum_{\tau\neq t_{0}}\delta_{\tau}\,\big(\text{Exposure}_{i}\cdot \mathbf{1}\{t=\tau\}\big)+\epsilon_{it},
with t_{0} as the baseline year, typically 2019. Plot \delta_{\tau} with clustered confidence intervals. Pre-trend validity requires that all pre-t_{0} coefficients are close to zero and jointly insignificant.
Continuous versus binned exposure. Continuous exposure retains power and avoids arbitrary cutoffs. For communication, binned high- and low-exposure groups can be shown, but inference should rely on the continuous specification.
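A minimal sketch of the dynamic specification, assuming statsmodels and a hypothetical panel with sector, year, y, and exposure; it builds exposure-by-year interactions with the 2019 baseline omitted and clusters standard errors by sector.

```python
# Sketch of the exposure-by-year event study with a 2019 baseline; names assumed.
import pandas as pd
import statsmodels.formula.api as smf

def event_study(panel: pd.DataFrame, base_year: int = 2019):
    """panel: sector, year, y (rebased outcome), exposure (continuous sector score)."""
    df = panel.copy()
    terms = []
    for yr in sorted(df["year"].unique()):
        if yr == base_year:
            continue                                   # omit the baseline year
        col = f"exp_x_{yr}"
        df[col] = df["exposure"] * (df["year"] == yr).astype(float)
        terms.append(col)
    formula = "y ~ " + " + ".join(terms) + " + C(sector) + C(year)"
    return smf.ols(formula, data=df).fit(              # coefficient path = params on exp_x_*
        cov_type="cluster", cov_kwds={"groups": df["sector"]}
    )
```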
3.4 Identification, data handling, and transformations
Fixed definitions. All series have versioned unit definitions, denominators, and base years. Sectoral outcomes are rebased within sector to a common base b:
y_{it}^{(b)}=\frac{y_{it}}{y_{ib}}\times 100.
Missing data and revisions. Interpolate across gaps of at most two quarters for display and flag them. Do not extrapolate beyond the sample. Retain point-in-time vintages for any revised official statistics and provide a reconciliation table in the repository.
Winsorisation and outliers. Winsorise growth rates at the 1st and 99th percentiles to guard against spurious variation from small denominators. Do not winsorise levels.
3.5 Decision thresholds and alerts
Internal deployment threshold. Declare a sector as “entering deployment” when both conditions hold in the same rolling year:
at least 30 percent of material workflows are in production with signed-off model-risk governance, and
the AI-attributable EBIT share exceeds 5 percent, with a documented counterfactual and either a p-value below 0.10 or a 90 percent credible interval that excludes zero.
Macro signal threshold. Declare an economy-level AI effect when the high-minus-low exposure productivity gap widens by at least 1.5 index points for two consecutive years and the difference in the event study is significant at 5 percent after clustering, or the Bayesian posterior probability of a positive gap exceeds 0.95.
Latency and cost thresholds. For each task card define a maximum p95 latency and a target cost per task that would clear an ROI hurdle rate. Trigger an alert when both are met for two consecutive monthly observations.
3.6 Diagnostics and robustness
Diffusion curves. Check \hat{R} and effective sample sizes. Compare implied K_{iL} to physical or regulatory constraints. Inspect posterior predictive bands against withheld periods.
Distributed lags. Examine residual autocorrelation and partial autocorrelation functions. Re-estimate with one additional lag to confirm stability. Test sensitivity to alternative lag windows and to exclusion of outlier quarters.
Event study. Report joint tests for pre-period coefficients. Use wild-cluster bootstrap p-values when the number of clusters is small. Provide a figure with alternative baseline years as a sensitivity.
Alternative taxonomies. Re-run the exposure analyses with at least one alternative measure, for example complementarity-adjusted exposure or a task-level index, and report the sign and magnitude relative to the baseline.
Cross-validation. Where possible, split sectors into random halves and verify that estimated lags and slopes are consistent across splits. For outcome equations report leave-one-sector-out fits.
3.7 Implementation notes
Software. Use a reproducible toolchain with a lockfile. Bayesian components can be implemented in Stan or PyMC. Panel regressions can be estimated in R or Python with fixed effects and clustered errors. Maintain unit tests for rebasing, cohort formation, and attribution functions.
Reproducibility. Each quarterly release includes a tag that fixes source URLs, transformation scripts, and parameter choices. Figures are regenerated from raw inputs through a single build script. The repository includes a change log that records any definition or source change along with a rationale.
Ethics and access. No reliance on opaque proprietary endpoints without a parallel open measurement. Any confidential telemetry used for firm panels must be aggregated and anonymised before publication and must pass disclosure control checks.
Summary. The model stack combines sector-level diffusion, lagged propagation from infrastructure to outcomes, and causal panels for outcomes. Estimation is transparent, diagnostics are routine, and decision thresholds are explicit. Together these choices give a disciplined and auditable way to conclude when installation has become deployment and where the binding constraints sit.
-
This section parameterises three forward paths for AI diffusion—Base, Fast, and Stall—and links them to observable indicators in Layers A–D. Each scenario assigns values (or priors) to saturation levels, slopes and midpoints of the stacked diffusion curves, to lag structures between layers, and to exogenous shock variables (energy, regulation, macro). The goal is to turn qualitative narratives into testable, updatable forecasts.
4.1 Scenario parameters (summary)
| Component | Base | Fast | Stall |
| --- | --- | --- | --- |
| A: Infrastructure (K_A, r_A, τ_A) | K_A: +150–200% vs 2024 by 2030; r_A: 0.9–1.1; τ_A: 2026 | K_A: +250–300%; r_A: 1.3–1.6; τ_A: 2025 | K_A: +60–90%; r_A: 0.4–0.6; τ_A: 2027–2028 |
| A6: Inference cost decline | −3× by 2027 (compound −25%/yr) | −10× by 2026 (compound −45%/yr) | −1.5× by 2029 |
| A7: Latency p95 (task-card) | ≤ 800 ms by 2027 | ≤ 400 ms by 2026 | ≥ 1200 ms through 2029 |
| B: Tooling adoption (K_B, r_B, τ_B) | K_B: 60–70% of eligible devs; r_B: 1.0; τ_B: 2026 | K_B: 80–90%; r_B: 1.5; τ_B: 2025 | K_B: 35–45%; r_B: 0.5; τ_B: 2028 |
| C1: Workflows in prod w/ MRG | 20–30% by 2028 | 40–50% by 2027 | 10–15% by 2030 |
| C4: AI-attributable EBIT share (median firm) | ≥ 5% by 2028 | ≥ 5% by 2026, ≥ 10% by 2028 | ≤ 3% through 2030 |
| Lags (A→B, B→C, C→D) | 2–3 qtrs; 2–4 qtrs; 6–8 qtrs | 1–2; 1–2; 4–6 | 3–5; 4–6; 8–12 |
| D: Macro signal (high–low exposure gap) | +1.5–2.0 index pts for ≥2 yrs by 2029 | +2.5–3.5 pts by 2027 | ≤ 1.0 pt through 2030 |
| Exogenous constraints | Energy tight but manageable; audit standards converge by 2027 | Accelerated grid adds; accepted audit stack by 2026 | Binding grid delays; liability unresolved through 2028 |
Notes: K = saturation level; r = slope; τ = midpoint of the logistic curve for the indicated layer; lags measured in quarters.
4.2 Narrative and decision thresholds
Base. Infrastructure scales steadily; inference costs fall ~3× by 2027; evaluation/guardrail stacks converge slowly. Tooling adoption grows but productionisation is constrained by governance and data quality. Decision threshold hit when (i) ≥30% of material workflows are in production with approved MRG and (ii) median business unit attributes ≥5% EBIT to AI with a documented counterfactual. Macro signal appears late decade as the exposure-sorted productivity gap exceeds 1.5 index points for two consecutive years with statistical significance.
Fast. A further order-of-magnitude drop in inference costs by 2026 plus a widely accepted audit stack moves firms across internal ROI and liability thresholds. Productionised use exceeds 40% in leading sectors; EBIT contributions surpass 10% in the upper quartile. Macro differentials widen by 2027, consistent with shorter B→C→D lags.
Stall. Grid connections and water constraints bind; unit economics improve slowly; liability and IP uncertainty persist. Pilots expand, but productionisation stalls below 15% of material workflows; the exposure-sorted productivity gap remains statistically indistinguishable from zero.
4.3 How the scenarios update
Bayesian updating. Each quarter, refresh priors on (K, r, τ) using the latest Layer A–C observations; re-estimate lags with distributed-lag regressions including time fixed effects and clustered errors.
Trigger rules.
Upgrade Base→Fast if A6 (cost) and A7 (latency) meet target thresholds for two consecutive quarters and C1 crosses 30% with a rising C4 (≥5%).
Downgrade Base→Stall if A4/A5 (power MW/lead times) remain adverse for three quarters and B6 eval pass rates fail to improve despite rising B1–B4 usage.
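These trigger rules can be encoded directly against the quarterly indicator panel. The sketch below is illustrative only: the column names (a6_meets_target, a7_meets_target, c1_pct, c4_pct, a4_a5_adverse, b6_pass_rate) are assumptions, and the check on rising B1–B4 usage is omitted for brevity.

```python
# Sketch of the scenario trigger rules as boolean checks on a quarterly panel
# ordered by quarter; column names are assumptions, not dashboard fields.
import pandas as pd

def upgrade_to_fast(df: pd.DataFrame) -> bool:
    """df: quarterly rows with a6_meets_target, a7_meets_target (bool) and c1_pct, c4_pct levels."""
    last2 = df.tail(2)
    cost_latency_ok = bool((last2["a6_meets_target"] & last2["a7_meets_target"]).all())
    c1_crossed = df["c1_pct"].iloc[-1] >= 30
    c4_rising = df["c4_pct"].iloc[-1] >= 5 and df["c4_pct"].iloc[-1] > df["c4_pct"].iloc[-2]
    return cost_latency_ok and c1_crossed and c4_rising

def downgrade_to_stall(df: pd.DataFrame) -> bool:
    """Requires at least four quarters of history; a4_a5_adverse flags adverse MW/lead-time readings."""
    power_adverse = bool(df.tail(3)["a4_a5_adverse"].all())
    evals_flat = df["b6_pass_rate"].iloc[-1] <= df["b6_pass_rate"].iloc[-4]
    return power_adverse and evals_flat
```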
4.4 Outputs to report for each scenario
Time-to-thresholds: quarters to (i) 30% productionised workflows, (ii) ≥5% AI-attributable EBIT, (iii) macro productivity gap ≥1.5 pts for two years.
Elasticities: estimated ∂C/∂B and ∂D/∂C at current points on the curves, with clustered CIs.
Sensitivity bands: scenario ribbons under ±1σ shocks to energy prices, regulatory milestones, and capex availability.
Policy levers: for Base and Stall, identify the marginal lever with the highest probability of moving the system across the deployment threshold (e.g., grid acceleration vs. audit standardisation).
4.5 Implementation checklist
Fix the task-card definitions for A6/A7; archive every version.
Maintain a harmonised vendor model taxonomy for A2/A3; publish mapping tables.
Keep C-layer governance metrics auditable (store approval artefact IDs).
Publish code and priors for scenario fits; include posterior predictive checks and out-of-sample tests.
-
This section sets the release rhythm, artefacts, versioning, and quality controls for the Diffusion Dashboard so results are reproducible and comparable through time.
5.1 Cadence and calendar
Quarterly main release (DD-YYYY-Qn): Core update of Layers A–D, model re-estimation, scenarios, and policy implications.
Timing: publish T+45 days after quarter-end (e.g., Q2 closes 30 June → release by 15 August).
Monthly indicators brief (MI-YYYY-MM): High-frequency moves in A6/A7 (cost/latency), B1–B6 (tooling), select C-layer metrics, with no model re-fit.
Ad hoc technical notes (TN-YYYY-NN): Methods changes, data revisions, or sector deep dives.
Embargo/time zone: coordinate to 00:00 AEST on publication day; provide UTC equivalents in metadata.
5.2 Deliverables per quarterly release
Executive dashboard (PDF, ≤10 pages): headline movements in A–D, thresholds crossed, and scenario nowcast with uncertainty bands.
Statistical appendix (PDF): model specification, priors, diagnostics (convergence, PPCs), robustness tables, and full event-study figures.
Policy brief (PDF, 3–5 pages): actionable levers conditional on identified bottlenecks (e.g., grid, auditability, procurement).
Data package (open):
dd_YYYY_Qn_indicators.csv — wide panel of all indicators with units, coverage, and flags.
dd_YYYY_Qn_metadata.json — variable dictionary (code, name, unit, source URL, transform script, refresh frequency).
dd_YYYY_Qn_models.parquet — parameter draws/posteriors for diffusion and lag models.
dd_YYYY_Qn_figures.zip — all charts (PNG + SVG), publication resolution, with embedded captions.
Repository tag: Git commit hash and release tag DD-YYYY-Qn; all scripts to regenerate figures from raw inputs.
5.3 Change control and versioning
Semantic tagging: MAJOR.MINOR.PATCH (e.g., 1.3.0) where
MAJOR = definition changes to an indicator or model class;
MINOR = new indicators/figures added;
PATCH = data corrections that do not alter definitions.
Change log: succinct entry per change (who, what, why, impact).
Supersedures: each dataset row carries vintage_date; revised values never overwrite prior vintages.
5.4 Quality assurance (pre-release)
Data integrity: schema checks (types/units), range checks, missing-value policy, duplicate ID detection.
Reconciliation: capex and accelerator totals cross-checked against firm filings and shipment trackers; energy MW reconciled with interconnection queues.
Model diagnostics: \hat{R} < 1.01, effective sample size thresholds, posterior predictive checks, residual autocorrelation tests, wild-cluster bootstrap for small-cluster panels.
Sensitivity: alternative exposure metric, alternative lag windows, and winsorisation settings; report stability of key coefficients.
Independent rerun: second analyst re-generates the release from raw inputs on a clean environment; hashes of outputs must match.
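A minimal sketch of the hash comparison used in the independent rerun, assuming Python's standard library; directory paths are illustrative.

```python
# Sketch of the pre-release check: the second analyst's rebuilt artefacts must
# match the primary build byte for byte.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def builds_match(primary_dir: str, rerun_dir: str) -> bool:
    """Compare SHA-256 digests of every file in the primary and re-run output folders."""
    primary = {p.name: sha256(p) for p in Path(primary_dir).glob("*") if p.is_file()}
    rerun = {p.name: sha256(p) for p in Path(rerun_dir).glob("*") if p.is_file()}
    return primary == rerun
```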
5.5 Documentation standards
Indicator dictionary: maintained as a single source of truth; every figure cites indicator codes (e.g., A6, C1).
Methods note: any change to definitions, priors, or identification is documented with rationale and an ablation table.
Attribution cards: for C4 (AI-attributable EBIT) and A6/A7 (task cards), include versioned templates and counterfactual design notes.
5.6 Access and archiving
Open access bundle: executive dashboard, figures, and indicators CSV + metadata JSON.
Controlled access (if needed): firm-level panels anonymised/aggregated and shared under data-use terms.
Archival ID: mint a DOI per quarterly release (e.g., Zenodo); store all artefacts and code tag under the DOI.
5.7 Communications
Release note (≤400 words): what moved, what crossed thresholds, and how scenarios shifted.
Methods summary (≤250 words): specification, diagnostics, and any definition changes.
Media chart pack: three figures with captions suitable for syndication (PNG + SVG).
5.8 Service levels
Errata window: investigate reported data errors within 5 business days; patch within 10 with a PATCH release and updated DOI landing page.
Response to methodological queries: within 10 business days, with a link to reproducible code.
-
This section explains how to read the dashboard, distinguish signal from noise, and act on explicit alerts. It aligns indicator movements with the installation→deployment pathway defined earlier.
6.1 Reading the four layers
Layer A (infrastructure).
Rising A1–A5 indicate capacity build-out; A6 (cost) and A7 (latency) convert capacity into usable performance. Sustained declines in A6 plus stable/improving A7 are necessary (not sufficient) conditions for deployment.
Layer B (tooling and developer usage).
B1–B5 track routine use; B6 (eval pass rate) is the quality gate. Deployment requires both scale (B1–B4) and quality (B6). Rising B usage without B6 improvement signals experimentation rather than readiness.
Layer C (enterprise productionisation).
C1 (MRG coverage) and C4 (AI-attributable EBIT) are the canonical deployment metrics. C2 (independent audit) and C6–C7 (QA pass, incidents) govern risk. Deployment is credible when C1 and C4 clear thresholds and C6–C7 remain acceptable.
Layer D (outcomes).
D1–D2 (exposure-sorted productivity/TFP), D3–D4 (wage and churn differentials), and D5 (sectoral quality/safety metrics) translate firm deployment into macro signals. Treat D as the lagging confirmation layer.
6.2 Signal vs noise: practical tests
Threshold persistence. Require any threshold crossing to persist for two consecutive releases (monthly for B/A6–A7; quarterly for C/D).
Triangulation. Accept a signal only if at least two adjacent layers move in the logical direction (e.g., A6↓ & B6↑, then C1↑).
Revision sensitivity. Recompute key ratios using the previous vintage to ensure the signal survives routine revisions.
Composition checks. For B and C, verify whether gains are driven by a single unit/product; report concentration (top-3 share).
Denominator effects. For C4 (EBIT share), run the level effect (ΔEBIT) alongside the ratio to avoid artefacts from shrinking denominators.
6.3 Colour states and concise rules
Use a three-colour scheme per layer and for the system:
Green (on track): thresholds met and confirmed; quality/risk metrics stable.
Amber (installing/transition): capacity and usage rising but quality or governance lag.
Red (stall/risk): cost/latency deteriorating, governance coverage falling, or safety incidents rising.
6.4 Alert definitions (implemented as boolean rules)
A. Deployment alert (sector-level, quarterly)
Trigger GREEN when, in the same rolling year:
C1 ≥ 30% (workflows in production with approved MRG), and
C4 ≥ 5% (AI-attributable EBIT share with documented counterfactual), and
C6 ≥ 95% QA pass and C7 ≤ 0.5/1k requests (severity-adjusted).
Persistently AMBER if only one of C1 or C4 is met; RED if C1 falls >5 pp or C7 spikes above 1/1k for two quarters.
B. Readiness alert (firm or sector, monthly/quarterly)
Trigger AMBER→GREEN when A6 (cost) meets the pre-declared ROI threshold and B6 exceeds the acceptance bar for two consecutive months, with A7 (latency p95) within the SLO. This signals “ready to scale pilots”.
C. Macro signal alert (country, annual)
Trigger GREEN when D1 (high–low exposure labour-productivity gap) ≥ 1.5 index points for two consecutive years with clustered 95% CIs excluding zero, or Bayesian posterior P(\text{gap}>0) \ge 0.95. AMBER if one year meets the criterion; RED if the gap narrows or becomes insignificant.
D. Governance risk alert (firm/sector, quarterly)
Trigger RED when C2 (independent audits) falls below 20% of production models or C7 doubles relative to its four-quarter trailing mean. Freeze new deployments until remediation.
E. Energy/compute bottleneck alert (region, quarterly)
Trigger RED when A4 (energised MW) flatlines while A5 (grid lead time) rises ≥ 6 months and A6 fails to improve. Policy focus: connection queues, siting, water use.
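A minimal sketch of rule A as a colour-state function, assuming the rolling-year metrics have already been computed; the key names (c1_pct, c4_pct, c6_qa_pass_pct, c7_per_1k, c1_drop_pp, c7_spiked_two_qtrs) are assumptions, not dashboard field names.

```python
# Sketch of the sector-level Deployment alert (rule A) as a boolean colour state.
def deployment_alert(m: dict) -> str:
    """m: latest rolling-year values with keys c1_pct, c4_pct, c6_qa_pass_pct,
    c7_per_1k, c1_drop_pp, c7_spiked_two_qtrs (see rule A above)."""
    if m["c1_drop_pp"] > 5 or m["c7_spiked_two_qtrs"]:
        return "RED"      # C1 fell more than 5 pp, or C7 above 1/1k for two quarters
    if (m["c1_pct"] >= 30 and m["c4_pct"] >= 5
            and m["c6_qa_pass_pct"] >= 95 and m["c7_per_1k"] <= 0.5):
        return "GREEN"    # all rule-A conditions met in the same rolling year
    return "AMBER"        # e.g., only one of C1 or C4 met
```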
6.5 Decision tree (managerial and policy actions)
A↑; A6↓; B flat; C/D flat → “Install without deploy.”
Action (firm): invest in evaluation harnesses (B6), harden MRG (C1), prioritise 2–3 use cases with measured ROI.
Action (policy): standardise audit/incident reporting; reduce fixed costs of governance.
A stable; B↑ & B6↑; C rising; D flat → “Crossing the internal threshold.”
Action (firm): scale the few high-ROI workflows; negotiate long-term inference pricing; formalise incident response.
Action (policy): targeted procurement templates; sandbox programmes; skills grants tied to workflow redesign.
C↑; D↑ with significance → “Deployment recognised.”
Action (firm): expand to adjacent workflows; deepen data quality and lineage; monitor concentration risk.
Action (policy): competition/data-portability guardrails; energy planning for sustained demand.
B↑; C7 incidents↑; C1 not rising → “Quality/risk gap.”
Action (firm): pause expansion; increase eval coverage; require independent audits (C2).
Action (policy): audit standards and liability allocation to reduce uncertainty.
6.6 Worked examples (hypothetical, quarterly)
Quarter T: A6 falls 35%, A7 meets SLO; B1 +12 pp, B6 from 78→91%. Interpretation: readiness achieved; issue Readiness alert (GREEN).
Quarter T+2: C1 hits 32%, C4 reaches 5.4%, C6 at 96%, C7 = 0.3/1k. Interpretation: trigger Deployment alert (GREEN).
Year T+2: D1 gap averages 1.7 pts with clustered 95% CI excluding zero. Interpretation: first year toward Macro signal; confirm in T+3.
6.7 Common pitfalls (and how we guard against them)
Pilot inflation: B1–B4 up with no C1/C4 movement. Guard: insist on MRG sign-off and counterfactual attribution.
Denominator artefacts in C4: EBIT share rises due to falling EBIT. Guard: publish ΔEBIT alongside share; require holdouts.
One-off shocks masquerading as trends: temporary price cuts lower A6. Guard: require two consecutive periods and triangulation with B6/C1.
Data drift and leakage in B6: eval pass rates climb due to test overfitting. Guard: versioned eval sets; leakage checks.
6.8 Publication cues
Each release should include a one-page “What changed and why it matters” that:
States which alerts fired or cleared;
Names the binding bottleneck layer (A, B, or C) where relevant;
Quantifies the shift in scenario probabilities (Base/Fast/Stall) with a one-paragraph policy/management implication.