AI’s Multi‑Trillion‑Dollar Capex Bet: Will the Numbers Add Up?

by barnaby

Executive summary / TLDR

  • 2025 hyperscaler CapEx is set to surpass the inflation‑adjusted cost of the entire Apollo programme – a fresh record for corporate investment in a single technology trend.
  • Returns are uneven: while Microsoft, Amazon, Alphabet and Meta now spend around 50 % of their operating cashflow on chips and data‑centres, Nvidia remains the primary near‑term beneficiary.
  • The compute mix is shifting from large, up‑front pre‑training to lower‑cost post‑training fine‑tuning and usage‑based test‑time reasoning.
  • Open‑source breakthroughs – notably DeepSeek‑R1 – demonstrate frontier‑level performance at a fraction of historical budgets.
  • China is accelerating domestic silicon and model ecosystems following US export controls, potentially driving down silicon costs.
  • Agentic and Physical AI – robotics, autonomous vehicles, industrial digital twins – will make bursty, inference‑heavy workloads the norm by the end of the decade.
  • For investors and corporates, value is migrating towards electricity, memory bandwidth, optical interconnect, data ownership and low‑latency edge deployments.

The scale of the wager

In calendar‑year 2025 the five largest hyperscalers will invest around US$280 billion in data‑centre construction, GPUs, memory and power contracts¹, with aggregate spending over 2025–2027 expected to pass US$1 trillion.

“I have never seen such a ‘capex first’ innovation wave. By spending the capex all up front before we even know what we are building, the risk is increased quite dramatically.” – Bill Gurley, General Partner at Benchmark, post on X, 13 July 2024

Adjusted for inflation, the projected 2025 spend alone exceeds the cost of the entire thirteen‑year Apollo Program – and amounts to almost half of the hyperscalers’ projected operating cash flow.

Source: Finchat.io Note: 2024 – 2026 based on analyst projections; where companies have different financial period ends, we have sought to align with calendar year. E.g. Nvidia period ending January 2025 counted as 2024

Yet only one clear winner has emerged so far: Nvidia, whose data‑centre revenue now rivals the next three semiconductor suppliers combined.

Source: Finchat.io Note: 2024 – 2026 based on analyst projections; where companies have different financial period ends, we have sought to align with calendar year. E.g. Nvidia period ending January 2025 counted as 2024

Hyperscaler boards therefore face a familiar strategic dilemma: is this the Internet bubble, where early capex overshot demand, or the cloud wave, where capacity was eventually absorbed and richly monetised? Worse, each firm fears that under‑spending cedes a permanent AGI lead to a rival. The result is a high‑stakes “prisoner’s dilemma” in silicon.


A simple check‑up before another billion

A simple back‑of‑envelope Return‑on‑Compute (ROC) check looks something like this:

NPV ≈ Σ [(Price per token – Variable cost per token) × Tokens served × Utilisation] – (CapEx + Fixed OpEx) 

If the NPV stays positive after realistic sensitivity tests, the project clears the investment hurdle. If not, capital is at risk of being stranded in a fast‑depreciating asset class — GPUs typically amortise in 18–24 months; some high‑bandwidth‑memory (HBM) cards in just twelve.
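To make the check‑up concrete, the formula above can be sketched in a few lines of Python. All figures below are hypothetical placeholders chosen for illustration, not estimates of any real cluster; the two‑year horizon reflects the typical GPU amortisation window mentioned above.

```python
# Back-of-envelope Return-on-Compute (ROC) check.
# All inputs are hypothetical placeholders -- substitute your own assumptions.

def roc_npv(price_per_m_tokens, cost_per_m_tokens, m_tokens_per_year,
            utilisation, capex, fixed_opex_per_year, years=2):
    """Undiscounted NPV over the asset's amortisation window.

    GPUs typically amortise in 18-24 months, so a 2-year window is the
    default payback horizon.  m_tokens_per_year is in millions of tokens.
    """
    annual_margin = ((price_per_m_tokens - cost_per_m_tokens)
                     * m_tokens_per_year * utilisation)
    return annual_margin * years - (capex + fixed_opex_per_year * years)

# Hypothetical cluster: $100M capex, $10M/yr fixed opex, $5 price vs $2
# variable cost per million tokens, 40M million-tokens served per year,
# 60% utilisation.
base = roc_npv(5.0, 2.0, 40_000_000, 0.6, 100_000_000, 10_000_000)

# Simple sensitivity test: price per million tokens halves.
stressed = roc_npv(2.5, 2.0, 40_000_000, 0.6, 100_000_000, 10_000_000)

print(f"Base-case NPV: ${base:,.0f}")   # positive: clears the hurdle
print(f"Stressed NPV:  ${stressed:,.0f}")  # negative: capital stranded
```

Even this toy version shows how sensitive the result is to price per token: halving it flips a comfortably positive NPV into a deeply negative one.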

An overview of the critical drivers for each ‘swim lane’ of this formula is set out in the table below.

ROC swim‑lanes (critical drivers only, 2025–27)

1. Price per token
  (a) Closed vs. open‑source quality gap
  (b) Monetisation model — seat licences / API calls / revenue‑share
  (c) Competitive discounting & hyperscaler bundles
  (d) Enterprise shift to on‑prem / private deployments

2. Variable cost per token
  (a) Electricity $/kWh
  (b) GPU & HBM depreciation schedule
  (c) Foundation‑model royalty / rev‑share
  (d) Data‑centre networking costs

3. Tokens served
  (a) Adoption of agentic copilot workflows
  (b) Average tokens per query (reasoning chains vs. single‑shot)
  (c) Physical‑AI workloads (robots, AVs, industrial IoT)

4. Utilisation rate
  (a) How often GPUs sit idle vs. billed
  (b) Bottlenecks (memory, networking, energy, etc.)
  (c) Edge / off‑prem inference off‑loading

5. Fixed outlays (CapEx + OpEx)
  (a) GPU ASP trends (Nvidia vs. AMD vs. Chinese entrants)
  (b) Data‑centre build cost per MW (land, cooling, fit‑out)
  (c) Long‑term power lock‑in contracts (e.g. SMR nuclear PPAs)
  (d) Staff & support overhead; software licensing

The rest of this article explores six of the major variables that can swing the ROC calculation.


1. Where are we on the compute S‑curve?

For fifteen years GPUs followed a classic first S‑curve: every additional dollar of pre‑training compute delivered impressively higher model quality. The proposed xAI Colossus (100k Nvidia GPUs requiring 300 MW of power) and OpenAI ‘Stargate’ project (500k Nvidia GPUs) assume this exponential trajectory holds.

But evidence from late‑2024 training runs suggests marginal gains from brute‑force pre‑training are diminishing. Many labs are now redirecting spend into the next two overlapping, higher S‑curves: post‑training and test‑time scaling.

Compute scaling phases:

1. Pre‑training
  Primary spend: massive GPU clusters
  Bottleneck #1: GPU supply
  Bottleneck #2: access to fresh data

2. Post‑training (fine‑tuning)
  Primary spend: GPU + high‑bandwidth memory (HBM), human‑in‑the‑loop
  Bottleneck #1: data‑labour for human feedback
  Bottleneck #2: fast, low‑cost fine‑tune silicon

3. Test‑time (inference)
  Primary spend: inference accelerators, memory bandwidth
  Bottleneck #1: latency & memory (context window, etc.)
  Bottleneck #2: energy cost per token

In January 2025, DeepSeek‑R1, a Chinese large language model (LLM), was recognised as one of the leading reasoning models, surpassing many US models. The development team achieved this at low CapEx through a focus on post‑training, including Reinforcement Learning from Human Feedback (RLHF).

“We are riding multiple compounding S curves in pre-training, inference time, and systems design, driving model performance that is doubling every 6 months.” — Satya Nadella, CEO Microsoft, Post on X 1 May 2025

Strategic implication: As the era of massive pre‑training subsides, ongoing inference for tasks like agents or robotics shifts workloads towards distributed high‑performance computing. Enterprises will also use on‑prem solutions to keep data local and reuse idle GPUs for inference workloads. With this compute shift, the critical factor moves towards high‑bandwidth memory – i.e. the ability to serve longer ‘context windows’ at speed.


2. How exactly will these AI capabilities make money?

It’s one thing to have a jaw-dropping demo; it’s another to have users or enterprises pay for it sustainably. Several models are being tried:

  • cloud usage fees (e.g. pay per 1,000 API calls)
  • SaaS-style subscriptions (e.g. ChatGPT Plus at $20/month, Microsoft 365 Copilot at $30/user), and
  • indirect monetization (more engagement leading to more ad revenue)

The unit economics of AI services can be challenging – serving one AI query can cost 10× or 100× what a traditional software query does, meaning prices need to be high or costs need to come down. This cost is increasing with the latest ‘reasoning’ models.

At the same time, the floor price for older models’ tokens declines rapidly as new models are introduced. As of May 2025, hyperscaler wholesale rates for GPT‑4 Turbo hover around $5–7 per million tokens in committed enterprise deals, down from around $30 just a year earlier.
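The margin squeeze this implies is easy to see with a one‑line calculation. The sketch below uses the source’s ~$30 → ~$6 per‑million‑token price trajectory; the serving‑cost figures are purely hypothetical assumptions for illustration.

```python
# Illustrative token unit economics.  Prices per million tokens follow the
# article's ~$30 -> ~$6 trajectory; serving costs are hypothetical.

def gross_margin(price_per_m_tokens, cost_per_m_tokens):
    """Gross margin as a fraction of revenue on one million tokens."""
    return (price_per_m_tokens - cost_per_m_tokens) / price_per_m_tokens

# A year ago: ~$30 market price against an assumed $4 serving cost.
last_year = gross_margin(30.0, 4.0)

# Today: ~$6 market price; assume serving cost fell only to $3.
today = gross_margin(6.0, 3.0)

print(f"Margin then: {last_year:.0%}, margin now: {today:.0%}")
```

The point is structural rather than numerical: unless cost per token falls at least as fast as price per token, margins compress even as volumes grow.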

Strategic implication: A big open question is whether end‑users will pay directly for AI, or whether it will mostly be an embedded feature whose cost companies absorb (and try to recoup via higher productivity or retention). In enterprises, there is willingness to pay for tangible productivity gains (hence all the AI copilots targeting coding, document generation, etc.). In consumer, it’s trickier – few consumers pay for search or email now; will they pay for an AI assistant? OpenAI’s success in growing paid subscriptions suggests some will, but loyalty is fickle – consumers quickly migrate to the new ‘best’ model.


3. Open source squeezes token prices

Will open-source models and tools dominate the landscape, or will proprietary offerings maintain an edge? This debate is heated.

When Meta open‑sourced Llama 3, few foresaw how quickly community fine‑tunes would close the quality gap with closed‑source models such as OpenAI’s ChatGPT. January 2025’s DeepSeek‑R1 went further: frontier‑level accuracy on a cluster costing roughly a tenth of GPT‑4’s. Xiaomi’s MiLM‑2 followed weeks later.

“DeepSeek now 23% of ChatGPT daily active users. And far more daily app downloads.” – Marc Andreessen, post on X, 2 February 2025

Cheaper, high‑quality open‑source models place downward pressure on price per token and potentially push value up the stack towards apps and software, data owners, and device and distribution moats.

Strategic implication: The perception among enterprise buyers will be critical: if companies become comfortable that open-source models are “good enough” for their needs, it could shift spend away from proprietary APIs to more on-prem AI.


4. Energy moves to centre‑stage

Silicon is no longer the only scarce input. The International Energy Agency (IEA) projects data‑centre electricity consumption to more than double to around 945 TWh by 2030, with more than half of that growth in the United States driven by AI.

Electricity grids are already under strain in many places, compounded by data centres being concentrated in certain locations – i.e. near large population centres. Wait times for critical grid components are lengthening. Many, including Nvidia’s Jensen Huang, argue that the key future bottleneck will be power.

“Energy Abundance sparks Economic Abundance” – Scott Bessent, US Treasury Secretary

To mitigate these risks, operators have been turning to new energy sources – with some signing two‑decade nuclear power purchase agreements, including small‑modular‑reactor (SMR) deals, to secure 24/7 baseload at predictable cost. For more on this, please refer to our first Future Forward article on the Electricity Economy.

Strategic implication: For the ROC equation, this means variable cost per token may soon be driven as much by $/kWh as by chip amortisation.
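A rough worked example shows why $/kWh feeds directly into variable cost per token. The accelerator power draw and throughput below are hypothetical round numbers assumed purely for illustration; only the arithmetic is the point.

```python
# Rough electricity cost per million tokens served.
# A single accelerator drawing ~1 kW while serving ~1,000 tokens/sec is an
# assumed round number, not a measured figure for any real chip.

def energy_cost_per_m_tokens(power_kw, tokens_per_sec, usd_per_kwh):
    """Electricity cost (USD) to serve one million tokens."""
    seconds = 1_000_000 / tokens_per_sec   # time to serve 1M tokens
    kwh = power_kw * seconds / 3600        # energy consumed in that time
    return kwh * usd_per_kwh

cheap_grid = energy_cost_per_m_tokens(1.0, 1000, 0.05)   # cheap baseload
pricey_grid = energy_cost_per_m_tokens(1.0, 1000, 0.20)  # constrained grid

print(f"$/M tokens at $0.05/kWh: {cheap_grid:.4f}")
print(f"$/M tokens at $0.20/kWh: {pricey_grid:.4f}")
```

Per‑token energy cost scales linearly with grid price, so a 4× difference in $/kWh is a 4× difference in this line of the ROC equation – which is exactly why operators are locking in long‑dated PPAs.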


5. Demand: Agentic and Physical AI multiply tokens

Many believe Agentic AI is the next frontier. We already see experimentation with systems where multiple specialized AI agents communicate via API calls into multiple software apps to collaborate to handle complex tasks (one agent might be good at math, another at coding, another at planning – together they solve a problem). This could be more efficient than one monolithic model trying to do everything. If multi-agent approaches prove effective, the infrastructure might shift to orchestrating many smaller models. That would change the profile of compute (more distributed, possibly more memory and communication heavy).

Relatedly, inference demand is not linear. Early chatbots averaged 1–2k tokens per call; multi‑step agentic tasks easily consume 50k+ tokens, while a single autonomous‑vehicle fleet can generate terabytes of daily inference data. That swings Tokens served and Utilisation sharply higher.
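The non‑linearity is worth quantifying. Using the per‑call token counts from the article (1–2k for chat, 50k+ for agentic tasks) and an assumed, purely hypothetical constant query volume, the shift in workload mix alone multiplies aggregate demand by an order of magnitude:

```python
# Aggregate token demand under different workload mixes.
# Tokens-per-call figures follow the article (~1-2k chat, 50k+ agentic);
# the 10M calls/day volume is a hypothetical constant for comparison.

def daily_tokens(calls_per_day, tokens_per_call):
    """Total tokens served per day for a given workload."""
    return calls_per_day * tokens_per_call

chatbot_era = daily_tokens(10_000_000, 1_500)   # single-shot chat queries
agentic_era = daily_tokens(10_000_000, 50_000)  # multi-step agent tasks

print(f"Chatbot era:  {chatbot_era:,} tokens/day")
print(f"Agentic era:  {agentic_era:,} tokens/day")
print(f"Demand multiplier: {agentic_era / chatbot_era:.0f}x")
```

Even with zero growth in user numbers, moving the same queries from single‑shot answers to reasoning chains multiplies Tokens served by roughly 30×, before any Physical‑AI workloads are counted.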

“It started with perception AI — understanding images, words and sounds. Then generative AI — creating text, images and sound. Now, we’re entering the era of physical AI — AI that can perceive, reason, plan and act.” – Jensen Huang, CES keynote, January 2025

Strategic implication: Rapid adoption of Agentic and Physical AI could increase inference demand exponentially; for Physical AI, however, much of this will need to take place at the edge rather than in the cloud.


6. China’s parallel supply chain

US export controls on Nvidia A100/H100 GPUs have accelerated indigenous GPU projects in China (Biren, Huawei Ascend, Tencent Zixiao, Iluvatar) and local lithography efforts. Analysts expect mid‑range 7 nm AI accelerators at 40% below current ASPs by late 2026 for domestic Chinese customers.

“The Acceleration and Velocity of AI in China is off the charts…. it seems to me the US has underestimated China and AI… the reality is China is going to have frontier AI, and almost all the things we do to slow them down and stop them, are backfiring on the United States”. – Brad Gerstner, Founder and CEO at Altimeter Capital

Strategic implication: A dual‑track market could compress global chip prices. In the short term this would erode past hyperscaler depreciation assumptions. In the longer term this could make AI Compute less capital intensive, boosting return on capital and model access in cost‑sensitive areas.


Looking forward

AI has already set a new high‑water mark for corporate investment, but history reminds us that extraordinary capex does not automatically translate into extraordinary returns.  Whether the current cycle ends up looking like the profitable cloud build‑out or the over‑built dot‑com era will hinge on how management teams navigate six inter‑locking forces that feed directly into the return‑on‑compute (ROC) equation:

1. Compute S‑curves (pre‑training → post‑training → test‑time)
  ROC swim‑lanes: Tokens served; Utilisation
  Why it now matters: Diminishing pre‑training gains force the industry to chase volume‑heavy inference and memory‑heavy reasoning, shifting demand from GPUs to HBM and smarter scheduling.

2. Token economics (pricing vs. cost deflation)
  ROC swim‑lanes: Price per token; Variable cost per token
  Why it now matters: Open‑source fine‑tunes and usage‑based business models push price per token down; success will depend on driving cost per token down even faster.

3. Open‑source acceleration
  ROC swim‑lanes: Fixed OpEx (royalties); CapEx agility
  Why it now matters: Community models such as DeepSeek‑R1 prove frontier quality at a fraction of historical budgets, allowing enterprises to redeploy spend higher up the stack.

4. Emerging bottlenecks (energy, HBM, latency)
  ROC swim‑lanes: Variable cost per token; Utilisation
  Why it now matters: Power, memory bandwidth and edge latency, not GPUs, become the new choke‑points, determining how much of the installed fleet is actually sweated.

5. Demand shock from Agentic & Physical AI
  ROC swim‑lanes: Tokens served
  Why it now matters: Agents, factory robots and autonomous fleets multiply inference load, but only if end‑users perceive clear value and shoulder the bill.

6. China’s parallel supply chain
  ROC swim‑lanes: CapEx; Variable cost per token
  Why it now matters: Export controls have catalysed a domestic GPU and lithography stack that could cut accelerator ASPs by ~40%, lowering both upfront build costs and ongoing depreciation.

Strategic takeaway: Infrastructure owners that embed a ‘return‑on‑compute’ discipline—optimising each swim‑lane before committing the next dollar—will translate today’s spend into tomorrow’s cash‑flows. Those that chase capacity for capacity’s sake risk owning stranded, fast‑depreciating assets.


Sources

The analysis above is grounded in a variety of industry reports, earnings call statements, and expert interviews, including:

  • CES 2025 keynote by Nvidia CEO Jensen Huang
  • Invest Like the Best podcast, 6 December 2024 (with Chetan Puttagunta and Modest Proposal)
  • BG2 Pod: 13 December 2024 (with Satya Nadella), 24 December 2024 (with Dylan Patel), 11 January 2025
  • Posts on X from Bill Gurley, Satya Nadella, Marc Andreessen
  • Hyperscaler Analyst Projections sourced from finchat.io
  • Energy consumption assumptions sourced from https://www.iea.org
