Casefile · AI Infrastructure

Every Cheap Token Still Has a Physical Bill

Why falling AI inference costs may increase, not reduce, demand for HBM, packaging, chemistry, power, cooling, and EDA.

Published: May 7, 2026 · Model: token-value-compounding · Layers tracked: 8 · Cascade claims: 32 · Data as of: 2026-05-07

Not investment advice. All data as of publication date. Market caps and prices change daily — verify before any trade decision.

The 6× reframe

Cost ↓50% × Utility ↑3× × Tokens/task ↑ = Volume ↑~6× (modeled mechanism)

Indexed Q1 2024 = 100. Three independent curves combining multiplicatively, not additively.

Observed: Q1'24–Q1'26 run-rate Cost ↓30% YoY × adoption ~4× in 14mo × reasoning models burn 3–10× more tokens per task. Lands on 8 physical layers; 3 already binding. Sources: footnotes 3, 5, 6, 8, 9, 11.

The 8 layers absorbing it

3 binding now · 3 tightening into 2026–27 · 1 watch · 1 paid in any case

Each layer detailed with tickers, lead times, and bear-case behavior in the Physical Stack section below.

L1 · Compute · Binding
L2 · HBM Memory · Binding
L3 · Storage (NAND / HBF watch) · Watch
L4 · Packaging (CoWoS / ABF) · Binding
L5 · Specialty Chemistry · Tightening
L6 · Power (LPT · switchgear) · Tightening
L7 · Liquid Cooling · Tightening
L8 · EDA + IP (royalty layer) · Paid in any case

Bear A · Captive ASIC pivot

L2–L8 stay binding. Position rotates from NVDA to AVGO/MRVL; ARM/SNPS/CDNS unchanged.

Bear C · Supply surge compresses premium

Thesis intact, bottleneck premium narrows. Trim L2/L4/L5/L7; EDA/ARM/Power unaffected.

Bear B · Agent ARR collapses (falsifier)

Demand multiplier → 0. Stack collapses, constraints become surpluses. Exit physical stack.

The market has the AI cost curve backwards.

Cheaper tokens do not mean less infrastructure spend. They mean more token consumption. More agents. More reasoning steps. More persistent memory. More inference clusters. More physical throughput.

The digital price falls. The physical bill compounds.

That bill lands on eight layers: compute, HBM, storage, packaging, chemistry, power, cooling, and EDA.

Software adoption is plotted in weeks. Physical capacity is plotted in years. That is the gap.

AI is entering its electricity moment. When electricity got cheaper, humans did not use less of it. They lit factories, cities, homes, machines, and networks. Tokens are going through the same transition. Cheaper tokens do not reduce usage. They unlock new usage. That is Jevons Paradox¹ applied to inference.

Frontier output-token list prices fell sharply across 2024–2025. Anthropic, OpenAI, and Google composite API pricing dropped over 30% year-over-year.³ Cursor went from $500M ARR in late 2024 to over $2B by Q1 2026.⁵ GitHub Copilot disclosed 77,000 enterprise organizations at $2B+ ARR-equivalent.⁶ Glean, Harvey, and Cognition all reported triple-digit growth.

When unit cost falls 50% and adoption rises 3×, total volume does not grow 1.5×. In the base-case mechanism estimate it grows roughly 6×. That volume is what binds the physical stack.

Eight layers bind in sequence. TSMC CoWoS is expanding from ~10,000 wafer starts per month in 2023 toward 60,000+ by end-2026 — a six-fold buildout over four years.⁸ HBM is fully allocated through 2026 with 2027 substantially pre-committed; the MR-MUF underfill that lets HBM stack at all runs through two Japanese suppliers with no third qualified supplier visible.¹⁰ Storage, packaging, chemistry, power, cooling, and EDA each have their own named constraint, lead time, and qualified-supplier list below.

The thesis is falsifiable and the failure modes are distinct. Bear A is a pivot: if hyperscaler captive ASIC share crosses 60%, the surface re-routes to Broadcom and Marvell; physical constraints stay intact. Bear B is a falsifier: if enterprise agent ARR growth falls below 40% YoY for two consecutive quarters, the thesis breaks. Neither has activated as of May 7, 2026.

Volume 1 mapped Japan’s material monopoly. Volume 2 mapped Korea’s HBM stack. Volume 3 mapped the chemistry beneath both. Volume 4 maps the demand mechanism that makes those constraints bind — and the eight layers across which the binding propagates.

The market prices AI as a software margin story. The binding constraint is becoming a physical capacity story.

Each token gets cheaper. Each token still has a physical bill.² The trade is the receivers — the suppliers eight layers deep that ship the physics that lets the model run.

Software deflation creates physical inflation.

The Two Trend Lines

TVC rests on two trend lines. Both are independently observable. Both are monitored daily.

Trend Line 1 — Cost per output token declining. Claude 3 Haiku output dropped over 80% year-over-year in late 2024.³ The Anthropic / OpenAI / Google composite basket fell more than 30% across 2024–2025. Gemini Flash crossed below $0.10 per million output tokens. The mechanism is hardware: each accelerator generation delivers more inference per watt, and the gain passes into pricing.⁴

Counter: hyperscaler enterprise tiers are rate-limited and contracted, so list-price decline overstates the actual buyer experience. The disclosed enterprise ARR figures are net of those caps and still triple-digit. The compounding lives in the volume.

Trend Line 2 — Utility per token rising. Cursor crossed $500M ARR in late 2024, exceeded $1B in November 2025, and reached over $2B by Q1 2026⁵ — a roughly 4× expansion in 14 months. GitHub Copilot Enterprise disclosed 77,000 organizations at $2B+ ARR-equivalent.⁶ Glean, Harvey, and Cognition all crossed triple-digit YoY growth.

Counter: reasoning models burn 10–100× more tokens per task. Utility per token may fall, but utility per task rises and tokens per task rise with it. Both effects increase the load on the physical stack.

This is not a point forecast. It is a mechanism estimate. A 50% cost decline, 3× adoption, and multi-fold tokens-per-task growth do not produce linear 1.5× demand. After rate limits, budget constraints, and deployment friction, the base case still points to multi-fold inference volume growth. We use ~6× as the monitored mechanism estimate.

Cost falls 50%, adoption rises 3×, reasoning burns more tokens per task. The “6× — not 1.5×” reframe is the entire mechanism.
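One way to read the arithmetic behind the reframe, as a minimal sketch: the 1.5× figure nets the price cut against adoption (3 × 0.5), while the ~6× figure treats the price cut as a 2× gain in what each dollar buys and multiplies it with adoption (2 × 3). The function names, the tokens-per-task factor, and the friction factor are illustrative assumptions, not the published model.

```python
def naive_volume_multiple(cost_decline: float, adoption_multiple: float) -> float:
    """Linear reading: net the price cut against adoption (3 x 0.5 = 1.5)."""
    return adoption_multiple * (1.0 - cost_decline)


def compounded_volume_multiple(
    cost_decline: float,
    adoption_multiple: float,
    tokens_per_task_multiple: float = 1.0,  # assumption: held at 1.0 here
    friction: float = 1.0,  # assumption: 1.0 means no rate limits or budget caps
) -> float:
    """Multiplicative reading: a price cut raises tokens bought per dollar."""
    if not 0.0 <= cost_decline < 1.0:
        raise ValueError("cost_decline must be in [0, 1)")
    affordability = 1.0 / (1.0 - cost_decline)  # 50% cheaper -> 2x tokens per dollar
    return affordability * adoption_multiple * tokens_per_task_multiple * friction


if __name__ == "__main__":
    print(naive_volume_multiple(0.5, 3.0))       # 1.5
    print(compounded_volume_multiple(0.5, 3.0))  # 6.0
```

Tokens per task above 1 and friction below 1 pull in opposite directions, which is consistent with treating ~6× as a monitored estimate rather than a point forecast.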

Why This Re-rates the Physical Stack

Every additional token of inference is a physical event. Electrons, photons, chemicals, manufactured hardware. None scale automatically with software demand.

Large power transformers above 500 MVA carry 36–48 month lead times.⁷ CoWoS capacity expansion takes four years.⁸ HBM is allocated through 2026 with 2027 pre-sold.⁹ Software adoption curves are weeks. Capacity expansion curves are years. The gap is the trade. Chain Two Layers Deeper extends the same mechanism across the full eight-layer surface.

Four buckets, eight layers

The eight layers do not bind at the same time or behave the same way under stress. They sort into four buckets:

Already binding. L1 Compute, L2 HBM, L4 Advanced Packaging. Allocations are sold through 2026. 2027 is substantially pre-committed.
Tightening into 2026–2027. L5 Chemistry, L6 Power, L7 Cooling. Where second-order beneficiaries live, and where mispricing concentrates.
Paid in any scenario. L8 EDA. Royalty on complexity. Wins whether NVIDIA dominates or hyperscaler ASIC programs grow.
Watch layer, not binding layer. L3 Storage. Becomes binding on a 2027 roadmap if HBF endorsement materializes; not yet load-bearing on the thesis.

Where the ForcedAlpha edge concentrates

Layer	Constraint intensity	Market awareness	ForcedAlpha edge
L1 Compute	High	High	Medium
L2 HBM	High	Medium-High	Medium
L3 Storage	Watch (2027)	Low	Medium
L4 Packaging	High	Medium	High
L5 Chemistry	High	Low-Medium	Very High
L6 Power	High	Medium	High
L7 Cooling	High	Medium	Medium
L8 EDA	High	Medium	High

The ForcedAlpha edge is highest where the supplier list is narrow, the qualification cycle is long, and the market still categorizes the company outside AI.

The body sections below run through each layer in order. The market has re-rated the visible AI layer at the top — NVIDIA, TSMC. It has not yet re-rated all of the physical receivers below it.

Why this re-rates now

Frontier model prices have fallen enough to stimulate real usage. The Anthropic / OpenAI / Google composite basket fell more than 30% YoY across 2024–25, and Gemini Flash crossed below $0.10 per million output tokens.³
Agentic workflows are moving from demos to enterprise deployment. Cursor went from $500M to over $2B ARR in 14 months. GitHub Copilot disclosed 77,000 enterprise organizations.⁵ ⁶
Reasoning models consume far more tokens per task. Tokens-per-task increases of 10–100× change the unit economics of every customer-facing agent.
HBM and CoWoS are already sold through 2026. 2027 is substantially pre-committed across SK Hynix, Micron, and Samsung HBM lines and TSMC CoWoS capacity.⁹
Power and cooling bottlenecks are now visible in datacenter buildouts. Large power transformers above 500 MVA carry 36–48 month lead times; medium-voltage switchgear runs 18–24 months.⁷
Captive ASICs do not reduce the physical stack — they broaden it. Google TPU, AWS Trainium, Microsoft Maia, and Meta MTIA all ride TSMC, HBM, ABF, power, and cooling. Bear A pivots the trade surface above the physics; the physics stays binding.
The market has re-rated the visible AI layer but not all of the physical receivers below it. NVIDIA and TSMC are priced. The eight-layer surface beneath is not uniformly priced.

Position Implication, Plain

The eight layers do not produce one trade. They produce a portfolio with a sequenced binding-time schedule. The structural lean, before any Pro-tier convergence scoring or live cascade indicator, runs as follows:

L1 (compute) — NVIDIA, TSMC, Broadcom. Mainstream is NVIDIA; Bear A adds Broadcom and Marvell; CPO 2026 ramp adds Coherent and Lumentum. Full layer map and convergence scoring tracked in Pro.

L2 (HBM) — SK Hynix, Micron, Samsung. SK Hynix is the structurally dominant share holder; Samsung and Micron carry material non-HBM cyclical exposure. Vera Rubin allocation in H2 2026 is the pivotal disclosure. Full layer map and convergence scoring tracked in Pro.

L3 (storage) — Kioxia. Watch layer rather than current trade. Becomes binding on a 2027 roadmap if HBF endorsement materializes. Full layer map tracked in Pro.

L4 (packaging) — TSMC, Ajinomoto, BESI. ABF film and BESI hybrid bonders are the most concentrated single-supplier risks; Ibiden and Unimicron sit downstream of the ABF film. Full layer map and convergence scoring tracked in Pro.

L5 (chemistry) — TOK, Shin-Etsu, Resonac. The most structurally concentrated layer and the differentiated layer for ForcedAlpha; qualification cycles run 12–36 months. Resonac sits at two distinct chokepoints (CMP slurries and MR-MUF). JSR was privatized via JIC. Full layer map and convergence scoring tracked in Pro.

L6 (power) — Eaton, Powell, HD Hyundai Electric. Powell is the cleanest small/mid-cap pure-play with disclosed multi-year backlogs. Schneider acquired Motivair in October 2024. Full layer map and convergence scoring tracked in Pro.

L7 (cooling) — Vertiv, Schneider. Vertiv provides the broadest publicly accessible exposure at scale; Schneider Electric is the post-Motivair vertically integrated power-plus-cooling supplier. Asetek is consumer AIO, not datacenter pure-play. There is no good US/Europe-listed mid-cap pure-play; private structures hold most of the surface. Full layer map tracked in Pro.

L8 (EDA) — Synopsys, Cadence, ARM. The royalty layer beneath captive silicon. Gets paid in any scenario — Bear A pivot, Bear C compression, or base case. Full layer map and convergence scoring tracked in Pro.

Live convergence scores, current cascade claim status, and named entry/exit triggers per ticker are tracked in the Pro cascade dashboard. The structural lean above is a free-section narrative, not a Pro indicator.

Three types of alpha in this thesis

The eight layers split into three trade types, each with a different edge and a different spread.

Consensus compounders. NVIDIA, TSMC, Broadcom, Vertiv, Eaton, Synopsys, Cadence, ARM. Quality names. The market already understands the AI thesis is theirs. Spreads are tight and re-rating multiples are visible.

Under-owned receivers. Ajinomoto, Ibiden, Unimicron, BESI, Resonac, ADEKA, Stella Chemifa, HD Hyundai Electric, Hyosung Heavy, Powell. Where ForcedAlpha edge concentrates. Many of these names are not yet priced as AI beneficiaries.

Triggered bottleneck convexity. Capacity miss, qualification delay, 2027 pre-commitment disclosure, transformer backlog expansion, cooling attach-rate inflection, hyperscaler ASIC tape-out cycle, chemistry supplier qualification event. Tracked claim by claim in Pro.

NVIDIA is the visible trade. The forced trade may sit two to six layers beneath it.

Layer 1: Compute Accelerators

L1 · Primary Chokepoints

NVIDIA, AMD, Broadcom (AVGO), Marvell (MRVL), TSMC

Tickers: NVDA · AMD · AVGO · MRVL · TSM (TSMC via ADR)

Watch: TSMC CoWoS monthly capacity — target 60,000+ wafer starts per month by end-2026.⁸

Compute is where token economics originate. NVIDIA dominates merchant inference silicon. TSMC manufactures almost all leading-edge logic at 3nm and 5nm. Broadcom and Marvell co-design Google's TPU v6e Trillium and Microsoft's Maia 100, respectively.

The binding constraint is not dies. It is CoWoS — TSMC's advanced packaging — which assembles GPU + HBM into a finished accelerator. TSMC operates CoWoS at production scale with no merchant alternative at equivalent advanced-node capacity currently visible.⁸ Performance per watt improves ~2.5× per accelerator generation per MLPerf data,⁴ which is what drives the cost-per-token decline at Trend Line 1.
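The CoWoS ramp cited above (~10,000 wafer starts per month in 2023 toward 60,000+ by end-2026, described as a four-year buildout) can be restated as an implied annual growth rate. This is a minimal sketch of that arithmetic; the helper name is an illustrative assumption, and the rate depends on whether the window is counted as three or four years.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` over `years`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures from the text: ~10,000 wafer starts/month to 60,000+, over four years.
    rate = implied_annual_growth(10_000, 60_000, 4)
    print(f"{rate:.1%}")  # roughly 56.5% per year
```

Even at that pace, capacity grows on an annual cadence while the demand curves in the two trend lines move within months.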

Networking is part of Layer 1. Marvell's electro-optic DSPs ship in 800G and 1.6T pluggable optics; Broadcom's Tomahawk 5 and 6 dominate hyperscaler Ethernet fabrics; Coherent (COHR) and Lumentum (LITE) supply the EMLs that drive PAM4 datacenter optics. Co-packaged optics (CPO), the next bandwidth-density step, is in qualification at Broadcom, Marvell, TSMC, and Coherent in 2026. Every token processed in an inference cluster crosses the fabric.

Layer 2: High-Bandwidth Memory

L2 · Primary Chokepoints

SK Hynix, Micron Technology, Samsung Electronics

Tickers: 000660.KS (SK Hynix) · MU · 005930.KS (Samsung)

Watch: HBM supplier utilization disclosures — target sustained above 95%.⁹

HBM is fully allocated through 2026. 2027 is substantially pre-committed. SK Hynix holds ~50% share, Micron ~25%, Samsung ~15%. Each H100 GPU uses five active HBM3 stacks on a package with six stack sites; each Blackwell B100 uses eight HBM3E stacks.

The MR-MUF underfill chemistry that lets HBM stack at all runs through Ajinomoto Fine-Techno and Resonac (formerly Showa Denko) with no third qualified supplier visible.¹⁰

HBM4 is shipping samples. SK Hynix holds ~70% of NVIDIA’s Rubin allocation; Samsung ~30%. Micron led HBM3E and is now locked out pending qualification. A Micron requalification before late 2026 is cascade-tracked. If Micron does not regain Rubin allocation, concentration at SK Hynix increases further. JEDEC base specs and Rubin supplier specs differ (see appendix).⁹

HBM is not DDR5 RDIMM. HBM is allocated, multi-year contracted, and supply-bound — there is no spot market. DDR5 RDIMM is commodity-cyclical. The TVC thesis is HBM-specific. Generalist memory exposure to Micron, SK Hynix, or Samsung captures both, and the DDR5 cyclicality can mask the HBM allocation picture in any given quarter. → see Appendix: HBM4 specifications.

Layer 3: Enterprise NAND Storage

L3 · Primary Chokepoints

Kioxia, Solidigm, Seagate, Western Digital, Phison

Tickers: 285A.T (Kioxia) · Solidigm (SK Hynix subsidiary, not separately listed) · STX · WDC · 8299.TWO (Phison)

Watch: Enterprise NAND bit shipment mix — target above 35% of total industry bits by Q4 2026.

Layer 3 is emerging, not binding. Today’s binding and tightening inference constraints sit at L1, L2, L4, L5, L6, and L7. Enterprise NAND becomes binding on a 2027 horizon through two mechanisms: persistent agent state across sessions, and KV-cache offloading at long-context inference scale.

Both are pre-production at frontier vendors today. The HBF (High-Bandwidth Flash) standard is under development at OCP (Open Compute Project), co-developed by SK Hynix and SanDisk. If hyperscaler endorsement materializes in 2026–2027, HBF would structurally bind NAND into the inference memory hierarchy as a new product category. The cascade tracks SanDisk and SK Hynix HBF disclosures; ADEIA (ADEA) is gated on hyperscaler endorsement.

Layer 4: Advanced Packaging

L4 · Primary Chokepoints

TSMC CoWoS, Ajinomoto ABF Film, Ibiden, Unimicron, BESI

Tickers: TSM · 2802.T (Ajinomoto) · 4062.T (Ibiden) · 3037.TW (Unimicron) · BESI.AS

Watch: ABF (Ajinomoto Build-up Film) substrate utilization — confirmed above 90%, no production-scale qualified alternative visible.¹¹

Three packaging materials gate AI accelerator output. The silicon interposer for CoWoS (Chip on Wafer on Substrate) is manufactured at TSMC at production scale, with no merchant alternative at equivalent advanced-node capacity currently visible. ABF (Ajinomoto Build-up Film) is the dielectric layer in advanced organic substrates produced by Ibiden and Unimicron. It has no production-scale qualified alternative visible.¹¹ The MR-MUF underfill chemistry (described at L2) is the third.

For HBM4 hybrid bonders, BESI (BE Semiconductor Industries) holds an estimated 75–85% share, with Applied Materials (AMAT) as the secondary option.

Layer 5: Specialty Chemistry

L5 · Primary Chokepoints

JSR, TOK, Shin-Etsu, Stella Chemifa, Kanto Denka, Soulbrain, Merck KGaA, ADEKA, Sumitomo Chemical

Tickers: 4185.T (JSR, delisted after privatization) · 4186.T (TOK) · 4063.T (Shin-Etsu) · 4109.T (Stella Chemifa) · 4047.T (Kanto Denka) · 036830.KQ (Soulbrain) · MRK.DE (Merck KGaA) · 4401.T (ADEKA) · 4005.T (Sumitomo Chemical) · 4182.T (MGC) · 092070.KQ (DNF) · LIN · APD

Watch: JSR + TOK + Shin-Etsu combined EUV photoresist share — confirmed above 85%.¹²

Layer 5 is the most structurally concentrated layer. Specialty chemicals at advanced nodes — photoresists, etchants, ALD precursors, ultra-pure gases, underfill — come from a small set of mostly Japanese and Korean producers with 12–36 month qualification cycles.

EUV photoresist is dominated by JSR (privatized via JIC), Tokyo Ohka Kogyo, and Shin-Etsu Chemical — together ~85–90% of global share per Techcet.¹² Sumitomo Chemical holds the remaining 10–15%. Ultra-high-purity hydrofluoric acid (HF) is supplied by Stella Chemifa and Kanto Denka; Soulbrain (036830.KQ) is the sole Korean-domiciled merchant-scale HF supplier for Samsung and SK Hynix memory — load-bearing for procurement chains with geographic qualification requirements.

ALD precursors for high-k dielectrics (HfO₂, ZrO₂) come from Merck KGaA (post-Versum), ADEKA (4401.T), and DNF (092070.KQ). CMP slurries — used to polish each wafer layer flat between patterning steps — are a separate concentrated chokepoint: Resonac is the global leader in tungsten and copper CMP at advanced nodes, with CMC Materials (acquired by Entegris in 2022) secondary. Each successive node generation requires more CMP steps; demand scales super-linearly with node-count progression.

Resonac sits at two structurally distinct chokepoints in the AI fab cascade: CMP slurries here and MR-MUF underfill at Layer 2. Industrial gases (Linde, Air Products) operate on-site plants at TSMC, Samsung, and Intel — quasi-captive once established. → see Appendix: Specialty chemistry mechanics.

Layer 6: Electrical Infrastructure

L6 · Primary Chokepoints

Eaton, Schneider Electric, HD Hyundai Electric, Hyosung Heavy Industries, ABB, Powell Industries, Vertiv

Tickers: ETN · SU.PA · 267260.KS (HD Hyundai Electric) · 298040.KS (Hyosung Heavy) · ABB · POWL · VRT

Watch: Large power transformer lead times — confirmed at 36–48 months globally.⁷

Every AI datacenter is, structurally, a power plant with servers attached. Large power transformers, medium-voltage switchgear, UPS, PDUs — lead times in years, not months. The manufacturing base is fewer than ten companies globally.

Large power transformers (LPTs) above 500 MVA carry 36–48 month lead times globally as of 2026.⁷ The base is concentrated in South Korea (HD Hyundai Electric, Hyosung Heavy), Europe (ABB, Siemens Energy), and a declining US presence. US datacenter electricity consumption is projected at 8–12% of total grid load by 2030 per EPRI estimates, up from ~4% in 2022.¹³

Medium-voltage switchgear at Eaton and similar suppliers carries 18–24 month lead times. Powell Industries (POWL), specialized in MV switchgear with disclosed multi-year backlogs, is the cleanest pure-play expression of the bottleneck. Schneider Electric acquired Motivair in October 2024, becoming the primary vertically integrated power-plus-cooling supplier alongside Eaton and Vertiv.

The grid itself is now formally rationed. PJM Interconnection, the largest US regional transmission organization (RTO), serving 65M people across 13 states plus DC, held two consecutive Base Residual Auctions (BRAs) at the FERC-approved price cap. The 2027/28 BRA was the first in PJM history (since 2007) to fail to procure enough capacity to meet the reliability target, falling 6,623 MW short. PJM CEO David Mills, in a May 8, 2026 stakeholder letter, named the situation directly: "an era of scarcity," "structurally different from prior periods of tightness," "the current situation is not tenable." The cost of a marginal token is no longer set only at the chip layer; it is increasingly set at the grid, where the resolution path runs through grain-oriented electrical steel (GOES; Cleveland-Cliffs Butler is the sole US merchant producer), large power transformers (HD Hyundai, Hyosung), and either nuclear restarts (years) or behind-the-meter gas turbines (24–36 month lead times). The substrate cost has migrated upstream of silicon. Sources: PJM letter; 2027/28 BRA results.

Layer 7: Liquid Cooling

L7 · Primary Chokepoints

Vertiv, Asetek, Schneider Electric (via Motivair)

Tickers: VRT · ASTK.OL (Asetek, Oslo Bors, small float) · SU.PA

Watch: Liquid cooling attach rate on new AI rack deployments — reported at 60%+ for AI-specific racks in 2024; target above 70% by Q4 2026.

Chip power density has crossed the threshold where air cooling no longer works for AI racks. The NVIDIA GB200 NVL72 — the reference rack for Blackwell — requires direct liquid cooling. Liquid cooling is now mandatory infrastructure, not an upgrade.

The supply surface is striking for how concentrated it is in private companies. Boyd (Goldman Sachs PE), CoolIT (KKR 2023), LiquidStack (Carrier Global), and Submer (BlackRock-led 2024) are all private. Motivair was acquired by Schneider Electric in October 2024, removing the last independent mid-size public liquid cooling specialist. Asetek (ASTK.OL, Oslo Bors) is publicly listed but is predominantly a consumer all-in-one (AIO) liquid-cooler vendor: its datacenter line is a small share of revenue, and its float is under $100M.

There is no good US/Europe-listed mid-cap pure-play on datacenter liquid cooling. Vertiv (VRT) provides the broadest publicly accessible exposure at scale. CDU lead times are disclosed at 6–12 months by Vertiv. PFAS regulation — EPA and ECHA actions expected in 2026 — could force reformulation of dielectric cooling fluids in immersion systems, adding supply disruption to that segment.

Layer 8: EDA — Royalty on Silicon Fragmentation

L8 · Primary Chokepoints

Synopsys, Cadence Design Systems, Arm Holdings

Tickers: SNPS · CDNS · ARM

Watch: Number of distinct hyperscaler captive silicon programs at TSMC advanced-node tape-out — tracking above 6 as of May 2026.

Every captive ASIC — Google TPU, AWS Trainium, Microsoft Maia, Meta MTIA — runs through Synopsys and Cadence. EDA is the design-layer chokepoint for the captive-silicon wave Bear A describes. As captive share rises, EDA royalty per tape-out rises with it. EDA gets paid first whether the buyer is NVIDIA, Google, AWS, or Microsoft.

Synopsys and Cadence together hold ~80–90% of advanced-node EDA. Synopsys closed the Ansys acquisition in July 2025, with Q1 2026 the first integrated quarter — expanding the standard digital flow into system-level simulation for chiplet, 2.5D, and 3D advanced-package designs. Cadence’s competitive response is its generative-AI design suite. Both vendors are pricing AI-augmented tooling as premium-tier additions, expanding revenue per seat at hyperscaler customers.

ARM Holdings holds the same near-monopoly at the instruction-set layer. Google Axion, AWS Graviton, Microsoft Cobalt, Ampere Altra, and NVIDIA Grace all build on the licensed Arm architecture, with most current generations on Armv9-A. Hyperscaler ARM royalty is the layer beneath the captive-silicon wave: every captive ASIC tape-out is also an ARM royalty event. Even if Broadcom and Marvell capture the merchant ASIC co-design, ARM still gets paid on every die.

The Chinese counterweight runs at low-single-digit global share. The named producers are Empyrean (301269.SZ), Primarius (688206.SH), and Semitronix (688053.SH). China cannot tape out 7nm-and-below today without Synopsys or Cadence. Even if 2027–2028 brings a Chinese advanced-node flow, the Western captive-silicon royalty stream stays locked to Synopsys, Cadence, and ARM. Tape-out volume scales with captive-silicon ramp; ramp scales with inference load — TVC, end-to-end. → see Appendix: EDA market structure.

The Three Bear Cases

Some bear cases kill the trade. Others only move the bottleneck. The thesis has three distinct stress modes. Bear A is a pivot in the trade surface. Bear B is a falsifier of the demand mechanism. Bear C compresses the bottleneck premium without falsifying the thesis. They behave differently and should be managed differently.

Bear A — Surface Pivot (not a falsifier)

Hyperscaler Captive ASIC Displacement

If the weighted share of frontier inference run on captive silicon (Google TPU, AWS Trainium, Meta MTIA, Microsoft Maia) crosses ~60%, this is a pivot in the investable surface, not a falsification.

Inference volume still grows. L2–L8 stay binding. The position adjusts: reduce NVIDIA, add Broadcom (TPU co-design) and Marvell (Maia co-design); ARM, Synopsys, and Cadence get paid in either world. Current captive share is ~15–25% per SemiAnalysis, Omdia, and hyperscaler-disclosed deployment counts in Q1 2026; the 35–40% range is a 2027–2028 trajectory. The Bear A trigger is 60%, not the current condition.

Bear B — Thesis Falsifier

Agent Capability Plateau

If foundation model capability plateaus and enterprise customers stop deploying agents (two consecutive quarters of agent ARR basket growth below 40% YoY, or a named lighthouse customer churning publicly), the demand-side multiplier goes to zero. This falsifies TVC entirely.

If Bear B activates: exit physical stack positions. Inference volume compounding is broken. Physical supply constraints become surpluses.

Bear C — Spread Compression (not a falsifier)

Supply Response Outruns Demand

Capacity expansion catches up faster than expected. HBM lines ramp ahead of schedule, CoWoS passes 60,000 wafer starts per month before end-2026, multiple suppliers qualify on ABF, MR-MUF, and high-purity chemistry, transformer and cooling capacity accelerates, and China–Korea–Japan add capacity aggressively under industrial-policy pressure.

Demand stays strong. The bottleneck premium compresses. Position implication: trim the most concentrated single-supplier names (BESI, Ajinomoto Fine-Techno exposure, Powell on power) toward more diversified L6/L7 names; the thesis is intact but the spread is no longer the trade. Bear C compresses without falsifying — distinct from Bear A (pivots the trade surface) and Bear B (breaks the demand mechanism entirely).
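The two numeric triggers named in this section can be written down as simple checks. The sketch below assumes Bear B is evaluated on the two most recent quarters of year-over-year growth for the agent ARR basket; the function and constant names are illustrative. Bear C is left out because the text gives it no numeric threshold.

```python
from typing import Sequence

CAPTIVE_SHARE_PIVOT = 0.60     # Bear A: captive share of frontier inference
AGENT_ARR_GROWTH_FLOOR = 0.40  # Bear B: YoY growth of the agent ARR basket


def bear_a_active(captive_share: float) -> bool:
    """Surface pivot: captive silicon share has crossed 60%."""
    return captive_share > CAPTIVE_SHARE_PIVOT


def bear_b_active(
    quarterly_yoy_growth: Sequence[float],
    lighthouse_churn: bool = False,
) -> bool:
    """Falsifier: the two latest quarters are both below the floor, or a lighthouse churns."""
    last_two = list(quarterly_yoy_growth)[-2:]
    two_weak_quarters = len(last_two) == 2 and all(
        growth < AGENT_ARR_GROWTH_FLOOR for growth in last_two
    )
    return two_weak_quarters or lighthouse_churn


if __name__ == "__main__":
    print(bear_a_active(0.25))                # False: top of the ~15-25% range cited above
    print(bear_b_active([1.20, 0.35, 0.38]))  # True: two consecutive quarters under 40%
```

Keeping the two checks separate mirrors the distinction drawn above: one moves the trade surface, the other ends the trade.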

What We're Watching

Pro: The Hypothesis Registry

This is not a narrative basket. It is a monitored hypothesis registry — claims across 8 layers, each with a binary falsifier, a named primary source, a price-sensitivity score, and an explicit position implication. Claims are checked nightly. Flips run a three-pass review before any casefile edit.


How to Read the Cascade

Each casefile is backed by a machine-verifiable hypothesis registry — a structured way of tracking whether the thesis claims remain true over time.

26 claims across 8 layers plus 2 master trend claims (cost-per-token declining, utility-per-token rising). Each claim has:

A binary falsification condition (a specific number, threshold, or announced event)
A named primary data source and an estimated tell lead time
A price sensitivity score (1–5) — how much the position changes if the claim flips
A position implication — which tickers, which direction

When a claim flips, it enters a three-pass review: two independent fact-check agents, one adversarial agent, then operator approval before any casefile edit. Registry and casefile never diverge; one without the other is a process violation.
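The claim fields and review sequence described above map naturally onto a small record type. This is a hedged sketch of one possible shape, not the registry's actual schema: the class name, field names, lead-time unit, and pass labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed labels for the three review passes described in the text.
REVIEW_PASSES = ("fact_check_1", "fact_check_2", "adversarial")


@dataclass(frozen=True)
class Claim:
    claim_id: str              # hypothetical identifier
    layer: str                 # e.g. "L2", or "master" for a trend claim
    falsifier: str             # binary condition: a number, threshold, or event
    primary_source: str        # named primary data source
    tell_lead_time_days: int   # assumed unit for the estimated tell lead time
    price_sensitivity: int     # 1-5: how much the position changes on a flip
    position_implication: str  # which tickers, which direction

    def __post_init__(self) -> None:
        if not 1 <= self.price_sensitivity <= 5:
            raise ValueError("price_sensitivity must be between 1 and 5")


def ready_for_operator(passes_completed: Iterable[str]) -> bool:
    """A flipped claim reaches operator approval only after all three passes."""
    done = set(passes_completed)
    return all(name in done for name in REVIEW_PASSES)
```

Validating the 1–5 score at construction time keeps malformed claims out of the registry before any review pass runs.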

Bear A activates as a pivot at captive share >60%. Bear B activates as a falsifier at agent ARR collapse. Neither has activated as of publication.

This is why the dependency graph matters. AI demand does not hit one company. It propagates through a stack. ForcedAlpha maps the propagation.

Appendix: Technical Depth

Detail that supports the body claims but is not required to read in flow. For primary sources, see the footnotes.

HBM4 Specifications

The published JEDEC HBM4 base specification is 8 Gb/s pin-speed, delivering ~2.0 TB/s per stack across the 2,048-bit interface. NVIDIA's Rubin-targeted supplier specifications run above the JEDEC base, at pin speeds cited as high as 13 Gb/s; on the same 2,048-bit interface, ~2.6 TB/s per stack corresponds to roughly 10 Gb/s per pin, and 13 Gb/s per pin to ~3.3 TB/s. The over-spec is negotiated bilaterally with SK Hynix and Samsung.⁹ The bandwidth uplift is what drives the Vera Rubin platform's per-GPU throughput target. HBM4 is delivered as a stack of 12-high or 16-high DRAM dies connected through TSVs (through-silicon vias: microscopic copper pillars that carry electrical signals through silicon). Each H100 GPU uses five active HBM3 stacks on a package with six stack sites; each Blackwell B100 uses eight HBM3E stacks.
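Per-stack bandwidth follows from pin speed times interface width. The sketch below shows that arithmetic for the 2,048-bit HBM4 interface; the function names are illustrative. On this arithmetic 8 Gb/s gives ~2.0 TB/s, ~2.6 TB/s corresponds to about 10 Gb/s per pin, and 13 Gb/s corresponds to about 3.3 TB/s.

```python
def stack_bandwidth_tb_s(pin_speed_gbps: float, interface_bits: int = 2048) -> float:
    """Per-stack bandwidth in TB/s: Gb/s per pin x pins, / 8 for bytes, / 1000 for TB."""
    return pin_speed_gbps * interface_bits / 8 / 1000


def pin_speed_for_bandwidth(bandwidth_tb_s: float, interface_bits: int = 2048) -> float:
    """Inverse: pin speed in Gb/s needed to reach a given per-stack bandwidth."""
    return bandwidth_tb_s * 1000 * 8 / interface_bits


if __name__ == "__main__":
    for pin_speed in (8, 10, 13):
        print(pin_speed, round(stack_bandwidth_tb_s(pin_speed), 2))
    print(round(pin_speed_for_bandwidth(2.6), 1))
```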

EDA Market Structure

Synopsys-Ansys integration adds multiphysics simulation to the standard digital and custom-silicon flow: thermal, electromagnetic, and structural. Captive-silicon programs increasingly require this for chiplet, 2.5D, and 3D advanced-package designs. Cadence's competitive response is Cerebrus and the broader JedAI generative-AI design suite, which uses reinforcement-learning-style search across the place-and-route and optimisation steps to compress design cycles. ARM v9-A is the licensed core architecture in virtually every hyperscaler-built ASIC at advanced nodes. The Chinese EDA position sits at low single-digit global share but is growing inside Chinese fab and design-house workflows under explicit state industrial-policy support. The named producers are Empyrean Technology, Primarius Technologies, and Semitronix. China cannot tape out 7nm-and-below logic without Synopsys or Cadence today; whether 2027–2028 produces a meaningful Chinese flow at advanced nodes is the strategic question.

Specialty Chemistry Mechanics

CMP (Chemical-Mechanical Planarization) slurries are abrasive-chemical mixtures used to polish each layer of a wafer flat between patterning steps. Resonac's CMP leadership traces to the Showa Denko + Hitachi Chemical merger and the Sigma-Aldrich-derived advanced-material lines integrated into the combined entity. Merck KGaA's electronics business is top-3 in semiconductor materials after the Versum Materials acquisition in 2019. ALD (Atomic Layer Deposition) deposits thin films one atomic layer at a time and is the standard for high-k dielectrics like HfO₂ and ZrO₂ used in advanced DRAM cells and HBM stacking.

MR-MUF Chemistry

MR-MUF (Mass Reflow Molded Underfill) is a bromide-based epoxy material that fills the gaps between stacked HBM dies during the reflow step, providing thermal and mechanical stability at production scale. It runs through Ajinomoto Fine-Techno and Resonac (formerly Showa Denko) with no third qualified supplier visible at any HBM producer.¹⁰ The chemistry is what makes vertical HBM stacking manufacturable at all; without MR-MUF, HBM as a product category does not exist.

Sources

1. Jevons Paradox: efficiency improvements in resource use tend to increase total consumption as they reduce cost per unit and unlock net-new use cases. Originally described by economist William Stanley Jevons in The Coal Question (1865). Applied here to AI inference: lower cost per token → more total tokens consumed. Wikipedia reference.
2. Jordi Visser, 22V Research: "your CapEx is my opportunity" framing for the AI infrastructure cycle; spenders/receivers taxonomy; 5-layer cake of AI capex. Referenced with attribution per ForcedAlpha practice.
3. Anthropic pricing page, API output token pricing history: anthropic.com/pricing
4. MLCommons MLPerf Inference benchmarks, hardware perf/watt YoY improvement tracking: mlcommons.org/benchmarks
5. Cursor ARR trajectory: $500M run-rate Dec 2024; crossed $1B Nov 2025; reached over $2B by Q1 2026. Reported by The Information, TechCrunch, and Bloomberg citing company disclosures to investors and Series C/D financing rounds.
6. GitHub Copilot Enterprise: Microsoft disclosed 77,000 organizations at Ignite 2024. GitHub press releases: github.blog
7. Large power transformer lead times, Hitachi Energy and GE Vernova earnings and press releases: hitachienergy.com and gevernova.com
8. TSMC CoWoS capacity, quarterly earnings calls and investor day presentations: investor.tsmc.com. SemiAnalysis CoWoS quarterly tracking: semianalysis.com
9. HBM capacity allocation and HBM4 Vera Rubin status: SK Hynix and Samsung earnings calls Q1 2026; SemiAnalysis HBM4 allocation tracking; NVIDIA Vera Rubin H2 2026 volume timeline as disclosed at GTC 2026. skhynix.com/eng/ir; investors.micron.com; semianalysis.com
10. MR-MUF chemistry: Ajinomoto Fine-Techno IR: ajinomoto.com/ir. Resonac (formerly Showa Denko) IR: resonac.com/investors
11. ABF substrate supply: Ajinomoto IR (electronic materials segment) and Ibiden/Unimicron earnings calls: ajinomoto.com/ir
12. EUV photoresist market share: Techcet 2024 photoresist market share report: techcet.com
13. US datacenter electricity share: EPRI datacenter load growth report (2024/2025 editions): epri.com/research. EIA electricity data: eia.gov/electricity
14. CHIPS and Science Act (P.L. 117-167, August 2022): $39B manufacturing incentives, $13.2B R&D. CHIPS Program Office preliminary awards and clawback provisions: nist.gov/chips
15. NDAA FY2024/2025 Section 854: Chinese-origin critical mineral procurement prohibition for Department of Defense. Effective January 1, 2027, with expanding scope through 2030. congress.gov
16. Defense Production Act Title III: permanent authority, reauthorized continuously since 1950. Recent invocations for lithium ($35M), rare earth processing (MP Materials), germanium. US Code Title 50. law.cornell.edu
17. China export controls on strategic minerals: gallium and germanium (MOFCOM, August 2023, tightened December 2023); graphite (December 2023); antimony (August 2024). mofcom.gov.cn
18. USGS Mineral Commodity Summaries 2025: fluorspar production (China 60%+); rare earth refined production (China 85–90%); gallium production (China 80%+). usgs.gov
19. Ball bearing supply chain crisis, 1942–1943: War Production Board dependency mapping; SKF global supply concentration; Schweinfurt raids (August and October 1943). US Air Force Historical Studies No. 154. National Archives.
20. National Integrated Circuit Industry Investment Fund Phase III (approximately $47.5 billion / 344 billion yuan, announced May 2024). Reuters, Bloomberg reporting. reuters.com
21. HALEU Availability Program: approximately $700M authorized under the Inflation Reduction Act for domestic HALEU production. DOE Office of Nuclear Energy: energy.gov/ne/haleu-availability-program
22. NdFeB permanent magnet manufacturing: China produces an estimated 90%+ of global NdFeB magnets. Adamas Intelligence, 2024. adamasintel.com