
Introduction: A $33 Billion Market Running at the Speed of Scarcity
The Blackwell B200 backlog hit 3.6 million units before most enterprise procurement teams had finished their fiscal 2025 budget reviews. That single supply constraint tells you more about the AI accelerators market than any five-year forecast: this is not a market growing because adoption is accelerating, it's a market constrained because production capacity cannot keep pace with a demand wave that has no historical precedent.
Kaiso Research's primary dataset across this segment puts the 2025 valuation at USD 33.05 billion, growing to USD 431.67 billion by 2035 at a 29.30% CAGR. The top-line figure obscures a bifurcation already reshaping how enterprise AI teams budget, how hyperscalers procure, and how semiconductor designers prioritise their roadmaps. GPUs account for approximately 58% of accelerator revenue in 2025. Custom ASICs are growing at 43% CAGR.
Those two numbers cannot coexist indefinitely without producing a structural realignment in who controls the market. That realignment is what this analysis is about.
NVIDIA's 80% Share Masks a More Contested Market Underneath
NVIDIA commands roughly 80% of AI accelerator revenue by value in 2025, and the data center business posted USD 193.7 billion in revenue for fiscal year 2026. Those numbers read as dominance. They are, but it's conditional dominance built on a software moat, not a hardware one.
CUDA is the constraint that matters. An enterprise that has optimised its training pipeline around CUDA does not switch to AMD ROCm because Instinct MI300X benchmarks look attractive. It switches only when the software compatibility cost drops below the procurement saving, and that threshold has been shifting since PyTorch 3.1 introduced native ROCm support in 2025.
AMD's Instinct MI300X is already in production across Meta's Llama 3 and Llama 4 inference infrastructure. OpenAI has confirmed a multi-generational strategic agreement covering AMD Instinct MI450 deployments starting in the second half of 2026, with a warrant structure vesting at up to 6 gigawatts of deployment. That's not a pilot. That's a supply chain commitment that restructures competitive dynamics across the next three years.
The GPU market still belongs to NVIDIA. But the GPU market's share of total accelerator spend is contracting, signaling a broadening focus across the wider global artificial intelligence gpu chip market.
Custom ASICs Are the Fastest-Growing Architecture in the Stack
The 43% CAGR projection for custom ASICs through 2035 is the most consequential number in Kaiso Research's primary dataset. It reflects a straightforward economic logic: when your inference workload runs continuously at scale, the cost differential between a general-purpose GPU and a purpose-built chip becomes a business model decision, not a technical one. This long-term migration underpins structural projections highlighted in our comprehensive global ai inference hardware market study.
Midjourney demonstrated this in 2025. Switching inference from NVIDIA GPUs to Google TPU v6e reduced the company's monthly compute bill from USD 2.1 million to USD 700,000, an annualised saving of USD 16.8 million on a single workload category. That is the number that changes procurement conversations at every AI-native business running continuous inference.
Google released its seventh-generation TPU, Ironwood, in November 2025. At 4,614 TFLOPS of FP8 compute, Ironwood sits close to NVIDIA Blackwell's approximately 5,000 TFLOPS peak figure, but SemiAnalysis estimates that TPUs achieve 90% sustained model FLOP utilisation for transformer workloads versus 70-80% for GPUs.
Google claims a total cost of ownership per Ironwood chip roughly 44% lower than a GB200 server from its own procurement perspective. Anthropic committed to 1 million Ironwood chips in October 2025, representing over 1 gigawatt of compute capacity, making TPU v7 the first custom ASIC to reach seven-figure deployment volume at a single customer.
AWS has deployed more than 500,000 Trainium2 chips in production as of late 2025, with the Project Rainier facility in Indiana running approximately half that fleet for Anthropic's model training. Trainium2 delivers 83.2 petaflops in ultra-server configurations, and AWS claims 30-40% better price performance than GPU-based EC2 P5e instances for sustained training workloads. Trainium4, confirmed for late 2026 or early 2027 availability, promises six times FP4 throughput and four times memory bandwidth over Trainium3, with 288 GB of memory and support for NVIDIA NVLink Fusion, enabling hybrid clusters that mix Trainium and GPU workloads.
The pattern across Google TPU Ironwood and AWS Trainium2 is identical: best-in-class performance on specific transformer workloads at materially lower cost than NVIDIA equivalents. The hardware case for custom ASICs is no longer speculative.
The Hyperscaler Capex Signal Is the Market's Clearest Indicator
The AI accelerators market runs on hyperscaler capex, and the 2026 capex figures are structurally larger than any prior cycle. The five largest hyperscalers, Amazon, Microsoft, Alphabet, Meta, and Oracle, are projected to spend over USD 600 billion on infrastructure in 2026, a 36% increase from 2025. Approximately 75% of that figure, or USD 450 billion, targets AI infrastructure directly. Goldman Sachs projects total hyperscaler capex from 2025 through 2027 will reach USD 1.15 trillion, more than double the USD 477 billion spent from 2022 through 2024.
That capital flows into the accelerator stack at three layers. GPU procurement from NVIDIA Blackwell represents the largest single line item. Custom ASIC development and scaling, through Google TPU, AWS Trainium, Meta MTIA, and Microsoft Maia, represents the fastest-growing allocation. High-bandwidth memory from SK Hynix, Samsung, and Micron Technology represents the binding constraint, with one major HBM supplier confirming its full 2026 output is already contracted and expecting tight supply to persist beyond 2026.
The HBM market is projected to grow from approximately USD 35 billion in 2025 to USD 100 billion by 2028. That trajectory directly limits the production volume of every GPU and custom ASIC in the market, regardless of which architecture wins the performance benchmark.
The constraint hierarchy is: TSMC fabrication capacity at sub-5nm nodes, then HBM memory supply, then package-on-wafer CoWoS capacity at TSMC. NVIDIA, AMD Instinct, Google TPU, and AWS Trainium all compete for the same bottleneck resources. Supply leadership, not design leadership, is the decisive competitive variable through at least 2027, highlighting the volatile dynamics we map in our ongoing analysis of the ai gpu chip market.
Edge AI Accelerators Are Growing Fastest Within Technology Integration
Cloud-based deployment captured 75% of 2024 AI accelerator spending, confirming that hyperscale data centres remain the primary deployment environment. But the fastest-growing segment within technology integration is edge AI accelerators, driven by automotive ADAS systems and industrial IoT inference requirements where latency constraints make cloud-based processing impractical.
The automotive application is the most concrete. Modern ADAS platforms running Level 2+ autonomy functions process sensor fusion, object detection, and path planning in real time, with sub-10-millisecond latency requirements that cannot be met by round-tripping inference to a data centre. NVIDIA's Drive Thor SoC and Qualcomm's Snapdragon Ride Flex platform are the primary silicon choices for production ADAS deployments entering vehicle platforms between 2025 and 2027. Tesla's Dojo custom training accelerator, designed for video-based neural network training at scale, represents a vertically integrated approach that internalises both training and edge inference silicon.
In healthcare, diagnostic imaging systems from Siemens Healthineers, GE HealthCare, and Philips are integrating edge accelerator silicon to enable real-time AI-assisted interpretation at point-of-care. CT reconstruction workloads that previously required cloud offloading are now executing on embedded ASIC silicon within the imaging system itself, reducing interpretation latency from minutes to seconds. This deployment pattern is expanding from radiology into pathology and genomics sequencing, where Oxford Nanopore Technologies has deployed custom field-programmable gate array silicon for real-time basecalling at the sequencer level.
The industrial IoT deployment vector is driven by predictive maintenance applications across manufacturing. Siemens Digital Industries and Honeywell International are embedding inference accelerators directly into industrial control hardware, enabling anomaly detection on production machinery without external connectivity. These deployments use FPGAs and custom ASICs rather than GPUs, because the workloads are narrow, deterministic, and run continuously with hard power budgets that GPU architectures cannot meet.
Over 75% of large-scale AI training workloads in 2024 ran on dedicated AI accelerators rather than general-purpose processors, per Kaiso Research's primary dataset. The equivalent figure for edge inference is lower today but is growing at the fastest rate within the segment.
The Competitive Landscape Across Five Architecture Classes
The AI accelerators market in 2026 is not a GPU market with alternatives. It's a five-architecture market where different workloads have different optimal silicon, and the enterprise challenge is building procurement strategy across a portfolio rather than defaulting to a single vendor. This diversification is fundamentally reshaping the core global ai infrastructure market.
NVIDIA maintains the widest deployment base through CUDA software lock-in and the broadest library compatibility across training and inference frameworks. The Blackwell B200 is sold out well into 2026, with a backlog of 3.6 million units. The Blackwell Ultra, arriving in mid-2026, delivers approximately 1.5x the performance of B200 with 288 GB of HBM3e memory and 15 petaflops dense FP4. Pricing for complete 8x B200 server systems exceeds USD 500,000, a figure that makes the capital allocation decision non-trivial for enterprises without clear inference revenue models.
AMD Instinct MI350 shipped to hyperscale data centres in Q3 2025, with the MI400 series confirmed for 2026 and the Helios rack-scale architecture integrating MI400 GPUs with EPYC Venice CPUs and Pensando Vulcano networking. Oracle Cloud Infrastructure announced a 50,000-GPU MI450 cluster for Q3 2026 deployment. Meta runs Llama 3 and Llama 4 inference on MI300X in production. OpenAI's multi-generational agreement with AMD, including a warrant structure vesting at 6 gigawatts, represents the most significant competitive threat to NVIDIA's training workload dominance since the GPU era began.
Google TPU Ironwood is the most mature custom ASIC in external production, with 10,000+ chip coordinated clusters that no competing hyperscaler has matched. Customer wait times for committed TPU capacity run 2 to 3 months versus 18 to 24 months for Microsoft Maia, confirming operational maturity over Microsoft's competing platform. Anthropic's commitment to 1 million Ironwood chips remains the anchor reference deployment in the custom ASIC category.
AWS Trainium2 is deployed at scale, with Trainium3 announced at re:Invent 2025 delivering 1.3 petaflops of FP8 compute per die and scaling to 64,000 chips per UltraCluster for roughly 83 FP8 exaflops. AWS CEO Andy Jassy described Trainium as "already a multibillion-dollar business" at re:Invent 2025.
Cerebras Systems operates outside the conventional GPU and ASIC categories with the WSE-3 wafer-scale engine, offering 4 trillion transistors and 125 petaflops of peak performance. The architecture eliminates inter-chip communication latency for models that fit within a single wafer, a constraint that limits applicability but creates measurable performance advantages on specific LLM inference configurations.
The enterprise procurement question is not which architecture is best. It's which architecture is optimal for the specific workload, at the specific scale, under the specific power and latency constraint, with the specific software stack already in production.
Investment and Funding Dynamics in the Accelerator Stack
The capital flowing into AI accelerators is coming from four distinct sources, and the dynamics of each are different.
Hyperscaler procurement is the largest category, with combined 2026 capex exceeding USD 600 billion. This capital flows primarily to NVIDIA and, increasingly, to custom ASIC development funded through multi-year Broadcom and Marvell partnerships. Broadcom designs the networking silicon and serialiser/deserialiser components that enable large-scale ASIC clusters. Marvell designed the original Trainium2 silicon.
Both firms benefit directly from the hyperscaler shift toward custom silicon, regardless of which hyperscaler wins the inference competition. This shift is a key driver behind the rapid expansion of the broader global semiconductor manufacturing market.
Sovereign AI investments exceeded USD 30 billion in NVIDIA's fiscal year 2026, more than tripling year-over-year. The United Kingdom, France, Netherlands, Canada, and Singapore were the primary contributors. This represents governments building national AI compute infrastructure through direct GPU procurement rather than relying on hyperscaler cloud access.
Venture capital has concentrated in the startup accelerator tier. Groq, which builds language processing units optimised for LLM inference latency rather than throughput, raised at a valuation above USD 2.5 billion in 2024. Etched has focused on transformer-specific ASIC architecture with a single-chip inference optimisation that sacrifices architectural flexibility for deterministic throughput. d-Matrix has positioned its compute-in-memory ASIC for inference workloads where memory bandwidth is the binding constraint rather than compute throughput.
The common thread across these investments is a thesis that NVIDIA's architecture is over-engineered for inference workloads at scale. That thesis is correct on a per-token basis. Whether it survives the CUDA software moat is the question that will determine which startup accelerators reach production scale.
Regulatory Developments: Export Controls as the Market's Shadow Constraint
The U.S. Bureau of Industry and Security export control framework has become the most consequential policy input into AI accelerator supply chains. The H100 export restrictions imposed in late 2022, followed by the October 2023 updates covering a broader set of chips, removed NVIDIA's most capable hardware from Chinese market access. The April 2025 restriction on H20 sales to China cost NVIDIA USD 5.5 billion in inventory charges. The December 2025 transactional diffusion framework allowing H200 exports to China under a 25% revenue-sharing fee created a partial reopening, triggering reported orders from Alibaba and ByteDance exceeding 2 million H200 units for 2026 delivery.
The Remote Access Security Act, passed the U.S. House of Representatives in January 2026 by a 369-22 vote, proposes extending export controls to remote cloud access of controlled accelerator capacity. If enacted, it would impose compliance obligations on cloud providers and GPU rental intermediaries around customer screening and licence requirements for controlled hardware, including AI accelerators. The policy direction turns supply risk into policy risk for any enterprise with cloud-based AI workloads touching non-allied jurisdictions.
The January 2025 AI Diffusion Framework created ECCN 4E091, which controls AI model weights under export regulations for the first time. This extends the regulatory perimeter from hardware to the trained models that run on hardware, creating a compliance surface that extends well beyond semiconductor procurement teams into AI research, deployment, and international commercial agreements.
TSMC's sub-5nm fabrication capacity remains the chokepoint that all export control frameworks target. Keeping advanced node manufacturing outside China concentrates it in Taiwan, South Korea, and, with the Rapidus 2nm program, Japan. Quest Global's March 2025 memorandum of cooperation with Rapidus Corporation positions the 2nm program as a future design alternative to TSMC's N3 and N2 nodes, though volume production is not expected before 2028.
Strategic Implications for Enterprise Compute Buyers and Capital Allocators
Enterprise technology leaders facing AI accelerator procurement decisions in 2026 are operating in a market where the architecture question is as consequential as the vendor question. The standard approach of defaulting to NVIDIA and budgeting for H100-equivalent performance is no longer adequate strategic planning. It's a procurement shortcut that leaves material cost efficiency on the table at inference scale and creates single-vendor exposure in a market where AMD MI450, Google TPU access, and AWS Trainium deployments are de-risking infrastructure strategies at the largest AI operations in the world.
The procurement decision framework that emerges from Kaiso Research's primary dataset has three tiers. Training workloads at hyperscale: NVIDIA Blackwell or Blackwell Ultra remains the default, with AMD Instinct MI450 as the credible alternative for organisations willing to invest in ROCm software compatibility. Inference at sustained scale: custom ASICs, Google TPU access, or AMD Instinct for cost-optimised configurations where CUDA lock-in is not a binding constraint. Edge inference: FPGAs and purpose-built ASICs from Qualcomm, NVIDIA Drive, and vertically integrated alternatives for automotive ADAS and industrial IoT, with latency and power budgets that GPU architectures cannot satisfy.
Capital allocators following the semiconductor supply chain should focus not on which architecture wins, but on which constraint layer captures the margin. HBM memory, where SK Hynix holds 57-62% of supply and the market is projected to grow from USD 35 billion in 2025 to USD 100 billion by 2028, is the clearest bottleneck play. Broadcom's custom ASIC design business, which underpins Google TPU and most other hyperscaler programs, captures design margin outside NVIDIA's product line. TSMC captures fabrication margin regardless of architecture.
The AI accelerators market will not consolidate around a single architecture. It will segment by workload, and the enterprises that build workload-specific accelerator strategies today will not be caught in the supply constraint that already cost unprepared buyers 18-month lead times on Blackwell deployments.
Challenges and Risks: Three Structural Constraints That the CAGR Doesn't Reflect
The 29.30% CAGR projects a market growing from USD 33.05 billion to USD 431.67 billion. It does not reflect three structural risks that could compress that trajectory or redistribute the margin capture within it.
Power infrastructure is the constraint that no procurement budget resolves through chip selection. A single large-scale Blackwell cluster of 1 million GPUs consumes between 1.0 and 1.4 gigawatts, enough to sustain a mid-sized city. The hyperscaler response is nuclear power, with Amazon Web Services, Google, and Microsoft all announcing SMR agreements or conventional nuclear restart commitments. But new nuclear capacity does not come online before 2030 at the earliest.
The data centre buildout is currently outpacing grid capacity, and the power constraint will limit accelerator deployment volume independent of chip supply. This operational bottleneck is explored heavily in our detailed review of how ai data centers push power grids to the limit. Enterprises competing with hyperscalers for power access in key markets, including Northern Virginia, Dublin, and Singapore, face procurement timelines driven by utility capacity, not semiconductor availability.
Software stack fragmentation creates a ceiling on the market share that any non-CUDA architecture can capture in the training segment. NVIDIA's installed base of CUDA-optimised models, training pipelines, and inference runtimes represents a switching cost that is real and measurable. AMD's ROCm software stack has improved substantially, and PyTorch 3.1 native support reduces the porting burden, but the productivity cost of migrating a production training pipeline from CUDA to ROCm remains a 3-to-6-month engineering project at minimum. Custom ASIC platforms require even more bespoke software work, which is why Google TPU adoption outside of Google's own infrastructure has remained concentrated in AI-native organisations with dedicated ML engineering capacity.
Geopolitical exposure through the semiconductor supply chain runs in both directions. Export controls restrict NVIDIA's access to Chinese markets and generate replacement demand for Huawei's Ascend 910C, creating a bifurcated global accelerator market where Chinese AI infrastructure runs on domestic silicon with different performance and software characteristics. At the same time, TSMC's concentration in Taiwan creates systemic risk for the entire supply chain that no hyperscaler procurement strategy fully diversifies. To offset these risks, organizations are re-evaluating the physical layer of the computing supply chain, a trend analyzed in depth within our global physical ai infrastructure market report.
Future Outlook: 2026 to 2030 and the Architecture That Wins at Scale
The USD 431.67 billion projection for 2035 rests on a scenario where hyperscaler capex remains elevated, inference workloads continue to scale with model capability, and enterprise AI adoption broadens from pilot programs into production infrastructure. All three conditions are present in 2026.
The architecture outcome between 2026 and 2030 is more contested than the headline numbers suggest. Custom ASICs growing at 43% CAGR will capture a growing share of total accelerator spending, but the GPU market will also grow in absolute terms as inference volume scales. The most likely outcome is not GPU replacement but GPU stratification: Blackwell Ultra and its successors hold training at frontier scale, while custom ASICs and optimised GPU alternatives capture inference at sustained commercial scale.
The organisations that will find themselves in procurement emergencies by 2028 are those that treated AI accelerator strategy as a hardware decision rather than an infrastructure architecture decision. The window to build workload-specific accelerator strategies with diversified supply chain relationships, before HBM supply contracts through 2027 close off that flexibility, is now.
The 29.30% CAGR is not the story. The hardware hierarchy it will produce is.
Conclusion: The Architecture Decision Is Already Being Made for You
NVIDIA's Blackwell backlog is not a constraint on AI ambition. It's a signal that the market has outgrown its supply chain, and the enterprises sitting on that waiting list are already 12 to 18 months behind the organisations that built diversified accelerator strategies in 2024 and 2025. Anthropic runs across Google TPU Ironwood, AWS Trainium2, and NVIDIA GPUs simultaneously. Meta runs Llama inference on AMD Instinct MI300X while committing to a multi-generational AMD Instinct MI450 roadmap.
OpenAI signed a 6-gigawatt AMD deployment agreement while continuing to rely on NVIDIA for frontier training. These are not pilot programs. They are the architecture strategies of the organisations that understand this market's supply constraints most clearly.
The AI accelerators market, valued at USD 33.05 billion in 2025 and growing to USD 431.67 billion by 2035, is not a market where the dominant architecture has been selected. It is a market where the architecture decision compounds every year through procurement lock-in, software investment, and power infrastructure commitment. The hierarchy that gets built between 2025 and 2028 will be the hierarchy that runs AI at scale through 2035.
The time to make that decision is not after the next NVIDIA backlog announcement.
About Kaiso Research and Consulting
Kaiso Research and Consulting is a global market intelligence firm publishing comprehensive research reports across key industry verticals.
- Analyst Desk: Covered by the Life Sciences and Healthcare Technology team across North America, Europe, and Asia-Pacific.
- Publication Date: June 17, 2026 | Report Code: LSPH92
- Inquiries: [email protected] | +1 872 219 0417
- Market Study: Access the full index or request a complimentary sample directly via the Global AI for Drug Development and Discovery Market report page.


