GPUs, TPUs, and the Economics of AI: Why Token Costs, Power, and Infrastructure Strategy Now Matter

Author: Zion Zhao Real Estate | 88844623 | 狮家社小赵

Author’s note: This essay is written for education and market literacy, not as financial advice or a solicitation to buy or sell any security. Markets can fall as well as rise, and past performance is not indicative of future results. This essay is based on the Invest Like the Best interview with Gavin Baker.

TL;DR: GPUs, TPUs, and the Economics of AI

AI is reshaping Singapore property through jobs, capital flows, and infrastructure demand, especially data centres and power upgrades. Whether you buy, sell, rent, or invest, understanding these forces helps you time decisions, assess district tailwinds, and manage risk. I translate global AI economics into practical, compliant property strategy.

This conversation reframes AI as an industrial, power-constrained system, not a traditional software story. In classic software, marginal costs trend toward zero. In AI, every useful interaction consumes compute at inference time, while frontier progress consumes immense compute at training time. That shifts competitive advantage toward whoever can produce the most useful tokens per watt, per dollar of infrastructure, and per unit of operational excellence.

A key practical warning is that many observers overgeneralize from weak product tiers. To assess frontier capability and adoption risk, you must test models under real constraints: longer context, tool use, reliability, and multi-step workflows, because that is where economic value is created and where performance differences become visible.

On the supply side, scaling laws still matter, but the binding constraint is increasingly systems engineering: networking, memory, cooling, uptime, and the ability to run giant clusters at high utilization. That is why the Hopper-to-Blackwell transition is portrayed as a platform and data-center transition, not a simple chip swap. NVIDIA positions Blackwell as rack-scale infrastructure, for example the GB200 NVL72, which integrates large numbers of GPUs and CPUs in a liquid-cooled system designed for extreme workloads. (NVIDIA Investor Relations)

The strategic chess match extends to Google’s TPUs and token economics. If token production becomes a competitive commodity, being a low-cost producer can “suffocate” competitors by compressing margins. Google’s seventh-generation TPU, Ironwood, is explicitly framed around inference at scale and efficiency, reinforcing the view that silicon strategy is inseparable from unit economics. (blog.google)

Power is the governor of this entire regime. The International Energy Agency projects global data-centre electricity consumption could more than double to around 945 TWh by 2030, with AI a major driver. (IEA)

Finally, incumbents risk a “SaaS mistake” if they refuse AI agents because margins look worse. In a token-cost world, defending legacy gross margins can be strategically fatal. Geopolitically, supply chains matter: even “rare earths” are not rare in crustal abundance; they are simply hard to refine and concentrate economically at scale. (usgs.gov)

Introduction: AI Is Not “Software as Usual”

For most of the last four decades, the core economic magic of software was simple: you paid once to build it, then distributed it at near-zero marginal cost. Artificial intelligence breaks that intuition. Every meaningful AI interaction consumes compute at inference time; every improvement cycle consumes compute at training time; and every step of scale is constrained by real-world physics: power delivery, cooling, networking, semiconductor supply, and data center construction timelines.

That is why this conversation between Patrick O’Shaughnessy and Gavin Baker is valuable beyond its headlines. It is not merely “Nvidia versus Google” or “chips versus models.” It is an argument that AI is reshaping competitive advantage around token economics, system-level engineering, and energy constraints, with second-order effects that will propagate through cloud providers, frontier labs, enterprise software, and geopolitics.

In what follows, I preserve the interview’s strategic backbone while tightening the technical claims, correcting where public evidence differs, and expanding the implications using credible primary and scholarly sources.


1) The First Principle Most People Miss: You Cannot Judge Frontier AI on the “Free Tier”

One of the interview’s most practical points is also the most ignored: if you want to evaluate a frontier model’s real capabilities, you must test the tier that actually reflects the lab’s current product frontier. In enterprise contexts, performance differences often reveal themselves under (i) longer contexts, (ii) tool use, (iii) higher rate limits, (iv) reliability constraints, and (v) multi-step workflows. This is consistent with a broader research reality: evaluation outcomes are highly sensitive to prompt budgets, context budgets, and inference-time compute strategies—especially for reasoning-style systems. (OpenReview)
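
To make that sensitivity concrete, here is a minimal Python sketch using the standard unbiased pass@k estimator (how often at least one of k sampled attempts succeeds). The counts n and c below are hypothetical; the point is that the same model looks dramatically weaker or stronger depending purely on the evaluation budget.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    attempts drawn (without replacement) from n samples is correct,
    given that c of the n samples were correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical model: 15 of 100 samples solve the task.
# Measured "capability" moves a lot with the budget k.
n, c = 100, 15
for k in (1, 5, 20):
    print(f"pass@{k} = {pass_at_k(n, c, k):.2f}")
# pass@1 = 0.15, pass@5 ≈ 0.56, pass@20 ≈ 0.97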

Implication: Many market narratives form from shallow, low-budget usage patterns that do not map well to enterprise value creation (where latency, accuracy, uptime, and workflow completion matter more than “wow” demos).


2) Scaling Laws Still Matter, But We Should Speak Precisely About What They Are

Baker describes scaling laws as an “empirical observation” rather than a physical law. That is correct. Scaling laws are statistical regularities observed across model families: as you scale parameters, data, and compute, loss tends to improve predictably—until constraints (data quality, training recipe, architecture, optimization) intervene. The seminal work by Kaplan et al. established early scaling regularities for neural language models, and later work (for example, Chinchilla) clarified that compute-optimal scaling requires balancing model size and training data more carefully than early practice did. (arXiv)
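
As a rough illustration of what “compute-optimal” means in practice, here is a back-of-envelope sketch of the Chinchilla heuristic. The C ≈ 6·N·D cost approximation and the ~20 tokens-per-parameter ratio are rules of thumb from the literature, not exact laws, and real training runs deviate from them.

```python
def chinchilla_split(flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal split under the Chinchilla heuristic:
    training FLOPs C ≈ 6·N·D, with D ≈ 20·N near the optimum
    (Hoffmann et al., 2022). Returns (parameters N, training tokens D)."""
    n = (flops / (6.0 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

# Hypothetical 1e24-FLOP training budget:
n, d = chinchilla_split(1e24)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
# params ≈ 9.13e+10 (~91B), tokens ≈ 1.83e+12 (~1.8T)
```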

Where the public debate often goes wrong is treating “scaling laws are intact” as meaning “progress is automatic.” Progress is not automatic. Scaling laws are conditional on (i) getting the training system to function at scale, (ii) feeding the model sufficient high-quality data, and (iii) engineering stable training dynamics. In 2024–2026, the binding constraint is frequently not theory. It is systems engineering.


3) The Real Bottleneck: Cluster-Scale Systems Engineering (Not Just “Faster Chips”)

The interview frames a crucial reality: frontier pre-training is not “buy GPUs and train.” It is orchestrating vast distributed systems where networking, memory, topology, software kernels, and cooling determine whether your theoretical FLOPs translate into realized training progress.

Nvidia’s Blackwell-era systems are explicitly designed as rack-scale architectures (for example, GB200 NVL72-class systems), pushing performance by treating the rack, rather than the single GPU, as the unit of compute. Nvidia’s own materials emphasize that these rack-scale systems require a substantial shift in deployment assumptions, including liquid-cooling requirements of roughly 120 kW per rack for certain GB200 NVL72 deployments.

Why this matters economically: In a power-constrained world, the winning system is not merely the cheapest GPU. It is the system that produces the most useful tokens per watt (and per dollar of deployed infrastructure) under real uptime, yield, and reliability conditions.
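
A minimal sketch of that metric, with every input an illustrative assumption rather than a vendor figure: useful tokens per unit of facility energy, after haircutting for cooling and power overhead (PUE) and real-world utilization.

```python
def tokens_per_mwh(tokens_per_sec_per_gpu: float,
                   watts_per_gpu: float,
                   pue: float = 1.3,
                   utilization: float = 0.6) -> float:
    """Useful tokens produced per MWh of facility energy.
    All inputs are hypothetical: PUE covers cooling/power overhead;
    utilization covers uptime, batching losses, and failed requests."""
    facility_mw = watts_per_gpu * pue / 1e6
    tokens_per_hour = tokens_per_sec_per_gpu * 3600 * utilization
    return tokens_per_hour / facility_mw  # tokens/hour ÷ MW = tokens/MWh

# Two hypothetical systems competing for the same fixed power budget:
a = tokens_per_mwh(tokens_per_sec_per_gpu=400, watts_per_gpu=700)
b = tokens_per_mwh(tokens_per_sec_per_gpu=900, watts_per_gpu=1200)
print(f"A: {a:.2e} tokens/MWh, B: {b:.2e} tokens/MWh")
# B wins despite drawing more watts per GPU: in a power-capped site,
# throughput per facility watt is what matters.
```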


4) Blackwell: A Product Transition That Is Also a Data-Center Transition

The interview’s analogy—“imagine needing to rewire your house to get the next iPhone”—is hyperbolic, but it captures a real deployment discontinuity: liquid cooling, higher rack densities, and new integration requirements can slow ramp, delay availability, and stress supply chains.

Public reporting has documented challenges in Blackwell’s ramp and manufacturing/thermal issues, underscoring that the transition is not only about chip specs but also about packaging, system integration, and reliability.

Nvidia’s own Blackwell materials stress that the platform is a full-stack architecture shift (GPU, CPU, NVLink, networking, software) rather than a single component upgrade.

Strategic takeaway: In AI, time-to-stable-deployment is itself a competitive advantage. A lab or hyperscaler that can deploy, debug, and run high-utilization clusters sooner can compound faster—because training and inference are both cumulative learning processes.


5) TPUs, ASICs, and the “Low-Cost Token Producer” Thesis

Baker argues that AI is unusual because “being the low-cost producer” finally matters in tech. This is directionally true—because token production has a measurable marginal cost (compute, power, depreciation) and because many AI services are converging toward price competition as baseline capabilities commoditize.
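
To see why low-cost production bites, consider a toy fully loaded cost model: straight-line hardware depreciation plus electricity, divided by realized token throughput. All figures below are hypothetical, and a real cost stack adds networking, staff, and cost of capital, but the structure is the point: a vertically integrated producer can attack every term.

```python
def cost_per_million_tokens(capex_per_gpu: float,
                            amortization_years: float,
                            power_kw_per_gpu: float,
                            price_per_kwh: float,
                            tokens_per_sec: float,
                            utilization: float = 0.6) -> float:
    """Toy fully loaded cost of one million output tokens:
    straight-line depreciation plus electricity. All inputs are
    illustrative assumptions, not vendor or cloud figures."""
    hourly_capex = capex_per_gpu / (amortization_years * 8760)
    hourly_power = power_kw_per_gpu * price_per_kwh
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return (hourly_capex + hourly_power) / tokens_per_hour * 1e6

# Hypothetical accelerator: $30k, 4-year life, 1 kW, $0.08/kWh, 500 tok/s
print(f"${cost_per_million_tokens(30_000, 4, 1.0, 0.08, 500):.2f} per 1M tokens")
# ≈ $0.87 per 1M tokens under these assumptions
```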

The TPU roadmap illustrates why Google is structurally positioned to compete on cost and vertical integration. Google announced its sixth-generation TPU, Trillium (TPU v6e), emphasizing large performance gains and efficiency improvements for training and serving.
Later, Google announced Ironwood (TPU v7), positioned for inference at scale.

If we accept the premise that token production becomes a commodity over time (at least for non-differentiated tasks), then:

  • Vertically integrated players (owning silicon, networking, and data-center economics) can compress prices.

  • Higher-cost producers may be forced toward (i) differentiation, (ii) premium tiers, or (iii) distribution advantages.

However, one must be careful not to overstate a single “winner.” The competitive frontier is multi-dimensional: latency, context length, tool use, reliability, safety, enterprise controls, and integration ecosystems matter as much as pure token cost.


6) “Reasoning” and Test-Time Compute: The Bridge From Hardware Gaps to Capability Gains

A core claim in the interview is that a wave of “reasoning” progress helped sustain visible capability gains even when hardware transitions were difficult. The concept maps onto a real research direction: test-time compute scaling—methods that spend more inference compute per query to improve solution quality.

There is a growing body of work formalizing test-time compute scaling and demonstrating that “more thinking” (more candidate solutions, more verification, better selection) can improve outcomes without changing the base model weights. (OpenReview)

This links to Andrej Karpathy’s widely cited framing: in classical software, automation is limited by what you can specify; in AI systems, automation expands to what you can verify. Karpathy has stated this idea directly, including in short-form public commentary. (X)
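
A minimal sketch of the simplest test-time compute strategy, best-of-n sampling with a verifier, makes the Karpathy point tangible: the verifier, not the generator, is what lets extra inference compute buy reliability. The generator and verifier below are toy stand-ins, assuming each sample is independently correct with probability p and the verifier is perfect.

```python
import random

def best_of_n(generate, verify, n: int):
    """Sample up to n candidates; return the first one the verifier
    accepts, or None. `generate` and `verify` stand in for a model
    call and a checker (unit tests, a proof checker, a reward model)."""
    for _ in range(n):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None

# Toy illustration: each sample is independently correct with p = 0.2,
# so P(success) = 1 - (1 - p)**n rises with the inference budget n.
p = 0.2
generate = lambda: random.random() < p   # "candidate is correct" flag
verify = lambda ok: ok                   # assume a perfect verifier
for n in (1, 4, 16):
    wins = sum(best_of_n(generate, verify, n) is not None
               for _ in range(10_000))
    print(f"n={n:2d}: empirical success ≈ {wins / 10_000:.2f}")
# ≈ 0.20, 0.59, 0.97 (matches 1 - 0.8**n)
```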

A fact-check on the ARC-AGI “95% in three months” claim

The interview references an “ARC-AGI slide” implying a jump to ~95%. Publicly available ARC Prize reporting shows much lower absolute numbers for some widely discussed systems (for example, ~21% for OpenAI o1 on a noted evaluation). (ARC Prize)
That does not invalidate the direction of change—reasoning-centric systems have shown meaningful improvements on hard benchmarks—but it does mean we should avoid repeating dramatic figures unless the underlying benchmark, split, and scoring method are precisely specified.


7) Frontier Labs Are Not Just Model Builders; They Are Flywheel Builders

A subtle but crucial argument is that “reasoning” makes it easier to create feedback loops: preference signals, outcome signals, verification signals. This creates the possibility of a modern version of the classic internet flywheel: better product → more usage → better data/feedback → better product.

But the flywheel is not universally available. It is strongest when a lab has:

  1. Internet-scale distribution (massive daily interactions),

  2. Unique data advantages (proprietary feedback and workflow telemetry),

  3. Infrastructure leverage (token cost and throughput),

  4. Operational excellence (rapid iteration and safe deployment).

That combination is rare—and it explains why “just throwing money” at model training is not sufficient. Capability is increasingly a product of system design, data feedback, and iteration velocity, not just parameter count.


8) The Geopolitics Layer: Export Controls and Critical Minerals Are Not Side Stories

The interview’s geopolitics section is directionally aligned with mainstream analysis: AI leadership is constrained by access to advanced semiconductors and high-bandwidth memory, and policy has increasingly targeted these choke points. The U.S. Bureau of Industry and Security has repeatedly updated export controls aimed at limiting China’s access to advanced semiconductor capabilities. (BIS)
Congressional Research Service summaries also document the evolution and widening scope of controls across chips, tooling, and related technologies. (Congress.gov)

On rare earths, the interview correctly notes a common misconception: rare earth elements are not rare in crustal abundance; the challenge is economically and environmentally complex extraction and processing. The U.S. Geological Survey explicitly notes that REEs are “not rare” in average crustal abundance, but concentrated deposits are limited and processing is difficult. (USGS)

Implication: Supply chains are strategic terrain. Compute leadership increasingly depends on (i) semiconductor access, (ii) materials processing capacity, and (iii) energy and grid buildout.


9) Power Is the Governor: The Energy Constraint Is Now a First-Order Variable

If AI were purely “software,” power would be a footnote. It is not. Power and grid connection timelines are now central constraints on scaling.

The International Energy Agency projects that global electricity consumption for data centres could more than double to around 945 TWh by 2030, just under 3% of total global electricity consumption in that scenario, with data-centre demand growth far outpacing overall electricity growth. (IEA)

This is the economic heart of Baker’s point: in a watt-constrained environment, efficiency outranks price. If your system yields more inference throughput per watt, you can generate more revenue and more product utility for the same constrained resource.
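
The 945 TWh figure is easier to feel as continuous load. A quick back-of-envelope conversion (arithmetic only; the scenario figure is the IEA’s):

```python
# Back-of-envelope: what does 945 TWh/year of data-centre demand imply
# as average continuous load?
twh_per_year = 945
hours_per_year = 8760
avg_gw = twh_per_year * 1000 / hours_per_year  # TWh -> GWh, then / hours
print(f"≈ {avg_gw:.0f} GW of average continuous load")  # ≈ 108 GW
# For scale: roughly a hundred 1-GW power plants running around the clock.
```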


10) Data Centers in Space: A Powerful Thought Experiment That Still Faces Hard Constraints

The interview’s most provocative idea is that data centers “should be in space” because:

  • solar power can be more continuous without night cycles,

  • radiative cooling is possible via thermal radiation to cold space,

  • laser links in vacuum can offer high-speed interconnect potential.

There is credible research exploring variants of this concept, including analyses of space-based data center architectures and their potential energy implications (for example, work discussing “carbon-neutral data centres in outer space”).
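
It is worth sanity-checking the cooling claim with first-order physics. The sketch below sizes an idealized radiator from the Stefan–Boltzmann law; it deliberately ignores absorbed sunlight, Earth infrared, and view factors, all of which would push real areas higher.

```python
SIGMA = 5.67e-8  # Stefan–Boltzmann constant, W / (m^2 · K^4)

def radiator_area_m2(heat_watts: float, temp_k: float = 300.0,
                     emissivity: float = 0.9, sides: int = 2) -> float:
    """Idealized radiator area needed to reject waste heat to deep space
    via P = eps * sigma * A * T^4. Ignores absorbed sunlight, Earth IR,
    and view factors, so real areas would be larger."""
    flux = emissivity * SIGMA * temp_k ** 4  # W/m^2 per radiating side
    return heat_watts / (flux * sides)

# Rejecting 1 MW of compute heat at ~300 K:
print(f"≈ {radiator_area_m2(1e6):,.0f} m² of two-sided radiator")  # ≈ 1,210 m²
```

Even under this generous idealization, rejecting 1 MW takes on the order of 1,200 m² of two-sided radiator (roughly a 35 m by 35 m panel), and terrestrial hyperscale campuses are measured in hundreds of megawatts.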

But a rigorous assessment must also price in the constraints:

  • Radiation: Space electronics must handle single-event effects and long-term radiation degradation; NASA documentation details single-event upsets and related failure modes as a core spacecraft-electronics reliability issue. (NASA NEPP)

  • Maintenance and repair: Servicing dense compute payloads in orbit is not comparable to swapping boards on Earth.

  • Launch economics and cadence: Even with declining launch costs, scaling to data-center-equivalent megawatts implies industrial-scale launch and replacement cycles.

  • Latency and routing: Some inference workloads tolerate added latency; others (interactive agents, trading, realtime robotics) do not.

My assessment: Space-based compute is best treated today as (i) a strategic research frontier and (ii) a long-dated option on radical cost curves, not as a near-term substitute for terrestrial hyperscale buildouts.


11) The SaaS Mistake: Margin Myopia in an AI-Native World

The interview’s enterprise software critique is sharp: many application SaaS companies are reluctant to embrace AI agents because agents compress gross margins. The analogy to brick-and-mortar retail dismissing e-commerce due to early margin structure is historically plausible as a pattern, even if the specifics vary by sector.

The deeper point is economically sound: if AI-native entrants can deliver workflow outcomes at lower gross margins but radically lower headcount (and faster iteration), the competitive battlefield shifts from “gross margin percentage” to gross profit dollars, retention, workflow lock-in, and distribution control.
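
A toy numeric example (all figures hypothetical) shows how an entrant with a structurally worse margin percentage can still out-earn the incumbent on gross profit dollars:

```python
# Toy economics: why "gross profit dollars" can beat "gross margin %".
# Every number here is hypothetical.
def gross_profit(price: float, units: int, gross_margin: float) -> float:
    return price * units * gross_margin

# Incumbent SaaS: premium seat pricing, 80% gross margin.
incumbent = gross_profit(price=100, units=1_000, gross_margin=0.80)  # $80,000
# AI-native agent: cheaper, margin compressed by token costs, but it
# wins 4x the usage because it sells completed workflows, not seats.
entrant = gross_profit(price=60, units=4_000, gross_margin=0.55)     # $132,000
print(f"incumbent: ${incumbent:,.0f}  entrant: ${entrant:,.0f}")
```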

In other words, AI threatens incumbent SaaS less by “being smarter” and more by:

  1. shifting user expectations toward outcome-complete workflows,

  2. disintermediating SaaS interfaces with agent layers,

  3. turning proprietary SaaS data into a strategic asset that others can siphon via integrations unless the SaaS vendor builds the agent layer first.

This is not a moral critique of incumbents. It is a strategic warning about platform displacement.


Conclusion: What to Watch in the Next Regime

This interview is ultimately about regime change: AI competitiveness is migrating from model demos to industrial economics.

If you want a disciplined framework, watch five measurable variables:

  1. Tokens per watt and tokens per dollar deployed
    Power is a binding constraint. Efficiency is strategy. (IEA)

  2. Time-to-stable-deployment for new architectures
    Blackwell-class transitions are also cooling, networking, and operations transitions.

  3. Inference unit economics versus training spend
    Training creates optionality; inference captures ROI. Test-time compute changes the inference curve. (OpenReview)

  4. Feedback flywheels and distribution
    Labs and platforms with durable, proprietary feedback loops can compound.

  5. Geopolitics of compute supply chains
    Export controls, critical minerals, and energy infrastructure increasingly shape who can scale. (BIS)

AI is not just a software story. It is a physical economy story: electrons, atoms, watts, and supply chains. And in that world, strategy belongs to those who can integrate technology, infrastructure, and economics into a single operating system for scale.

A Macro Driven, Compliance Minded Singapore Property Partner for Global Investors

In today’s market, Singapore real estate is no longer shaped by local factors alone. Global interest rates, geopolitics, supply chain shifts, AI driven capital expenditure, currency cycles, and cross border wealth flows increasingly influence pricing power, rental demand, and liquidity. For international investors, family offices, and families relocating for education, the question is not only “Which property is good?” It is “How does this property fit into my broader portfolio, my time horizon, and the global regime we are entering?”

That is the advisory standard I bring.

I am a Singapore based real estate professional who approaches property as part of a complete asset allocation and risk management framework. I am trained and experienced in macroeconomics, global affairs, portfolio construction, and multi asset investing across equities and cryptocurrencies. I am also proficient in Singapore Land Law, Business Law, statutes, and transaction compliance, because execution matters as much as strategy. In addition, I serve as an Officer Commanding in the Singapore Armed Forces with the rank of Captain, which has shaped a disciplined approach to planning, due diligence, and accountability.

What this means for you is straightforward: you get an advisor who does not “sell a unit,” but builds a decision system.

Why this matters to you as a buyer, seller, landlord, or investor

Most real estate advice stops at nearby comparables and marketing narratives. That is necessary, but no longer sufficient. When you work with me, you benefit from:

  • Macro timing and scenario planning: translating global rate paths, inflation trends, and geopolitical risk into practical decisions on entry timing, holding period, financing, and exit strategy.

  • Cross asset perspective: understanding when property should be emphasized for stability and yield, versus when other asset classes may dominate risk adjusted returns, so your allocation is intentional rather than accidental.

  • Risk and compliance discipline: careful attention to legal structure, documentation, and transaction safeguards, including cross border considerations and a clear, professional process from offer to completion.

  • Institutional grade due diligence: not relying on headlines. I dedicate hours daily to studying markets, reading primary sources, analyzing data, and writing research based essays like the one you just read, because staying current is a responsibility, not a slogan.

For international families, Chinese clients, and regional investors

Whether you are investing from China, Southeast Asia, or globally, or relocating to Singapore for education, family office structuring, or long term residency planning, you need a Singapore partner who understands both the local rules and the global forces driving demand. I work with clients who value discretion, precision, and a clear, compliant advisory process.

I also welcome clients from China and Southeast Asia, including accompanying parents, study-abroad families, family offices, and institutional investors. Singapore property is not only a housing choice; it is a resilient component of a global asset allocation. What matters is strategy, timing, and compliant execution.

Why Singapore property belongs in a resilient portfolio

Real estate is not risk free, but it is often a less volatile and more stable component compared with many liquid markets, while still offering two core advantages when selected correctly:

  1. Capital appreciation potential through scarcity, long term urban planning, and sustained global demand.

  2. Rental income that functions like a dividend stream, supporting cash flow and portfolio stability.

The objective is not to chase hype. The objective is to own quality assets, in the right locations, at the right price, with a plan that matches your financial profile and life priorities.

If you want advice that goes beyond property headlines

If you are planning to buy, sell, rent, or invest in Singapore, and you want guidance anchored in macroeconomics, cross asset allocation, and disciplined due diligence, I welcome a conversation.

Share your goals, time horizon, and constraints. I will respond with a structured plan: opportunities, risks, and the most practical path forward, in a clear and compliant manner.

When markets are noisy, the advantage belongs to those who prepare more, think wider, and execute with discipline. If that is the standard you want in your Singapore real estate partner, let us connect.



Full Disclosure

This essay is for education and general analysis. It is not investment, legal, or financial advice, and it does not recommend any specific security, trade, or allocation.


References (APA)

Bureau of Industry and Security. (2024, December 2). Commerce strengthens export controls to restrict China’s capability to produce advanced semiconductors for military applications. U.S. Department of Commerce.

Chen, Y., Pan, X., Li, Y., Ding, B., & Zhou, J. (2024). Simple and provable scaling laws for the test-time compute of large language models. arXiv.


Hoffmann, J., et al. (2022). Training compute-optimal large language models. arXiv.

International Energy Agency. (2024). Energy and AI: Energy demand from AI.

International Energy Agency. (2024). Energy and AI: Executive summary.

Kaplan, J., et al. (2020). Scaling laws for neural language models. arXiv.

Karpathy, A. (2024). Public commentary on “verify vs. specify” framing [post]. X (formerly Twitter).

NASA Electronic Parts and Packaging Program. (2011). Radiation effects on electronics 101 (NASA NEPP).

NVIDIA. (2024, March 18). NVIDIA Blackwell platform arrives to power a new era of computing.

NVIDIA. (2024). NVIDIA GB200 NVL72 rack-scale systems arrive … (cooling capacity requirements and deployment considerations).

Snell, C., Lee, J., Xu, K., & Kumar, A. (2025). Scaling LLM test-time compute optimally can be more effective than scaling model parameters. ICLR (OpenReview).

The ARC Prize. (2024). ARC-AGI benchmark reporting for OpenAI o1 (public score context).

U.S. Geological Survey. (2017). Rare-earth elements.

U.S. Geological Survey. (n.d.). Rare earths statistics and information.

Reuters. (2025). Google announces Ironwood TPU (TPU v7) and positions inference scaling strategy.

Reuters. (2024–2025). Reporting on Nvidia Blackwell ramp and thermal/manufacturing issues.

NTU / Nature Electronics. (n.d.). Carbon-neutral data centres in outer space (research discussion).
