AI training clusters today swallow data at rates that would have choked a supercomputer a decade ago. A single NVIDIA H100 GPU pushes 900 GB/s over NVLink; an 8-GPU DGX system moves more than 7 TB/s of aggregate NVLink traffic internally. Copper traces in traditional backplanes hit a fundamental physics limit beyond a few meters: signal degradation, crosstalk, and heat density become unmanageable. In 2025, every major hyperscaler—Google, Microsoft, Meta, Amazon—is quietly replacing copper backplane segments with optical interconnects. This is not a lab experiment. It is the most consequential infrastructure shift in AI hardware since the GPU itself. This report breaks down why optical is winning, what technologies make it viable, and what the migration timeline looks like for operators building the next generation of training clusters.
The physics of copper are brutally simple: electrical signals degrade over distance. At 50 Gbps per lane and above—the regime of modern Ethernet SerDes, with PCIe 6.0 pushing 64 GT/s—a copper trace loses signal integrity after roughly 30 centimeters without expensive retimers or redrivers. Inside a 19-inch rack, that forces GPU-to-GPU links onto bulky active copper cables (ACCs) that consume 30–40% more power per meter than optical equivalents. For clusters spanning multiple racks—now the norm for 10,000+ GPU pods—copper becomes a thermal and density nightmare.
A rack pulling 20 kW already strains its cooling budget. Inserting 40 active copper cables at 25 W each adds 1 kW of heat per rack that must be dissipated. Optical transceivers, by contrast, consume 10–15 W per link at similar throughput, or roughly 500 W per rack for the same 40 links. For a 32-rack cluster, that is a savings of roughly 16 kW, most of the power budget of an additional rack of GPUs. More importantly, optical fibers are 0.25 mm thick versus copper cables at 5–8 mm. That density advantage means you can route 4x more links through the same cable tray, eliminating the airflow blockages that cause hotspots in dense GPU racks.
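The arithmetic behind that estimate is easy to check. Here is a minimal sketch using the link counts and per-link figures quoted above; the 12.5 W optical midpoint (of the stated 10–15 W range) is an assumption:

```python
# Per-rack interconnect power, copper vs. optical, from the
# figures in the text. The 12.5 W optical midpoint is an
# assumption (the text gives a 10-15 W range).
LINKS_PER_RACK = 40
COPPER_W_PER_LINK = 25.0      # active copper cable (ACC)
OPTICAL_W_PER_LINK = 12.5     # midpoint of the 10-15 W range
RACKS = 32

copper_w = LINKS_PER_RACK * COPPER_W_PER_LINK          # 1,000 W/rack
optical_w = LINKS_PER_RACK * OPTICAL_W_PER_LINK        # 500 W/rack
delta_per_rack_w = copper_w - optical_w                # 500 W/rack
cluster_savings_kw = delta_per_rack_w * RACKS / 1000   # ~16 kW

print(f"per-rack delta: {delta_per_rack_w:.0f} W")
print(f"{RACKS}-rack cluster savings: {cluster_savings_kw:.1f} kW")
```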
The biggest bottleneck in optical-to-electrical conversion has always been the packaging. Traditional pluggable optics (QSFP-DD, OSFP) sit at the faceplate of a switch, requiring a 4–6 inch PCB trace from the ASIC to the transceiver. That trace eats 30–50% of the signal budget before the light even leaves the module. Co-packaged optics (CPO) solve this by mounting the optical engine directly onto the same substrate as the switch ASIC—within millimeters of the SerDes.
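To see how much of the budget a faceplate trace can consume, the back-of-the-envelope sketch below treats channel loss as linear in trace length. The 1 dB/inch trace loss and 13 dB host budget are illustrative assumptions, not figures from the text:

```python
# Rough host-side insertion-loss budget for a pluggable optic.
# ASSUMPTIONS (not from the text): ~1 dB/inch PCB trace loss at
# the Nyquist frequency, ~13 dB total host channel budget.
DB_PER_INCH = 1.0
HOST_BUDGET_DB = 13.0

for trace_in in (4, 5, 6):  # faceplate trace lengths from the text
    loss_db = trace_in * DB_PER_INCH
    share = loss_db / HOST_BUDGET_DB
    print(f"{trace_in} in trace: {loss_db:.1f} dB, {share:.0%} of budget")
# A CPO engine sits within millimeters of the SerDes, so this
# term collapses to a small fraction of a dB.
```

With those assumptions, a 4–6 inch trace eats 31–46% of the budget, in line with the 30–50% figure above.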
In 2025, Broadcom’s Hummingbird CPO platform ships in production, offering 51.2 Tbps of aggregate bandwidth in a single package—equivalent to 64 ports of 800 Gbps. Cisco, Intel, and Marvell have all announced CPO designs targeting AI backplanes. The key number: CPO reduces total power per 800 Gbps port from 15 W (pluggable optics) to below 8 W. For a cluster with 2,000 switch ports, that is 14 kW of power savings. The trade-off is initial cost—CPO modules cost roughly 20% more per port in volume—but hyperscalers are already amortizing that premium over three-year cluster lifetimes where electricity costs exceed hardware costs.
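The power delta compounds at fleet scale. A quick sketch of the 2,000-port figure, reusing the $0.10/kWh rate quoted in the TCO discussion below; the facility (PUE) multiplier is deliberately omitted, so this understates the value of the saved watts:

```python
# Fleet-level power delta for CPO vs. pluggable optics at
# 800 Gbps, using the per-port figures from the text.
PLUGGABLE_W = 15.0
CPO_W = 8.0
PORTS = 2000
USD_PER_KWH = 0.10      # rate quoted in the TCO section below
HOURS_3Y = 3 * 8760     # 26,280 hours

delta_kw = (PLUGGABLE_W - CPO_W) * PORTS / 1000   # 14 kW
energy_usd = delta_kw * HOURS_3Y * USD_PER_KWH    # ~$36,800
print(f"continuous delta: {delta_kw:.0f} kW")
print(f"3-year energy value (IT load only): ${energy_usd:,.0f}")
```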
Two competing optical technologies are vying for AI backplane dominance. Vertical-cavity surface-emitting lasers (VCSELs) have been the workhorse of short-reach datacenter optics for two decades—they are cheap, reliable, and well-understood. But VCSELs top out at 100 Gbps per lane and struggle with wavelength-division multiplexing (WDM). Silicon photonics, on the other hand, uses standard CMOS fabrication processes to build modulators, waveguides, and photodetectors on a silicon die. It scales to 200 Gbps per lane and supports dense WDM, meaning a single fiber can carry 1.6 Tbps over eight wavelengths.
Microsoft’s recent 50,000-GPU cluster in Quincy, Washington, uses silicon photonics for its spine-leaf interconnect. Each fiber pair carries 800 Gbps over distances up to 500 meters—enough to span multiple buildings. The VCSEL-based alternative would have required 50% more fiber strands and 30% more transceivers to achieve the same bisection bandwidth. Google’s own TPU v5 pods use CPO with silicon photonics for the inter-pod fabric, achieving 1.6 Tbps per link with latency below 2 microseconds. The caveat: silicon photonics fab yields are still 10–15% lower than VCSELs, keeping unit costs 30–40% higher. But yields are improving 5% per quarter as TSMC and GlobalFoundries ramp their photonic-specific processes.
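A simplified strand-count comparison makes the density argument concrete. It assumes one 100 Gbps lane per VCSEL fiber versus eight 200 Gbps wavelengths per silicon-photonics fiber (the lane rates quoted earlier), with a hypothetical 1 Pbps bisection target; real designs use parallel-fiber breakouts and land between these extremes, closer to Microsoft's reported 50% gap:

```python
import math

# Fiber strands needed to reach a target bisection bandwidth.
# ASSUMPTIONS: one 100 Gbps lane per VCSEL fiber vs. eight
# 200 Gbps wavelengths per silicon-photonics fiber (DWDM).
# The 1 Pbps target is hypothetical.
TARGET_GBPS = 1_000_000          # 1 Pbps bisection bandwidth
VCSEL_GBPS_PER_FIBER = 100
SIPH_GBPS_PER_FIBER = 8 * 200    # 1.6 Tbps via WDM

vcsel_fibers = math.ceil(TARGET_GBPS / VCSEL_GBPS_PER_FIBER)
siph_fibers = math.ceil(TARGET_GBPS / SIPH_GBPS_PER_FIBER)
print(f"VCSEL fibers: {vcsel_fibers:,}")   # 10,000
print(f"SiPh fibers:  {siph_fibers:,}")    # 625
```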
Most AI training clusters today use a static Fat-Tree topology: every GPU is connected to a fixed set of switches. But large models like GPT-4 require different communication patterns during different training phases—some layers need all-to-all communication, others benefit from ring topologies. Optical circuit switches (OCS) can reconfigure fiber paths in microseconds, allowing the fabric to reshape itself dynamically based on the training job’s needs.
Google has deployed OCS in its Jupiter network since 2020, but 2025 marks the first year that OCS is used explicitly for AI training workloads rather than general datacenter traffic. The advantage is dramatic: during the all-reduce phase of a 1 trillion parameter model, reconfiguring the topology from a blocked torus to a fully connected mesh reduces communication latency by 40%. The catch is that OCS requires a centralized scheduler that understands model parallelism—an integration that few AI frameworks support out of the box. PyTorch Distributed and JAX both added experimental OCS hooks in mid-2025, but production adoption remains limited to a handful of hyperscaler teams.
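To make the scheduler integration concrete, here is a minimal sketch of what a phase-aware OCS hook might look like. Everything in it is hypothetical: OcsController, its reconfigure() method, and the topology table are invented for illustration and are not the experimental PyTorch or JAX APIs mentioned above:

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[int, int]  # (src_port, dst_port) on the circuit switch

def mesh(ports: List[int]) -> List[Link]:
    """Fully connected mesh for all-to-all phases such as all-reduce."""
    return [(a, b) for i, a in enumerate(ports) for b in ports[i + 1:]]

def ring(ports: List[int]) -> List[Link]:
    """Ring topology for bandwidth-bound pipeline phases."""
    return [(ports[i], ports[(i + 1) % len(ports)]) for i in range(len(ports))]

class OcsController:
    """Stand-in for a real optical-circuit-switch API (hypothetical)."""
    def reconfigure(self, links: List[Link]) -> None:
        # A real controller would steer the optical paths here; the
        # text puts reconfiguration on the order of microseconds.
        print(f"programming {len(links)} optical circuits")

TOPOLOGY_FOR_PHASE: Dict[str, Callable[[List[int]], List[Link]]] = {
    "all_reduce": mesh,
    "pipeline": ring,
}

def on_phase_change(phase: str, ports: List[int], ocs: OcsController) -> None:
    # Called by the training loop when the communication pattern shifts.
    ocs.reconfigure(TOPOLOGY_FOR_PHASE[phase](ports))

on_phase_change("all_reduce", list(range(8)), OcsController())
```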
Operators evaluating optical interconnects need to compare total cost of ownership (TCO) over a 3-year cluster lifecycle. The headline figures from a 2025 deployment at a major AI cloud provider (anonymized per their NDA) make the case.
Over 3 years at $0.10/kWh, the CPO solution saves $55,000 in power per rack—more than enough to offset the higher hardware cost. For a 100-rack cluster, the savings exceed $5.5M. That math drives the migration.
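It is worth inverting that headline figure to see what it implies. Back-solving from $55,000 and $0.10/kWh gives the continuous per-rack power delta; attributing everything beyond the link-level savings to cooling, facility overhead, and switch-tier gains is an assumption:

```python
# Invert the provider's headline: what continuous power delta per
# rack does $55,000 over 3 years at $0.10/kWh imply?
SAVINGS_USD = 55_000
USD_PER_KWH = 0.10
HOURS_3Y = 3 * 8760                       # 26,280 hours

kwh = SAVINGS_USD / USD_PER_KWH           # 550,000 kWh
implied_kw = kwh / HOURS_3Y               # ~20.9 kW per rack
print(f"implied continuous delta: {implied_kw:.1f} kW per rack")
# Far above the ~0.5 kW link-level delta alone, so the figure
# presumably folds in cooling, facility overhead, and switch-tier
# savings (an assumption).
```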
The optical interconnect market is fragmenting around two module form factors. QSFP-DD800 (quad small form-factor pluggable double-density) supports 800 Gbps and is backward compatible with earlier QSFP generations, making it the safe choice for incremental upgrades. OSFP (octal small form-factor pluggable) physically separates the eight electrical lanes into two rows of four, offering better thermal management for the high-power drivers needed at 1.6 Tbps. OSFP is the form factor of choice for CPO because the optical engine can sit directly opposite the ASIC on the same PCB, reducing trace length to nearly zero.
In 2025, 800 Gbps Ethernet (IEEE 802.3df) is ratified, the 1.6 Tbps follow-on (IEEE P802.3dj) is in late-stage drafting, and OSFP dominates the high-volume deployments at Microsoft, Meta, and Google. QSFP-DD800 retains a foothold in smaller clusters where backward compatibility matters. The real battle is at 3.2 Tbps—expected in late 2026—where OSFP's thermal headroom gives it a clear advantage. Operators building clusters today should standardize on OSFP if they expect to upgrade bandwidth within 18 months.
If you operate a cluster built on copper backplanes (say, InfiniBand HDR or NVIDIA NVSwitch with copper cables), a full rip-and-replace is rarely necessary. A phased migration works better. Phase 1: replace the longest inter-rack copper runs (anything beyond roughly 5 meters) with pluggable optics at the spine. Phase 2: standardize on OSFP cages at the top of rack and move the leaf-spine layer to optics. Phase 3: adopt co-packaged optics in the switch layer as ports come up for refresh.
The timeline: early adopters (hyperscalers) are already in Phase 3. Tier 2 cloud providers are in Phase 1–2. If you are building a new cluster today, design for Phase 2 from day one—specify OSFP cages on your top-of-rack switches and avoid copper leaf-spine links longer than 5 meters.
Three developments will accelerate adoption through 2026. First, linear-drive pluggable optics (LPO) eliminate the DSP chip from the transceiver, cutting power per link by another 30% and cost by 20%. LPO modules are shipping in evaluation quantities from Credo and Marvell, with volume production expected Q2 2026. Second, co-packaged optics will integrate directly onto GPU interposers—AMD’s MI400 and NVIDIA’s next-gen Rubin architecture both have CPO on their roadmaps, eliminating separate switch packages for GPU-to-GPU communication. Third, optical memory pooling using CXL over photonics will allow disaggregated memory pools to sit hundreds of meters from compute, enabling new memory-bandwidth configurations that copper cannot support.
The implication for anyone buying AI infrastructure today: any future-proof cluster design must assume optical backplanes within 18 months. If your switch vendor cannot supply OSFP cages with CPO-ready mounting, they are behind the curve. The cluster you build now should have fiber pathways—not copper trays—in every rack row, even if you populate them with copper cables initially. That single architectural decision will save months of retrofit costs when you inevitably switch to optics.