by Sanjoy Maity, Chief Executive Officer
The AI infrastructure buildout is one of the most consequential engineering undertakings in modern history. Hundreds of billions of dollars flow into data center construction every year. GPU clusters drawing power equivalent to small cities are assembled at breathtaking speed. And underneath all of it, managing the silicon, governing thermal and cooling behavior, keeping the lights on when everything else goes dark, sits firmware: the control plane that most of our industry has historically treated as an afterthought.
That has to change. And the reason is not one problem. It is five.
In Greek mythology, the Hydra was a creature that was nearly impossible to defeat: cut off one head and two grew back in its place. That image is the most honest way I know to describe what AI data center operators are up against right now. These are not five isolated technical challenges waiting to be solved in sequence. They are five interconnected forces, each one reinforcing the others, each one growing more demanding as AI infrastructure scales. You cannot fix fragmentation without also confronting security and resilience. You cannot solve power and thermal without rethinking the control plane. Every head is connected to the body. And the body is firmware.
Head One: The Fragmentation Crisis
We live in a heterogeneous world, and it is only getting more complex. GPUs, DPUs, NICs, and a new generation of purpose-built accelerators arrive in production environments with every deployment. Each brings its own management model, its own behavior, its own logic for how it expects to be governed. The result is a patchwork of control surfaces that operators are expected to stitch into something coherent and reliable. Without a consistent firmware foundation underneath it all, calling this “heterogeneous infrastructure” is generous. It is fragmentation dressed up as progress, driven by the very performance demands that make AI infrastructure innovation necessary.
Head Two: The Security Imperative
Firmware doesn’t just boot a server. In a multi-tenant AI data center, it anchors secure boot sequences, governs update integrity, enables attestation, and manages the lifecycle of every node in the fleet. Security pressure is rising because the attack surface is expanding with every new device class. When firmware is inconsistent or unverified across a mixed fleet, the exposure is enormous and largely invisible to the people responsible for defending it. Operators rarely discover how significant the gap is until the moment it matters most. In a world where AI workloads carry both extraordinary commercial value and extraordinary sensitivity, that is not an acceptable risk posture.
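To make one of those mechanisms concrete, here is a minimal sketch of the kind of check an update path must enforce before staging a new image. The Ed25519 scheme, the function name, and the raw-key handling are illustrative assumptions, not a description of any specific vendor’s implementation; in a real system the verifying key would be rooted in hardware.

```python
# Illustrative sketch only: reject any firmware image whose signature
# does not verify against a trusted key. Scheme and names are assumptions.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_update_image(image: bytes, signature: bytes, pubkey_raw: bytes) -> bool:
    """Return True only if the image's signature checks out."""
    try:
        # In production the key comes from a hardware root of trust,
        # not a raw byte string passed in by the caller.
        Ed25519PublicKey.from_public_bytes(pubkey_raw).verify(signature, image)
        return True
    except InvalidSignature:
        return False
```

The point is not the particular algorithm. It is that every node in the fleet applies the same verification, every time, with no unverified path around it.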
Head Three: The Power and Thermal Reckoning
The static, predictable power baselines that data center teams relied on for years are gone. AI workloads create dynamic, high-frequency power and thermal fluctuations that demand real-time, hardware-rooted control spanning both IT and operational technology infrastructure. Challenges like voltage droop (VDROOP) stress power delivery and put the reliability of expensive infrastructure at risk if not managed precisely. A software dashboard chasing these problems from above the hardware layer cannot respond with the speed and precision required. The management plane needs to close the loop at the source, with integrated control that lives where power delivery, conversion, and consumption actually happen.
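A hedged sketch of what “closing the loop at the source” can mean in practice: sample the rail at high frequency and clamp power where it is delivered, rather than waiting on software layers above. The sensor and actuator hooks, thresholds, and step sizes below are hypothetical placeholders, not a real control design.

```python
# Minimal firmware-resident control loop, sketched under assumptions:
# read_rail_voltage() and set_power_cap() stand in for real hardware hooks.
NOMINAL_V = 12.0      # nominal rail voltage (illustrative)
DROOP_LIMIT = 0.05    # act when the rail sags more than 5%

def control_step(read_rail_voltage, set_power_cap, cap_watts, step_watts):
    """One iteration: measure droop, shed load fast, recover slowly."""
    v = read_rail_voltage()                 # hardware-rooted telemetry
    droop = (NOMINAL_V - v) / NOMINAL_V
    if droop > DROOP_LIMIT:
        cap_watts -= step_watts             # clamp immediately at the source
    else:
        cap_watts += step_watts / 10        # restore headroom cautiously
    set_power_cap(cap_watts)                # act at the point of delivery
    return cap_watts
```

The asymmetry is deliberate: shedding load has to happen at hardware speed, while recovery can afford to be conservative. That is exactly the loop a dashboard sitting above the hardware layer cannot close in time.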
Head Four: Fleet Scale Has Become Table Stakes
A single AI cluster can span thousands of nodes. Operators need consistent provisioning, predictable API behavior, and the ability to detect and remediate configuration drift across that entire fleet without acts of heroism from their teams. Agentic computing models will accelerate this demand further. Autonomous systems managing infrastructure expect a programmable, consistent control surface. Firmware that is “mostly consistent in most places” is not a foundation. It is a liability with a timer on it.
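As one illustration of what drift detection looks like against a consistent control surface, here is a hedged sketch that polls the DMTF Redfish firmware inventory, the standard API most BMCs expose, and flags nodes that disagree with a golden baseline. The baseline versions, component names, node list, and credentials are placeholders, and exact resource layouts vary by vendor.

```python
# Hedged sketch: fleet-wide firmware drift detection over Redfish.
# Baseline names/versions are hypothetical; resource details vary by vendor.
import requests

BASELINE = {"BMC": "2.14", "BIOS": "1.08"}   # illustrative golden versions

def drifted_nodes(bmc_addresses, session_auth):
    """Return {bmc_address: [(component, version), ...]} for out-of-baseline nodes."""
    drift = {}
    for addr in bmc_addresses:
        # verify=False only because BMCs commonly ship self-signed certs;
        # a real fleet would pin or validate them.
        inv = requests.get(
            f"https://{addr}/redfish/v1/UpdateService/FirmwareInventory",
            auth=session_auth, verify=False, timeout=10).json()
        for member in inv.get("Members", []):
            item = requests.get(f"https://{addr}{member['@odata.id']}",
                                auth=session_auth, verify=False, timeout=10).json()
            name, version = item.get("Name"), item.get("Version")
            if name in BASELINE and version != BASELINE[name]:
                drift.setdefault(addr, []).append((name, version))
    return drift
```

A loop this simple only works because every node presents the same shape. That consistency is the whole argument.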
Head Five: The Control Plane Is Now Mission Critical
Out-of-band access, telemetry, inventory, and recovery cannot be best-effort capabilities any longer. When a host stack goes down or gets compromised – and it will – operators must be able to see and reach every node in the fleet. With AI infrastructure costs skyrocketing, downtime is not an inconvenience measured in SLA credits. It is a catastrophic event measured in millions of dollars per hour. The control plane has to be as reliable as the workloads it supports.
Taken individually, each of these challenges is solvable. Taken together, they form something far more formidable. Cut one head and two more demand your attention. That is the reality facing every operator, every hyperscaler, every enterprise IT leader running AI infrastructure at scale today.
For too long, the industry’s answer to this hydra has been proprietary tools and closed ecosystems, point solutions layered on top of point solutions, each one promising to tame the chaos while quietly adding to it. We believe there is a better path: a unified, open foundation that the industry can build on, contribute to, and trust at scale, grounded in the work of OpenBMC and the Open Compute Project and extended through broad industry collaboration, with engineering from across the value chain.
AMI has been building firmware for over three decades. We have watched the data center evolve through every major architectural shift. We are investing in this moment: scaling our open-source contributions, deepening the foundational technology partnerships that help us identify the root causes of these challenges, and aligning leaders across the ecosystem around a joint commitment to solving them. The control plane has become the most critical surface for the most valuable infrastructure on the planet, and the status quo is not built to hold.
The industry is ready for something new. We are ready to deliver it.
