How to Prevent Memory Failures in Your Data Center

Dos Terasaka

Aptio Product Manager, Global Product Group

This blog post discusses how to prevent memory failures in your data center while maintaining reliability, availability, and serviceability (RAS).

Cloud data center managers have their hands full dealing with hardware failures that can impact service availability and revenue. As data center operators know all too well, memory failures are among the most common of these. Unlike some other hardware failures, a memory failure can have a devastating effect while giving too little warning of the coming outage to allow preemptive action.

By using machine learning to analyze real-time memory health data, it is possible to predict such failures ahead of time. Machine learning finds hidden patterns in data sets and uses them to predict future events, so applying it to memory health data makes it possible to detect issues early and estimate when a failure is likely to occur. This gives data center operators the time they need to act and prevent an outage, which in turn leads to better uptime for data center operations.
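To make the idea concrete, here is a deliberately tiny sketch of learning a failure predictor from historical data. It is an illustration only: the single feature (correctable errors per day), the 30-day label, and the sample values are all hypothetical, and a one-feature threshold is far simpler than anything a production predictor would use.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical feature and label, chosen only for illustration.
    correctable_errors_per_day: float
    failed_within_30_days: bool


def learn_threshold(history):
    """Return the error-rate cutoff that misclassifies the fewest samples.

    Returns None when there is no history to learn from.
    """
    candidates = sorted({s.correctable_errors_per_day for s in history})
    best_cutoff, best_mistakes = None, len(history) + 1
    for cutoff in candidates:
        # A sample is predicted to fail when its rate reaches the cutoff.
        mistakes = sum(
            (s.correctable_errors_per_day >= cutoff) != s.failed_within_30_days
            for s in history
        )
        if mistakes < best_mistakes:
            best_cutoff, best_mistakes = cutoff, mistakes
    return best_cutoff


def predict_failure(cutoff, errors_per_day):
    """Flag a module as at risk when its error rate reaches the learned cutoff."""
    return errors_per_day >= cutoff


# Synthetic history, not real field data.
history = [
    Sample(0.0, False), Sample(1.0, False), Sample(2.0, False),
    Sample(3.0, False), Sample(8.0, True), Sample(12.0, True),
    Sample(20.0, True),
]
cutoff = learn_threshold(history)
at_risk = predict_failure(cutoff, 10.0)
healthy = predict_failure(cutoff, 2.5)
```

The point of the sketch is the workflow rather than the model: historical observations with known outcomes are used to fit a rule, and that rule is then applied to live readings so operators can act before the module fails.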

Intel’s Memory Resilience Technology predicts these failures before they happen, using pattern matching based on historical data. It uses a multi-dimensional model and algorithms to predict when a memory module is likely to fail. Memory Resilience Technology is a core technology that every data center and cloud service provider should use to reduce total cost of ownership and improve system uptime. This ultimately results in improved data center SLAs, reduced memory failure rates and proactive memory health evaluation.

When it comes to tracking and analyzing memory errors, you need a BIOS that can work closely with your BMC firmware. That’s where the AMI solution comes in. AMI’s Aptio UEFI captures errors and passes the relevant data to our MegaRAC BMC firmware. AMI’s MegaRAC then uses Intel’s Memory Resilience Technology engine to calculate a health score for the affected memory module. This way, AMI’s technology tracks each memory module’s health over time and exposes the results for the data center operator to review.
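The flow described above (errors captured per module, a score calculated, and the score tracked over time) can be sketched in a few lines. This is a toy stand-in, not Intel's Memory Resilience Technology engine or AMI's firmware: the class name, the decayed-error-load formula, the 0 to 100 scale, and the module IDs are all assumptions made for illustration.

```python
from collections import defaultdict


class ModuleHealthTracker:
    """Hypothetical BMC-side bookkeeping for per-module health scores."""

    def __init__(self, decay=0.5, penalty=10.0):
        self.decay = decay      # how quickly old errors stop counting
        self.penalty = penalty  # score points lost per unit of error load
        self._load = defaultdict(float)   # decayed error load per module
        self.history = defaultdict(list)  # one score per reporting interval

    def report_interval(self, error_counts):
        """Record one interval of data, as if handed over by the BIOS.

        error_counts maps a module ID to the correctable errors seen
        during the interval; modules not listed are treated as error-free.
        """
        modules = set(self._load) | set(error_counts)
        for module in modules:
            new_errors = error_counts.get(module, 0)
            self._load[module] = self._load[module] * self.decay + new_errors
            score = max(0.0, 100.0 - self.penalty * self._load[module])
            self.history[module].append(score)

    def health(self, module_id):
        """Latest score for a module; unseen modules are assumed healthy."""
        scores = self.history.get(module_id)
        return scores[-1] if scores else 100.0


tracker = ModuleHealthTracker()
tracker.report_interval({"DIMM_A1": 2})
tracker.report_interval({"DIMM_A1": 4, "DIMM_B1": 0})
tracker.report_interval({})  # a quiet interval lets the score recover
```

Keeping the full score history per module, rather than only the latest value, is what lets an operator see a trend, such as a module whose score keeps sliding across intervals, instead of a single snapshot.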

So, what are you waiting for? With Memory Resilience Technology, we’ve got you covered whether you’re dealing with a few isolated errors or a full-blown memory crisis.

Resources

Memory Resilience Technology overview video co-presented by Intel and AMI (English, Mandarin, Japanese)
Data Center cost savings calculator using Memory Resilience Technology
AMI Firmware Solutions for Intel Memory Resilience Technology Data Sheet

About AMI

AMI is Firmware Reimagined for modern computing. As a global leader in Dynamic Firmware for security, orchestration, and manageability solutions, AMI enables the world’s compute platforms from on-premises to the cloud to the edge. AMI’s industry-leading foundational technology and unwavering customer support have generated lasting partnerships and spurred innovation for some of the most prominent brands in the high-tech industry.
