The recent fire at OVHcloud in Strasbourg, France, and another similar incident in the US are a strong reminder of the risks that remain present in operating data centre facilities and, importantly, of what these could mean for operational staff, business reputation and client services.
Fire events aren’t commonplace for data centres given the high levels of monitoring, through High Sensitivity Smoke Detection (HSSD), and suppression or extinguishing systems, so when they do occur the level of attention is understandably high. From reports, it’s great to hear that nobody was seriously injured or worse. It is also a reminder that the management of safety, health, and the environment, from construction through to operational management, is paramount.
The transparency demonstrated by OVH founder and chairman Octave Klaba, through his online posts and video, has been refreshing and a different approach from the norm. He has touched upon the potential cause of the fire being related to an Uninterruptible Power Supply (UPS) unit that had been serviced that morning. This raises a number of questions:
- Could this have been the result of an engineer's action, which would support the statistics that human errors account for around 25% of downtime?
- Could this have been a technical fault which triggered a battery fire that the protection systems were unable to deal with, if they were in fact in place?
- Or is it something entirely unrelated?
Regardless of the cause, the event further highlights that disasters can happen, whether this be flooding which prevents sites from continuing to operate, extreme weather which impacts critical power and cooling systems, or fires which can cause irreparable damage.
It reinforces the importance of a good disaster recovery plan for both the operators or cloud service providers and their customers. 100% service availability is an expected standard today, but for some, putting this in place requires comprehensive planning and can have both technical and commercial implications, which need to be considered for it to be effective. It’s also the customer’s responsibility to evaluate the Threat, Risk and Vulnerability of a service and/or understand the standards data centres are built and operated to, through audit against standards such as EN 50600, the European data centre standard.
Finally, given the exponential increase in facilities built in the early noughties, with core infrastructure now reaching end of life (10 to 20 years) and the capital investment to replace or upgrade it remaining high, will we see more events like this, and what will this mean for the industry?