When a ‘record’ isn’t a record at all...

Published on 8th January 2015 by Ian Bitterlin

I read with interest a press release the other day from an Australian internet service provider (ISP) that had been forced to shut down critical systems at its Perth data-centre because of record-breaking temperatures in the city, causing disruption to customers across Australia. The statement said that the temperature last Monday reached 44.4°C.

I’ve some good friends in that lovely city, but my first thoughts focused on the facility’s cooling system (proving what a data-centre nerd I really am) and its inability to cope. Surely, I thought to myself, the facility would be like most other ISPs and not be running at full kW capacity? The heat rejection plant should be able to cope with a 60-70 percent load even if the temperature had crept a degree or two above the design point.
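To put a rough number on that hunch: a dry cooler rejects heat roughly in proportion to the temperature difference between the entering fluid and the ambient air, so plant sized for full load at its design ambient keeps a useful margin at partial load. Here is a minimal Python sketch of that rule of thumb – the 35°C design ambient and 45°C entering water temperature are my illustrative assumptions, not figures from the Perth facility:

```python
# Back-of-envelope dry-cooler model: rejected heat is roughly
# proportional to the initial temperature difference (ITD) between
# the entering fluid and the ambient air.
# The design figures below are illustrative assumptions only.

DESIGN_AMBIENT_C = 35.0   # assumed design ambient temperature
ENTERING_FLUID_C = 45.0   # assumed entering water temperature

def capacity_fraction(ambient_c: float) -> float:
    """Heat-rejection capacity as a fraction of design capacity."""
    design_itd = ENTERING_FLUID_C - DESIGN_AMBIENT_C
    itd = ENTERING_FLUID_C - ambient_c
    return max(itd, 0.0) / design_itd

for ambient_c in (35.0, 37.0, 40.0, 44.4):
    print(f"{ambient_c:>5.1f} C ambient -> "
          f"{capacity_fraction(ambient_c):5.0%} of design capacity")
```

On those assumptions, a couple of degrees over the design point still leaves about 80 percent of design capacity – comfortably above a 60-70 percent load – whereas an ambient of 44.4°C would all but wipe the capacity out, which is exactly why the question of what the real design point was matters.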

Dateline London
Of course I was just speculating in a vacuum of data, but I considered this in comparison with our London situation: a record 34.4°C that lasted for one hour. We usually design for 35°C, which means that all standard heat rejection plant can be applied without de-rating, dry-coolers and standby gensets included. Maybe the facility already used water for evaporative or adiabatic cooling but, if not, then just taking a hose and a lawn sprinkler to the roof could have increased cooling capacity by several tens of percent, I mused...
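For what it’s worth, the sprinkler arithmetic works out like this: evaporative pre-cooling pulls the air entering the coil part of the way from dry-bulb towards wet-bulb temperature. A short sketch, reusing the same assumed dry-cooler figures as above – the coincident wet-bulb and the 60 percent saturation effectiveness are purely my assumptions:

```python
# Rough sketch of adiabatic (evaporative) pre-cooling: spraying water
# into the airstream drops the air-on temperature part of the way from
# dry-bulb towards wet-bulb. Effectiveness and wet-bulb are assumptions.

def precooled_air_on(dry_bulb_c: float, wet_bulb_c: float,
                     effectiveness: float = 0.6) -> float:
    """Air-on coil temperature after simple evaporative pre-cooling."""
    return dry_bulb_c - effectiveness * (dry_bulb_c - wet_bulb_c)

dry_bulb_c = 44.4   # the reported Perth peak
wet_bulb_c = 22.0   # assumed coincident wet-bulb (very hot days are often dry)

air_on_c = precooled_air_on(dry_bulb_c, wet_bulb_c)
# Same illustrative dry-cooler model as the sketch above:
# capacity fraction = (45 C entering water - air-on) / (45 - 35) design ITD
capacity = (45.0 - air_on_c) / (45.0 - 35.0)

print(f"air onto the coil: {air_on_c:.1f} C")   # about 31 C
print(f"capacity: {capacity:.0%} of design")    # well over 100 percent
```

On those (admittedly generous) numbers, a dozen degrees of adiabatic pre-cooling turns a dry cooler that is drowning at 44.4°C dry-bulb back into one with capacity to spare – which is why the lawn sprinkler is less daft than it sounds.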

Of course, the solution to high ambient temperature is simple: all you need is bigger coils. I started to speculate about climate change and our ‘next’ record in the UK – grateful that it will (hopefully) be several decades before we reach 44.4°C and find ourselves designing for 50°C.

Then the last sentence of the press release struck me: this ‘record’ was in fact the city’s sixth-hottest day on record. I’m not sure what sort of record you set for sixth place, and I started to wonder what the facility’s peak design temperature was. Hopefully it was not lower than the real record. Or was the failure not due to temperature directly, but down to a failure in the cooling plant when under load? Maybe it wasn’t the IT cooling plant at all, but an air-cooled UPS?

I don’t suppose we shall ever know the facts but, bearing in mind the partial loading that is endemic in data centres around the globe, I suspect that it was more likely a plant failure than a capacity shortcoming.

Human error?
Of course, human error is responsible for more than 70 percent* of all service failures, so maybe it was a mistake in budget-setting by the client, a design error by the consultant, optimistic climate predictions, or erroneous finger-intervention by operatives? Either way, it was a blinking hot Monday in Perth, but this sort of ‘less than extreme’ failure should not occur.

* Courtesy of the Uptime Institute, although I heard a Microsoft speaker, talking about their 40-plus US data centres, report that if you combine human error with software failure then only 3 percent of outages are left down to the facility!

Blogger

Prof. Ian Bitterlin is the Chief Technology Officer for Emerson Network Power – the world leader in data-centre power and cooling infrastructure solutions and integrated DCIM software – and is recognized in the industry as an expert mechanical and electrical engineer.
