What Enterprises Need: Reliability

August 10, 2011

2 minute read

Been on a flight lately? How important is it to you that the engine NOT stop operating - ever?

Believe it or not, there are many examples in IT where the services and applications deployed must also never stop operating. The term High Availability (HA) refers to the percentage of time a given service or application is up, running and available for use.

There are varying degrees of HA - the more extreme levels allow for only 31.5 seconds of downtime per year! The highest levels of HA are reserved for applications critical to a company's operation. So critical, in fact, that if they stop - even temporarily (aside from scheduled maintenance) - there can be serious implications for the business including some very unhappy users.
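To see where a number like 31.5 seconds comes from, here is a rough sketch of the arithmetic. It assumes a 365-day year and treats all downtime as unplanned; the function name is made up for illustration and is not taken from any product.

```python
# Seconds in a 365-day year (assumption: leap years are ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for an uptime percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Each extra "nine" shrinks the downtime budget by a factor of ten.
for percent in (99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(percent)
    print(f"{percent}% uptime allows about {seconds:,.1f} seconds of downtime per year")
```

Under these assumptions, the 31.5-second figure corresponds to 99.9999% uptime, often called "six nines".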

It's hard to imagine what could be so important that a given service is deemed to require 99.9999% uptime, but certain industries and certain processes demand a truly HA environment. These include life sciences companies whose expiring patents mean that every minute of downtime has a real-world impact on the bottom line. Some manufacturing companies can even be forced to shut down production if critical content flows are stopped, even for a short period of time. That downtime can be measured in hundreds of thousands of dollars per minute, and in unhappy customers. Service Level Agreements (SLAs) are developed so that IT knows what level of uptime the users and applications expect - and meeting those SLAs is very important.

Delivering a truly HA solution is not easy, though. It's not just a matter of code quality, stability and robustness - you need to consider external factors like hardware failure or even environmental disasters. It's not even about load balancing, which helps distribute the work but does not guarantee that no work is lost. It's about deploying a solution that can be architected to eliminate all single points of failure. If a component goes down, another immediately picks up where it left off - and any work the failed component was handling is re-deployed.
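The difference between spreading work around and actually not losing it can be shown with a toy sketch. Everything here is hypothetical (the `Worker` class, the `run_with_failover` function, the job names), and a real HA deployment would also need health checks, shared state and redundant hardware. The point is only that a job handed to a failed component is given to the next one rather than dropped.

```python
class Worker:
    """A hypothetical component that either processes a job or is down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def process(self, job):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{job} done by {self.name}"


def run_with_failover(jobs, workers):
    """Process every job, re-deploying it to the next worker if one fails."""
    results = []
    for job in jobs:
        for worker in workers:
            try:
                results.append(worker.process(job))
                break  # job completed, move on to the next one
            except RuntimeError:
                continue  # this component failed; hand the same job to the next
        else:
            # Every worker failed: there is no redundancy left for this job.
            raise RuntimeError(f"no healthy worker left for {job}")
    return results


workers = [Worker("primary", healthy=False), Worker("standby")]
print(run_with_failover(["job-1", "job-2"], workers))
```

With the primary marked as down, both jobs still complete on the standby. Remove the standby and the same call raises an error, which is what a single point of failure looks like.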

Jets often have multiple engines, not just to balance the work but as redundant power sources should one ever quit. Come to think of it, single-engine Cessnas have just been dropped from my list of acceptable transportation. Failure at that level would seriously diminish my user satisfaction.
