
ISO 20000 & ITIL® Blog

IT Service Continuity Management – waiting for the big one

I rarely find ITIL being associated with service continuity. Considering how broad ITIL is, that is understandable. As the name implies, continuity means that something should not be stopped or broken, because a stoppage will most likely have consequences. What are those consequences, how severe are they, and why do they occur?

It’s difficult to avoid IT Service Continuity

Consider the processes inside your company, or the services that you use or that your company provides. Is it possible for them to exist without information technology? Probably not. This means that if the (supporting) IT services fail, the business processes and the respective services will fail as well. Therefore, the continuity of business services is highly dependent upon the continuity of IT services. There is some logic behind it, isn’t there?



This points to two things:

  1. Continuity of IT services cannot be neglected – whenever I talk to administrators, they are never in doubt about backup. On the contrary, they always have some kind of solution in place and do not see it as an open issue. This means they have already taken some steps toward IT service continuity – quite important ones, I would say.
  2. IT service continuity is coupled with business continuity – according to ITIL, the purpose of the IT Service Continuity Management (ITSCM) process is to support overall Business Continuity Management (BCM). In the real world, I have quite often found that ITSCM (fully or to some extent) is already a “teenager,” while BCM is still in the “embryo” phase.

ITIL recognizes the importance of (IT) service continuity and therefore dedicates one of the processes in the Service Design phase of the service lifecycle to it: IT Service Continuity Management.

Figure: ITSCM and BCM are running together

What is ITSCM all about?

In essence, it’s about risk and recovery. It sounds complicated – but it’s not.

Risk means that IT services are exposed to certain threats whose impact needs to be reduced to an (agreed) acceptable level (although, in my experience, SLAs rarely contain IT service continuity parameters). A distinction has to be made here between risks that are significant and have a major impact on the business, and minor technical faults, which should be handled by the Incident Management process.

Recovery means that recovery plans and preparations have to be in place. A common situation is that a backup of critical data exists and can be restored if needed. In more complex environments, or, I would say, wherever business processes are heavily dependent on IT services (e.g. banks), this means having a secondary location with an alternative datacenter, where all data are mirrored and available immediately once the IT service continuity plan is invoked.

Where does it come from?

A critical step in ITSCM is defining the requirements. This should ensure that the business requirements are understood and that the impact of losing IT services on the business is clear and quantified. Quantified means that the financial loss can be calculated, or that some other, intangible consequence can be defined, such as loss of competitive advantage or a damaged image.
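To illustrate what "quantified" can mean for the tangible part of the impact, here is a minimal Python sketch that estimates the financial loss of an outage from an hourly loss rate and an outage duration. The cost model (hourly revenue loss plus hourly recovery cost plus a one-off penalty), the class name `ServiceImpact` and all figures are hypothetical assumptions for illustration only; intangible consequences such as a damaged image are deliberately not computed.

```python
from dataclasses import dataclass


@dataclass
class ServiceImpact:
    """Hypothetical cost model for the tangible loss caused by an IT service outage."""
    name: str
    revenue_loss_per_hour: float   # assumed revenue lost for each hour the service is down
    recovery_cost_per_hour: float  # assumed extra cost per hour (overtime, external support)
    fixed_penalty: float = 0.0     # assumed one-off cost once an outage occurs (e.g. contract penalty)

    def financial_loss(self, outage_hours: float) -> float:
        """Estimate the tangible loss for an outage of the given length in hours."""
        hourly = self.revenue_loss_per_hour + self.recovery_cost_per_hour
        return self.fixed_penalty + hourly * outage_hours


if __name__ == "__main__":
    # Hypothetical figures, not taken from any real BIA.
    web_shop = ServiceImpact("web shop", revenue_loss_per_hour=2_000,
                             recovery_cost_per_hour=300, fixed_penalty=5_000)
    for hours in (1, 4, 24):
        print(f"{web_shop.name}: {hours} h outage -> {web_shop.financial_loss(hours):,.0f}")
```

Showing the loss for several outage lengths makes it easier to discuss with the business how quickly a service must be back before the loss becomes unacceptable.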

Business Impact Analysis (BIA) and Risk Assessment are used to define requirements.

BIA – BIA quantifies the impact that the loss of an IT service would have on the business. It may not always be carried out in an orderly and documented way, but I see that most IT service providers do some kind of BIA. Besides the (tangible and intangible) losses, BIA identifies the staff and skills needed to keep critical business processes running at an acceptable (usually degraded) level, the time within which the minimum level of service and then all services should be recovered, and the priority for recovering the services. Sometimes it is neither possible nor necessary to recover the complete service at once. For example, for a web-trading company it is essential that the web shop is available and functional as soon as possible, but invoicing can be restored later (not too late, of course, but it is feasible to send invoices a day or two later). Therefore, BIA defines the services and their recovery options, as well as the full recovery timescale.
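The BIA outputs described above (recovery priority, time to a minimum acceptable level, time to full recovery, and the skills needed) can be captured as simple records. The following Python sketch uses the web shop and invoicing example from the text; the field names, the helper functions `recovery_order` and `inconsistent_targets`, and all hour values are hypothetical assumptions, not a prescribed ITIL format.

```python
from dataclasses import dataclass


@dataclass
class BiaEntry:
    """One hypothetical BIA record for a business service."""
    service: str
    priority: int                   # 1 = recover first
    minimum_level_within_h: float   # hours until an acceptable (degraded) level must be back
    full_recovery_within_h: float   # hours until the complete service must be back
    required_skills: list[str]      # staff skills needed during recovery


def recovery_order(entries: list[BiaEntry]) -> list[BiaEntry]:
    """Sort services by priority, then by the tightest minimum-level deadline."""
    return sorted(entries, key=lambda e: (e.priority, e.minimum_level_within_h))


def inconsistent_targets(entries: list[BiaEntry]) -> list[str]:
    """Return services whose full recovery deadline is earlier than their minimum-level deadline."""
    return [e.service for e in entries if e.full_recovery_within_h < e.minimum_level_within_h]


if __name__ == "__main__":
    # Hypothetical values for the web-trading example.
    bia = [
        BiaEntry("invoicing", 2, 24, 48, ["accounting application admin"]),
        BiaEntry("web shop", 1, 2, 8, ["web server admin", "DBA"]),
    ]
    for entry in recovery_order(bia):
        print(f"{entry.priority}. {entry.service}: minimum level within "
              f"{entry.minimum_level_within_h} h, full recovery within {entry.full_recovery_within_h} h")
    print("Inconsistent targets:", inconsistent_targets(bia) or "none")
```

Keeping the minimum-level and full-recovery deadlines as separate fields reflects the point that a service may first be restored in a degraded form and only later completely.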

Risk Assessment – an assessment of the level of threat and the extent to which an organization is vulnerable to that threat. There are many risk management and assessment techniques. In general, risk assessment results in defined responses to certain risks, and in risk reduction measures that should reduce the risk to an acceptable level or mitigate it. In practice, you know that a network (as a service, but also as a group of components) is vulnerable to physical threats (e.g. fire, earthquake, flood, power failure), but also to technology failures, denial-of-service attacks, etc. For those threats, risk reduction or mitigation measures should be defined (e.g. the risk of power failure can be mitigated by implementing an uninterruptible power supply – UPS).
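One very simple way to combine "level of threat" and "vulnerability" is a scoring matrix. The Python sketch below assumes 1–5 scales, multiplies threat level by vulnerability, and compares the result with an acceptable level agreed with the business; the scales, the threshold of 6 and the example values are hypothetical assumptions, and real risk assessment methods are considerably more elaborate. It shows how a mitigation measure such as a UPS lowers vulnerability and brings a risk within the acceptable level.

```python
from dataclasses import dataclass
from typing import Optional

ACCEPTABLE_RISK = 6  # hypothetical acceptable level agreed with the business


@dataclass
class Risk:
    """A threat scored on hypothetical 1-5 scales."""
    threat: str
    threat_level: int                 # 1 = rare ... 5 = almost certain
    vulnerability: int                # 1 = well protected ... 5 = fully exposed
    mitigation: Optional[str] = None
    vulnerability_after_mitigation: Optional[int] = None

    def score(self, mitigated: bool = False) -> int:
        """Risk score = threat level x vulnerability (optionally after mitigation)."""
        v = self.vulnerability
        if mitigated and self.vulnerability_after_mitigation is not None:
            v = self.vulnerability_after_mitigation
        return self.threat_level * v

    def needs_action(self) -> bool:
        """True if the risk is still above the acceptable level after any defined mitigation."""
        return self.score(mitigated=True) > ACCEPTABLE_RISK


# Hypothetical example values for threats mentioned in the text.
risks = [
    Risk("power failure", 4, 4, "uninterruptible power supply (UPS)", 1),
    Risk("flood", 2, 3),
    Risk("denial-of-service attack", 3, 4),
]

if __name__ == "__main__":
    for r in risks:
        status = "further measures needed" if r.needs_action() else "acceptable"
        print(f"{r.threat}: score {r.score()} -> {r.score(mitigated=True)} ({status})")
```

Here the UPS brings the power-failure risk down to an acceptable score, while the denial-of-service risk, for which no measure is defined yet, is flagged for further treatment.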

What to do with IT Service Continuity?

After the requirements are defined, the ITSCM plan should be developed and implemented. This is an ongoing process and should be integrated with the business continuity plans.

Once you have a plan and the services are in operation, it is important to test the plan and to know exactly what to do when the IT service continuity plan is invoked. I once had a situation where recovery procedures were defined but never tested. When a disaster actually happened, the result was clear – and unsatisfactory. After recovery was complete, the next project was a redesign of the recovery plans and procedures. And – test, test, test…

Download a free sample of our IT Service Continuity Management process template and related appendices to get a deeper understanding of the ITSCM concept.

Author: Branimir Valentic (Advisera)
Branimir is an expert in IT service management (consultancy, training and tools), IT governance (training and consulting), project management and consultancy in IT and telecommunication. He holds the following certificates: ITIL Expert, ISO 20000, ISMS Lead Auditor and PRINCE2.