Branimir Valentic
April 5, 2016
How many times have you heard: “It happened again”? In that moment, everyone (and particularly management) on the IT Service Management (ITSM) team wished that the real cause of the incident had been found and eliminated, which is, basically, Problem Management. Incidents occur, and Incident Management should reinstate the user’s service as soon as possible. That usually doesn’t include finding a root cause – that’s the job of Problem Management.
Both ITIL and ISO 20000 require the root cause to be found and resolved. Easier said than done. Imagine a “frozen” PC. Restarting will get you the service (using a PC) back. That’s a workaround. And you can close the incident. But – what happened; what caused that PC to be stuck? That’s another issue and should be considered separately – that’s the job of Problem Management (read the articles ITIL and ISO 20000 Problem Management – Organizing for problem resolution and ITIL Problem Management: Getting rid of problems to learn the basics of the Problem Management process).
Basically, problem resolution looks (from a process point of view) similar to incident resolution. After opening a problem ticket, i.e., record, there is categorization, prioritization, investigation, and diagnosis. Closing the end of the process, there are several alternatives. Let me explain each one of them.
When resolving problems, often you will get to the point where you have an indirect solution. This means you have found what to do to get service back, but you still didn’t figure out what really caused this incident; i.e., you have a temporary fix. That’s your workaround. Here are a few characteristics of a workaround:
According to the ITIL definition, a Known Error is “a problem with a documented root cause and workaround.” Please read the article: Known Errors – repetitio est mater studiorum? Not in this case to learn more about Known Errors. It is important to remember the following:
Finding a root cause is not simple, because it can be complex from a technical point of view. It could happen that you need to involve developers, check source code, etc. And that requires expertise and time. Although costly (from a time and resources point of view), problem resolution is beneficial because it saves future hurdles and loss of services due to the fact that once you have a permanent resolution – those same incidents will not happen again. Note also that this (number of repeated incidents) is one of your Key Performance Indicators (KPIs) for Problem Management process efficiency.
Figure: Activities at the end of the Problem Management process
But, that’s not the end. Once you know the root cause of one or more incidents (and related problem record) – something must be done. History has shown that this is the moment when Change Management “enters the stage.” Namely, the root cause has to be eliminated. That usually involves a change in the scope of your service, e.g., reconfiguration, change in code, etc. So, a Request for Change (RfC), or even an Emergency RfC should be raised. While the change is being implemented, the KEDB (and a workaround) is used if more of these incidents occur. The KEDB has one more important role – sometimes it’s not possible (e.g., technically too complex or too expensive) to implement a final resolution. That’s when the information in the KEDB will be used once the same incident occurs in the future.
Finally, once the final resolution is implemented, the problem is closed. Depending on the quality of the implemented solution, the same incidents should not occur anymore. In such way, ITSM team will free resources (particularly the Problem Management team) for new incident/problem tickets. That will increase your team’s efficiency and, consequently, the satisfaction of your own people (not doing the same job over and over again) as well as your customers. What more could you wish for?
To see how you Problem Management process complies with ISO 20000 requirements, i.e., ITIL recommendation, use our free ITIL and ISO 20000 Tools.