ITIL Problem Management: getting rid of problems
Albert Einstein: “You can never solve a problem on the level on which it was created.”
Problem Management (PM) is one of the oldest processes in ITIL, and one of the processes implementers have a lot of problems with (notice the pun?). It was significantly rewritten in the 2011 edition, in spite of its age.
Problem vs. Incident Management
I often have to explain this to people who are new to ITIL. It comes naturally to a person outside of the IT Service Management world to say “Problem” when they have an incident. Remember “Houston, we have a problem”? Even one of the major Service Desk tool vendors keeps the Incident database under the internal name “Problem,” for legacy reasons.
By definition, an Incident is the “unplanned interruption to an IT service or reduction in the quality of an IT service or a failure of a CI (Configuration Item) that has not yet impacted an IT service.” (ITIL 2011)
A Problem, on the other hand, is “the underlying cause of one or more incidents.” (ITIL 2011)
The Incident Management process aims to restore the service to the customer as soon as possible. Problem Management works patiently and analytically to find the underlying cause of incidents and to create proper workarounds and permanent fixes. Obviously, these two call for different kinds of people. Employees working in Incident Management are engineers who often rose from the Service Desk. Problem Management people are experienced experts who have enough time to calmly analyze incident trends and spot problems, both reactively and proactively.
And that is the main problem with Problem Management: those people don’t exist. Experts of that caliber are expensive, usually up to their necks in important projects, and often members of middle and higher management. Recruiting them for Problem Management activities can prove to be extremely difficult. This is a case where management support can be crucial.
Value to business
Problem Management captures the underlying causes of incidents and helps the Support staff resolve them more quickly by keeping them informed about known issues, workarounds and fixes. Services are thus more available, and the costs of Incident Management are reduced.
Repeating incidents are solved more quickly, and major incidents are addressed in a proper way, providing timely information to customers and assuring them that their services are in safe hands.
It is more important here to focus on the process than on the tool. The tool for Problem Management can be a sophisticated one, and it surely helps when you need complex relationships to related incidents, known error/knowledge databases and resulting changes. Resolving a series of related incidents when a resulting change is completed is a handy and welcome feature. But a simple ticketing system where Problem or Incident is just a ticket type, with the same prioritization and categories, will be just fine. What I’m saying is that we should focus more on allocating competent staff to each specific type of problem. Bells and whistles tend to be appealing to IT people, but at the end of the day, the job gets done by people, not fancy features.
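To make the "simple ticketing system" idea concrete, here is a minimal sketch in Python. Everything in it (the `Ticket` class, its fields, the `close_problem_with_incidents` helper and the ticket IDs) is a hypothetical illustration, not the data model of any particular Service Desk tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class TicketKind(Enum):
    INCIDENT = "incident"
    PROBLEM = "problem"


@dataclass
class Ticket:
    """Hypothetical ticket: incidents and problems share categories and priorities."""
    ticket_id: str
    kind: TicketKind
    category: str
    priority: int
    status: str = "open"
    related_ids: list = field(default_factory=list)  # incidents linked to a problem


def close_problem_with_incidents(problem, tickets_by_id):
    """Resolve a problem and every incident linked to it,
    e.g. once the change that fixes the known error is completed."""
    if problem.kind is not TicketKind.PROBLEM:
        raise ValueError("expected a problem ticket")
    problem.status = "resolved"
    for ticket_id in problem.related_ids:
        tickets_by_id[ticket_id].status = "resolved"


tickets = {
    "T-1": Ticket("T-1", TicketKind.INCIDENT, "email", 2),
    "T-2": Ticket("T-2", TicketKind.INCIDENT, "email", 2),
    "T-3": Ticket("T-3", TicketKind.PROBLEM, "email", 2, related_ids=["T-1", "T-2"]),
}
close_problem_with_incidents(tickets["T-3"], tickets)
print([t.status for t in tickets.values()])
```

Even a model this small supports the link between a problem and its incidents; anything beyond that is convenience, which is the point of the paragraph above.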
Reactive Problem Management
Reactive PM deals mostly with two types of incidents:
- Repetitive Incidents – Usually, the Service Desk can trigger a Problem record, notifying the Problem Manager of a highly visible repetitive pattern of some incidents. The Problem Manager initiates the Problem Management process.
- Major Incidents – High impact/urgency incidents require the immediate engagement of a competent problem management team, which works on the root cause and provides the Service Desk with timely information for customer reporting.
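The two triggers above can be sketched as a simple filter over incident records. This is only an illustration under stated assumptions: the `Incident` fields, the repeat threshold of three and the convention that priority 1 means a major incident are all hypothetical and would be set by your own process.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical minimal incident record; real tools carry many more fields."""
    incident_id: str
    ci: str        # affected Configuration Item
    category: str
    priority: int  # 1 = highest impact/urgency (assumed convention)


def problem_candidates(incidents, repeat_threshold=3, major_priority=1):
    """Return (reason, key) pairs that could justify opening a Problem record."""
    candidates = []
    # Repetitive incidents: same CI and category seen at least repeat_threshold times.
    counts = Counter((i.ci, i.category) for i in incidents)
    for key, count in counts.items():
        if count >= repeat_threshold:
            candidates.append(("repetitive", key))
    # Major incidents: each one is a candidate on its own.
    for i in incidents:
        if i.priority <= major_priority:
            candidates.append(("major", i.incident_id))
    return candidates


incidents = [
    Incident("INC-1", "mail-server", "outage", 3),
    Incident("INC-2", "mail-server", "outage", 3),
    Incident("INC-3", "mail-server", "outage", 2),
    Incident("INC-4", "crm-db", "performance", 1),
]
print(problem_candidates(incidents))
```

The output is a list of candidates for the Problem Manager to review, not a decision; whether a pattern deserves a Problem record remains a human judgment.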
Proactive Problem Management
The best way to start proactive Problem Management is to schedule short periodic (weekly or monthly) PM meetings with the expert staff, where preliminary trend analyses are done and requirements for further analysis and reporting are assigned and scheduled. This way, proactive Problem Management is introduced gradually, and even if it proves to be of low efficiency in the beginning, it sets a pattern for future acceptance by all the stakeholders.
Interfaces to other ITIL processes
Interfaces to other processes are mentioned in ITIL Service Operation chapter 188.8.131.52. I will mention the ones I find the most relevant:
Financial Management: Comparing the costs of SLA penalties with the costs of adequate equipment and resources can lead to more efficient cost management. Not being able to replace equipment in time, or not having enough staff to address the incidents, can prove to be both costlier and more frustrating for support employees. This influences both Availability and Capacity Management, and consequently, Service Level Management.
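The comparison can be reduced to simple arithmetic. The sketch below assumes a deliberately naive model (expected penalties = incidents per year × share of incidents that breach the SLA × penalty per breach), and every figure in it is invented for illustration.

```python
def expected_penalty_cost(incidents_per_year, breach_share, penalty_per_breach):
    """Naive yearly SLA penalty estimate (assumed model, not an ITIL formula)."""
    return incidents_per_year * breach_share * penalty_per_breach


def cheaper_option(penalty_cost, investment_cost):
    """Compare yearly penalties with the yearly cost of extra equipment or staff."""
    return "invest" if investment_cost < penalty_cost else "accept penalties"


# Hypothetical figures: 40 incidents a year, a quarter of them breach the SLA,
# each breach costs 2,000, and adequate resources would cost 15,000 a year.
penalties = expected_penalty_cost(40, 0.25, 2000)
print(penalties, cheaper_option(penalties, 15000))
```

A real analysis would also weigh the softer costs mentioned above, such as staff frustration, which do not fit into a single number.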
Change Management: Problem Management often results in a Request for Change (RFC) demanding a fix to a known error. Change Management informs Problem Management about new workarounds and fixes.
Service Asset and Configuration Management: Problem Management benefits from the accurate data about Configuration Items (CI) during trend analysis and impact evaluation.
Knowledge Management: A Service Knowledge Management System (SKMS) is a natural place for storing information about known errors, workarounds and fixes. Incident Management is significantly sped up when this info is available in a meaningful GUI.
Release and Deployment Management: Deploys the resulting permanent fixes. Errors introduced during patching are monitored by the Service Desk, and resulting problems are resolved again by Problem Management.
After the implementation, PM will slowly but steadily start to minimize the number and impact of repeating incidents. In the second phase it will minimize the impact of incidents that can’t be prevented.
By providing a sense of quality and professionalism during its activities, Problem Management can greatly help to create and maintain customers’ confidence in IT Service Management.