What is ITIL Problem Management?
Whereas Incident Management is concerned with restoring service to the customer as quickly as possible, Problem Management deals with finding and addressing the root cause of the issues that cause disruption.
A problem is defined as: “The unknown cause of one or more incidents.”
The purpose of Problem Management is to:
- Manage the lifecycle of all problems, from first identification through investigation and documentation to eventual removal.
- Minimize the adverse impact on the business of incidents and problems caused by underlying errors in the IT infrastructure.
- Proactively prevent the recurrence of incidents related to these errors.
- Find the root cause of incidents, document and communicate known errors, and initiate actions to improve or correct the situation.
Problem Management includes the activities required to diagnose the root cause of incidents and to determine how to resolve the underlying problems. It is also responsible for ensuring that each resolution is implemented through the appropriate control procedures, especially Change Management and Release and Deployment Management.
The Problem Management Process
Problem Management also maintains information about problems and their workarounds and resolutions, so that the organization can reduce the number and impact of incidents over time. For this reason, Problem Management has a strong interface with Knowledge Management, and both processes use tools such as the Known Error Database (KEDB).
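To make the role of a known error database more concrete, the following Python sketch models a known error record (a problem with a documented root cause and workaround) and a lookup by reported symptom. The `KnownError` and `KnownErrorDatabase` names, their fields, and the symptom-matching behavior are all hypothetical illustrations, not the schema or API of any particular service management tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnownError:
    """Hypothetical known error record: a problem whose root cause and workaround are documented."""
    problem_id: str
    root_cause: str
    workaround: str
    symptoms: set[str] = field(default_factory=set)


class KnownErrorDatabase:
    """Minimal in-memory stand-in for a KEDB (illustrative only, not a real ITSM tool API)."""

    def __init__(self) -> None:
        self._records: dict[str, KnownError] = {}

    def add(self, record: KnownError) -> None:
        # One record per problem; re-adding the same problem_id replaces the old record.
        self._records[record.problem_id] = record

    def find_by_symptom(self, symptom: str) -> list[KnownError]:
        # Staff handling a new incident search by the symptom the user reported.
        return [r for r in self._records.values() if symptom in r.symptoms]


# Usage with made-up sample data
kedb = KnownErrorDatabase()
kedb.add(KnownError(
    problem_id="PRB-001",
    root_cause="Memory leak in the reporting service",
    workaround="Restart the reporting service",
    symptoms={"report timeout", "slow dashboard"},
))
matches = kedb.find_by_symptom("report timeout")
print(matches[0].workaround)  # Restart the reporting service
```

Matching on exact symptom strings keeps the sketch short; a real KEDB would typically rely on the categorization scheme and search features of the organization's tooling.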
Although Incident Management and Problem Management are separate processes, they are closely related, typically use the same tools, and may use similar categorization, impact, and priority coding systems. This helps ensure effective communication when dealing with related incidents and problems.
The Problem Management process has both reactive and proactive aspects. Reactive Problem Management activities are performed in response to specific incidents, whereas proactive Problem Management activities are ongoing activities aimed at improving the overall availability of IT services and end-user satisfaction with them. Examples of proactive Problem Management activities include:
- Conducting periodic scheduled reviews of incident records to find patterns and trends in reported symptoms that may indicate underlying errors in the infrastructure.
- Conducting major incident reviews, where asking “How can we prevent this from recurring?” can reveal an underlying cause or error.
- Conducting periodic scheduled reviews of operational logs and maintenance records to identify patterns and trends in activities that may indicate an underlying problem.
- Conducting periodic scheduled reviews of event logs to identify patterns and trends in warning and exception events that may indicate an underlying problem.
- Conducting brainstorming sessions to identify trends that could indicate the existence of underlying problems.
- Using check sheets to proactively collect data on service or operational quality issues that may help to detect underlying problems.
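The periodic review of incident records described above can be sketched in code. The following Python is a minimal illustration under stated assumptions: the `Incident` record, the `find_problem_candidates` function, the grouping by category and symptom, and the default threshold of three occurrences are all hypothetical choices made for the example, not something ITIL prescribes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    # Hypothetical, simplified incident record; real ITSM tools hold far more fields.
    incident_id: str
    opened: date
    category: str
    symptom: str


def find_problem_candidates(incidents, start, end, threshold=3):
    """Return ((category, symptom), count) pairs seen at least `threshold` times in [start, end].

    Recurring symptoms are only candidates: someone still has to confirm
    that they share an underlying cause before a problem record is raised.
    """
    in_window = (i for i in incidents if start <= i.opened <= end)
    counts = Counter((i.category, i.symptom) for i in in_window)
    # most_common() orders pairs by count, highest first
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Usage with made-up sample data, reviewing March 2024
incidents = [
    Incident("INC-1", date(2024, 3, 1), "email", "mailbox unavailable"),
    Incident("INC-2", date(2024, 3, 4), "email", "mailbox unavailable"),
    Incident("INC-3", date(2024, 3, 9), "network", "vpn drops"),
    Incident("INC-4", date(2024, 3, 12), "email", "mailbox unavailable"),
    Incident("INC-5", date(2024, 4, 2), "email", "mailbox unavailable"),  # outside the window
]
print(find_problem_candidates(incidents, date(2024, 3, 1), date(2024, 3, 31)))
# [(('email', 'mailbox unavailable'), 3)]
```

A fixed count threshold is the simplest possible trend rule; the same structure could weight incidents by impact or compare counts across review periods, and similar counting could be applied to event logs or check-sheet data.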
Reactive and proactive Problem Management activities are generally conducted within the scope of Service Operation. Proactive Problem Management is closely related to Continual Service Improvement (CSI) lifecycle activities, which directly support identifying and implementing service improvements. Proactive Problem Management supports those activities through trend analysis and the targeting of preventive action. Problems identified through these activities become input to the CSI register, which is used to record and manage improvement opportunities.