For any given asset there are typically dozens of predictive or preventive maintenance tasks that could be performed. Selecting the ones that contribute effectively to your overall strategy, however, can be tricky.
The benefit of getting it right is the difference between meeting production targets and the alternative: lost revenue, late-night callouts, and the added stress of unplanned downtime events.
Step 1: Build out your FMEA (Failure Mode and Effects Analysis) for the asset under consideration.
Make sure you break the analysis down to failure modes in enough detail that the causes are understood and you can identify the proper maintenance to address each specific failure mode.
Once you’ve made a list of failure modes, then it’s detailed analysis time. If you want to be truly rigorous, perform the following analysis for every potential failure mode. Depending on the criticality of the asset you can simplify by paring down your list to include only the failure modes that are most frequent or result in significant downtime.
Step 2: Identify the consequences of each failure mode on your list.
Failure modes can result in multiple types of negative impact. Typically, these failure effects include production costs, safety risks, and environmental impacts. It is your job to identify the effects of each failure mode and quantify them in a manner that allows them to be reviewed against your business’s goals. Often when I am facilitating a maintenance optimization study, people will say things like “There is no effect when that piece of equipment fails.” If that’s the case, why is that equipment there? All failures have effects; they may just be small or hard to quantify, perhaps because workarounds are available or because a certain amount of time passes after the failure before an effect is realized.
Step 3: Understand the failure rate for each particular mode.
Gather information on the failure rates from any available industry data and personnel with experience on the asset or a similar asset and installation, as well as any records of past failure events at your facility. This data can be used to evaluate the frequency of failure through a variety of methods — ranging from a simple Mean Time To Failure (MTTF) to a more in-depth review utilizing Weibull distributions.
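As a rough illustration of the two ends of that range, here is a minimal Python sketch. The failure times are made-up numbers, the function names are my own, and the Weibull parameters (eta for characteristic life, beta for shape) are assumed to have been estimated elsewhere. It also assumes every recorded time is a complete run to failure, with no censored data.

```python
import math


def mean_time_to_failure(times_to_failure):
    """Simple MTTF: the average of observed operating times to failure."""
    if not times_to_failure:
        raise ValueError("need at least one observed failure")
    return sum(times_to_failure) / len(times_to_failure)


def weibull_reliability(t, eta, beta):
    """Probability of surviving to time t under a two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


def weibull_mttf(eta, beta):
    """Mean life implied by a two-parameter Weibull distribution."""
    return eta * math.gamma(1.0 + 1.0 / beta)


# Hypothetical operating hours to failure for one failure mode.
observed_hours = [4100, 5200, 3900, 6100, 4700]
print("Simple MTTF (h):", mean_time_to_failure(observed_hours))

# Hypothetical Weibull parameters; beta > 1 indicates wear-out behavior.
eta, beta = 5300.0, 2.5
print("Chance of surviving 3000 h:", round(weibull_reliability(3000, eta, beta), 3))
print("Weibull mean life (h):", round(weibull_mttf(eta, beta)))
```

The simple average treats the failure rate as constant, while the Weibull shape parameter tells you whether failures become more or less likely with age, which matters when you later decide whether a planned replacement makes sense.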
(Note: The Weibull module of Isograph’s Availability Workbench™ can help you to quickly and easily understand the likelihood of different failure modes occurring.)
Step 4: Make a list of possible reactive, planned or inspection tasks to address each failure mode.
Usually, you start by listing the actions you take when that failure mode occurs (reactive maintenance). Then broaden your list to any potential preventive maintenance and/or inspection tasks that could help prevent the failure mode from happening, or reduce the frequency at which it occurs.
- Reactive tasks
  - Replacement
  - Repair
- Preventive tasks
  - Daily routines (clean, adjust, lubricate)
  - Periodic overhauls, refurbishments, etc.
  - Planned replacement
- Inspection tasks
  - Manual (sight, sound, touch)
  - Condition monitoring (vibration, thermography, ultrasonics, x-ray and gamma ray)
Step 5: Gather details about each potential task.
In order to compare and contrast different tasks, you have to understand the requirements of each:
- What exactly does the task entail? (basic description)
- How long would the work take?
- How long would it take to start the work after shutdown/failure?
- Who would do the work?
- What labor costs are involved? (the hourly rates of the employees or outside contractors who would perform the task)
- Would any spare parts be required? If so, how much would they cost?
- Would you need to rent any specialized equipment? If so, how much would it cost?
- Do you have to take the equipment offline? If so, for how long?
- How often would you need to perform this task (frequency)?
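One way to keep the answers to these questions comparable is to record them in the same structure for every task. The sketch below is only an illustration: the field names, the example task and all of the figures are hypothetical, and it assumes downtime can be priced at a flat hourly rate.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceTask:
    description: str
    duration_hours: float         # how long the work takes
    crew_size: int                # who does the work
    labor_rate_per_hour: float    # employee or contractor rate
    spares_cost: float            # spare parts per execution
    equipment_rental_cost: float  # specialized equipment per execution
    downtime_hours: float         # time the equipment is offline
    interval_months: float        # how often the task is performed

    def cost_per_execution(self, downtime_cost_per_hour):
        """Labor, parts, rental and lost production for one execution."""
        labor = self.duration_hours * self.crew_size * self.labor_rate_per_hour
        downtime = self.downtime_hours * downtime_cost_per_hour
        return labor + self.spares_cost + self.equipment_rental_cost + downtime

    def annual_cost(self, downtime_cost_per_hour):
        """Cost per execution scaled by the number of executions per year."""
        return self.cost_per_execution(downtime_cost_per_hour) * 12.0 / self.interval_months


# Hypothetical planned replacement task.
bearing_swap = MaintenanceTask(
    description="Planned bearing replacement",
    duration_hours=4.0,
    crew_size=2,
    labor_rate_per_hour=60.0,
    spares_cost=1200.0,
    equipment_rental_cost=0.0,
    downtime_hours=6.0,
    interval_months=24.0,
)
print(bearing_swap.cost_per_execution(downtime_cost_per_hour=2000.0))
print(bearing_swap.annual_cost(downtime_cost_per_hour=2000.0))
```

With every candidate task captured this way, the comparison in the later steps becomes a matter of running the same calculation across the list.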
A key consideration for inspection tasks only: What is the P-F interval for this failure mode? This is the window between the time you can detect a potential failure (P) and when it actually fails (F), similar to calculating how long you can drive your car after the fuel light comes on before you actually run out of fuel. Understanding the P-F interval is key in determining the interval for each inspection task.
The P-F interval can vary from hours to years and is specific to the type of inspection, the specific failure mode and even the operating context of the machinery.
The P-F interval can be hard to determine precisely, but it is worth making the best approximation you can because of the impact it has on task selection and frequency.
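To show how a P-F estimate feeds into inspection frequency, here is a small Python sketch. It uses a common rule of thumb that is my assumption rather than a fixed requirement: inspect at least twice within the P-F interval, then check that the worst-case warning time still leaves room to plan and carry out the corrective work. The function names and the numbers are hypothetical.

```python
def inspection_interval(pf_interval, inspections_within_pf=2):
    """Inspection interval that fits the given number of inspections into P-F."""
    if pf_interval <= 0:
        raise ValueError("P-F interval must be positive")
    if inspections_within_pf < 1:
        raise ValueError("need at least one inspection within the P-F interval")
    return pf_interval / inspections_within_pf


def worst_case_warning_time(pf_interval, interval):
    """Warning left if the defect becomes detectable just after an inspection."""
    return pf_interval - interval


def interval_is_workable(pf_interval, interval, response_time):
    """True if the worst-case warning covers the time needed to respond."""
    return worst_case_warning_time(pf_interval, interval) >= response_time


# Hypothetical example: P-F interval of 8 weeks, 3 weeks needed to respond.
pf_weeks = 8.0
interval = inspection_interval(pf_weeks)
print("Inspect every", interval, "weeks")
print("Worst-case warning:", worst_case_warning_time(pf_weeks, interval), "weeks")
print("Workable:", interval_is_workable(pf_weeks, interval, response_time=3.0))
```

If the check fails, the options are to inspect more often, shorten the response time (for example by holding spares), or pick a technique that detects the failure earlier.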
Step 6: Evaluate the lifetime costs of different maintenance approaches.
Once you understand the cost and frequency of different failure modes, as well as the cost and frequency of various maintenance tasks to address them, you can model the overall lifetime costs of various options.
For example, say you have a failure mode with a moderate business impact: enough to affect production, but not enough to sink your profits for the quarter. If that failure mode has a mean time between failures (MTBF) of six months, you might take a very aggressive maintenance approach. On the other hand, if that failure mode only happens once every ten years, your approach would be very different. “Run to Failure” is often a completely legitimate choice, but you need to understand and be able to justify that choice.
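The six-month versus ten-year comparison can be sketched with a very simplified cost model. Everything here is an assumption for illustration: the cost figures, the idea that a preventive task stops a fixed fraction of failures, and the lack of discounting or aging effects. A real study would use failure distributions rather than a flat rate.

```python
def run_to_failure_annual_cost(failure_cost, mtbf_years):
    """Expected yearly cost if the failure is simply repaired when it occurs."""
    return failure_cost / mtbf_years


def preventive_annual_cost(failure_cost, mtbf_years, task_cost,
                           tasks_per_year, fraction_prevented):
    """Yearly cost of the planned task plus the failures it does not prevent."""
    baseline = run_to_failure_annual_cost(failure_cost, mtbf_years)
    residual_failures = baseline * (1.0 - fraction_prevented)
    return task_cost * tasks_per_year + residual_failures


def cheaper_strategy(failure_cost, mtbf_years, task_cost,
                     tasks_per_year, fraction_prevented):
    """Name of the lower-cost option under this simplified model."""
    rtf = run_to_failure_annual_cost(failure_cost, mtbf_years)
    pm = preventive_annual_cost(failure_cost, mtbf_years, task_cost,
                                tasks_per_year, fraction_prevented)
    return "preventive" if pm < rtf else "run to failure"


# Hypothetical figures: each failure costs 50,000; the planned task costs
# 5,000, runs four times a year, and is assumed to prevent 75% of failures.
FAILURE_COST, TASK_COST, TASKS_PER_YEAR, PREVENTED = 50_000.0, 5_000.0, 4, 0.75
LIFETIME_YEARS = 20

for mtbf_years in (0.5, 10.0):
    rtf = run_to_failure_annual_cost(FAILURE_COST, mtbf_years)
    pm = preventive_annual_cost(FAILURE_COST, mtbf_years, TASK_COST,
                                TASKS_PER_YEAR, PREVENTED)
    choice = cheaper_strategy(FAILURE_COST, mtbf_years, TASK_COST,
                              TASKS_PER_YEAR, PREVENTED)
    print(f"MTBF {mtbf_years} y: run to failure {rtf * LIFETIME_YEARS:,.0f}, "
          f"preventive {pm * LIFETIME_YEARS:,.0f} over {LIFETIME_YEARS} y "
          f"-> {choice}")
```

Under these assumed numbers, the same task is worth doing when the failure occurs every six months and is not worth doing when it occurs once a decade, which is the kind of justification a "Run to Failure" decision needs.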
These calculations can be done manually, in spreadsheets, or using specialized modeling software such as the RCMCost™ module of Isograph’s Availability Workbench™.
Ultimately you try to choose the least expensive maintenance task that provides the best overall business outcome.