International Journal of Performability Engineering, 2010, No 4

Editorial

Editorial: Designing for Maintainability and Optimization of Maintenance Policy, an important step of achieving Performability

Krishna B. Misra

2010, 6(4): 301. doi:10.23940/ijpe.10.4.p301.mag

Abstract

The next important attribute in achieving performability, after quality and reliability (see Editorials, Vol. 6, No. 1-3, 2010), is maintainability, which needs to be designed and built into any system or product and is realized through maintenance during its operational phase. BS 4778-3.1:1991, BS 3811:1993 and MIL-STD-721B define maintenance as a "process of maintaining an item in an operational state by either preventing its transition to a failed state or by restoring it to an operational state following its failure". The primary aim of maintenance is therefore to prolong the functioning state of an item by not allowing its condition to deteriorate. There are several approaches to maintenance, based on the expected use and maintenance schedule of an item. Economic considerations are closely linked to maintenance and the system life cycle, since failure to consider the effects of design on maintenance, and vice versa, can have adverse effects on profit. Design and maintenance are therefore planned simultaneously to ensure efficient and cost-effective operation over the life of a product.

Generally, three types of maintenance are in use, viz., preventive (PM), corrective (CM) and predictive maintenance (PdM). Maintenance can also be classified according to the degree to which the maintenance work restores the equipment relative to its original state. "Perfect Maintenance" restores the equipment to an as-good-as-new condition. "Minimal Maintenance" leaves the equipment with the same failure rate it had before the maintenance action was initiated; this is called the as-bad-as-old state. "Imperfect Maintenance" does not restore the equipment to as good as new but makes it relatively younger (a state between as good as new and as bad as old). "Worse Maintenance" results (unintentionally) in an increase in the equipment's failure rate or actual age but does not result in breakdown, while maintenance that results in the equipment's breakdown is termed "Worst Maintenance". Accordingly, any PM or CM action belongs to one of the above categories.
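The restoration levels described above can be illustrated with a single age-reduction factor. The following is only a sketch under stated assumptions, not a model given in the editorial: the factor q (effective age after maintenance = q times the age before), the function names, and the Weibull hazard parameters are all hypothetical and chosen for demonstration.

```python
def classify_maintenance(q: float, causes_breakdown: bool = False) -> str:
    """Map a hypothetical age-reduction factor q to the categories in the text.

    Assumption of this sketch: effective age after maintenance = q * age before.
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    if causes_breakdown:
        return "worst"        # the maintenance action itself breaks the equipment
    if q == 0:
        return "perfect"      # as good as new
    if q < 1:
        return "imperfect"    # between as good as new and as bad as old
    if q == 1:
        return "minimal"      # as bad as old: failure rate unchanged
    return "worse"            # effectively older than before, but still running


def age_after_maintenance(age: float, q: float) -> float:
    """Effective age once the maintenance action is complete."""
    return q * age


def weibull_hazard(age: float, beta: float = 2.0, eta: float = 1000.0) -> float:
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1); illustrative parameters."""
    return (beta / eta) * (age / eta) ** (beta - 1)


if __name__ == "__main__":
    age = 800.0
    for q in (0.0, 0.5, 1.0, 1.3):
        new_age = age_after_maintenance(age, q)
        print(q, classify_maintenance(q), weibull_hazard(new_age))
```

With an increasing hazard (beta greater than 1), a smaller q gives a lower failure rate after maintenance, which is what distinguishes perfect, imperfect and minimal maintenance in this simplified picture.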

PM is a schedule of planned maintenance actions aimed at preventing equipment failure before it actually occurs, keeping the equipment working and/or extending its life. It is performed on a regular basis. For example, mechanical systems are lubricated after a certain number of operating hours, and lightning arresters in jet engines are replaced after a certain number of lightning strikes. PM is designed to enhance equipment reliability by replacing worn components before they actually fail, and includes activities such as equipment checks, partial or complete overhauls at specified periods, oil changes, lubrication and so on. An ideal preventive maintenance program is one that prevents all equipment failures before they occur.

Preventive maintenance is a logical choice if the following two conditions are satisfied:

The equipment has an increasing hazard rate, thereby implying a wear-out situation.

The overall cost of a preventive maintenance action must be less than the overall cost of a corrective action, where the latter includes ancillary tangible and/or intangible costs, such as downtime costs, loss of production costs, lawsuits over the failure of a safety-critical item, loss of goodwill, etc.

If an item has an increasing failure rate, a carefully designed PM program is likely to improve system availability; otherwise, the costs of PM might actually outweigh the benefits. It must also be made explicitly clear that if an item has a constant failure rate, PM will have no effect on the item's failure occurrences. A good preventive maintenance program should either minimize the overall costs (or downtime, etc.) or meet the reliability/availability goals. To achieve this, an optimum time interval must be determined for the scheduled maintenance. Long-term benefits of preventive maintenance include improved system reliability, decreased cost of replacement, decreased system downtime and better spares inventory management. Thus, long-term effects and cost comparisons usually favor preventive maintenance over performing maintenance actions only when the system fails.
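One common way to make the idea of an optimum interval concrete is the classical age-replacement model, in which an item is replaced preventively at age T or correctively at failure, whichever comes first. The sketch below is an illustration rather than the editorial's own method: it assumes a Weibull lifetime, and the cost figures, parameter values and function names are hypothetical.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)**beta) for a Weibull lifetime."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T: float, beta: float, eta: float,
              cost_pm: float, cost_cm: float, steps: int = 2000) -> float:
    """Expected cost per unit time when replacing at age T or at failure.

    C(T) = [cost_pm * R(T) + cost_cm * (1 - R(T))] / integral_0^T R(t) dt
    """
    dt = T / steps
    area = 0.0                      # trapezoidal integral of R(t): expected cycle length
    prev = 1.0                      # R(0) = 1
    for i in range(1, steps + 1):
        cur = weibull_reliability(i * dt, beta, eta)
        area += 0.5 * (prev + cur) * dt
        prev = cur
    r_at_T = prev
    expected_cost = cost_pm * r_at_T + cost_cm * (1.0 - r_at_T)
    return expected_cost / area


def best_interval(beta, eta, cost_pm, cost_cm, candidates):
    """Pick the candidate replacement age with the lowest expected cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, beta, eta, cost_pm, cost_cm))


if __name__ == "__main__":
    candidates = list(range(50, 5001, 50))
    # Wear-out item (increasing hazard): an interior optimum exists.
    print(best_interval(2.5, 1000.0, 1.0, 10.0, candidates))
    # Constant hazard (beta = 1): the search drifts to the longest interval,
    # i.e. preventive replacement brings no benefit.
    print(best_interval(1.0, 1000.0, 1.0, 10.0, candidates))
```

The two runs mirror the two conditions stated earlier: with an increasing hazard and a corrective cost well above the preventive cost, a finite replacement age minimizes the cost rate, whereas with a constant failure rate the cheapest policy within the candidate range is simply the longest interval.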

Predictive maintenance (PdM), or condition based maintenance (CBM), is carried out only after collecting and evaluating enough physical data on the performance or condition of equipment, such as temperature, vibration or particulate matter in oil, by performing periodic or continuous (on-line) equipment monitoring. Analysis is then performed on the collected data to prepare an appropriate maintenance plan. PdM technologies used to collect information on equipment condition can include infrared thermography, acoustic (partial discharge and airborne ultrasonic) monitoring, corona detection, vibration analysis, sound level measurements, oil analysis, motor current analysis and other specific online tests. The basic aim of PdM is to perform maintenance at a scheduled point in time when the maintenance activity is most cost effective but before the equipment fails in service. Most PdM inspections are performed while equipment is in service, thereby minimizing disruption of normal system operations. This type of maintenance is generally carried out on mechanical systems for which the failure modes are known and historical data are available for validating the performance and maintenance models.

Although sophisticated techniques for condition monitoring are available these days, the main determinant of the frequency of condition monitoring is the PF interval, which is the lead time between the point at which an incipient failure can first be detected and the point at which functional failure occurs. Even today, the PF interval can only be estimated approximately, and any error tends to be on the conservative (i.e., too frequent) side. Nevertheless, there are cases of bearing failures that have occurred undetected, despite the bearings being monitored at these conservative frequencies. Smart sensor technology is likely to reduce the complexity of linking sensor outputs to current process control systems, so that more and more equipment can be monitored continuously, on-line, and control room operators will be able to assess quickly and easily the current condition of the bearings, alignment, balance or gears of a particular machine. Several expert systems for fault diagnosis are available today. At present, however, these are still essentially rule-based systems, and, like all rule-based systems, their results are only as good as the rules that have been established within the system.
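The link between the PF interval and the monitoring frequency can be sketched numerically. The example below is an idealization, not something stated in the editorial: it assumes perfect detection at each inspection, a fixed inspection interval, and an onset of detectability that falls uniformly at random between two inspections. The function names and the numbers are hypothetical.

```python
def detection_probability(pf_interval: float, inspection_interval: float) -> float:
    """Chance that at least one inspection falls inside the PF window.

    Under the stated assumptions the wait until the next inspection is uniform
    on (0, inspection_interval), so the window is caught with probability
    min(1, PF / inspection interval).
    """
    if pf_interval <= 0 or inspection_interval <= 0:
        raise ValueError("intervals must be positive")
    return min(1.0, pf_interval / inspection_interval)


def worst_case_lead_time(pf_interval: float, inspection_interval: float) -> float:
    """Shortest warning time left after detection (zero if detection can be missed)."""
    if pf_interval <= 0 or inspection_interval <= 0:
        raise ValueError("intervals must be positive")
    return max(0.0, pf_interval - inspection_interval)


if __name__ == "__main__":
    pf_days = 60.0                      # hypothetical PF interval estimate
    for interval in (120.0, 60.0, 30.0):
        print(interval,
              detection_probability(pf_days, interval),
              worst_case_lead_time(pf_days, interval))
```

Inspecting at roughly half the estimated PF interval, a widely used rule of thumb, guarantees detection in this idealized setting and leaves at least half the PF interval to plan the repair. Real detection is imperfect, which is one reason failures can still slip through at conservative frequencies.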

Corrective maintenance (CM) consists of the actions taken to restore failed equipment or a failed system to an operational state. This maintenance usually involves replacing or repairing the component that caused the failure of the overall system. CM can be performed only at unpredictable intervals because the item's failure time is not known a priori. An item becomes operational again after CM or repairs have been performed. Corrective maintenance is carried out in three steps:

Diagnosis of the fault: This is the process of locating the fault or failed parts, or otherwise satisfactorily assessing the cause of the equipment or system failure.

Repair or replacement of faulty components: Once the cause of a failure has been established, action is taken to remove the cause, usually by replacing or repairing the failed components.

Verification of the repair action: After the faulty components have been repaired or replaced, the repair crew must verify that the system is again successfully operating.

The total time taken to repair the equipment is known as down time (DT), and the uptime of an equipment or system is the time during which it is available or operating. DT is the sum of the administrative time, the logistic time and the actual repair time. The administrative time is the time spent in organizing repairs. It excludes the logistic time, which is the portion of down time during which the repair activity is suspended or delayed on account of non-availability of spare parts or replacements. The actual repair time, or active repair time, is the time during which the repairmen are working on the equipment to effect the repairs. It is the sum of the time taken to locate and identify the fault or faults, the fault correction time, and the time taken for testing and recommissioning the equipment. It is apparent that repairability, which is the probability that the equipment or system will be restored to an operable state within a specified active repair time, depends on the training and skill of the repair crew as well as on the design of the equipment. For example, the ease of accessibility of components in equipment has a direct effect on the active repair time. However, human factors to a large extent govern the duration of the active repair time.
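The decomposition of down time described above can be written out as simple bookkeeping. The sketch below is illustrative only: the record layout, the field names and the sample figures are hypothetical, and repairability is estimated empirically as the fraction of repairs whose active repair time stays within a specified limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairRecord:
    """One repair event, all durations in hours (hypothetical layout)."""
    administrative: float          # organizing the repair
    logistic: float                # waiting for spares or replacements
    fault_location: float          # locating and identifying the fault
    fault_correction: float        # repairing or replacing components
    test_and_recommission: float   # verifying and returning to service

    @property
    def active_repair(self) -> float:
        """Time the crew actually works on the equipment."""
        return self.fault_location + self.fault_correction + self.test_and_recommission

    @property
    def downtime(self) -> float:
        """DT = administrative + logistic + active repair time."""
        return self.administrative + self.logistic + self.active_repair


def empirical_repairability(records, allowed_active_hours: float) -> float:
    """Fraction of repairs completed within the allowed active repair time."""
    if not records:
        raise ValueError("at least one repair record is required")
    within = sum(1 for r in records if r.active_repair <= allowed_active_hours)
    return within / len(records)


if __name__ == "__main__":
    history = [
        RepairRecord(2.0, 8.0, 1.0, 2.0, 0.5),
        RepairRecord(1.0, 0.0, 0.5, 1.0, 0.5),
        RepairRecord(3.0, 24.0, 2.0, 4.0, 1.0),
        RepairRecord(1.5, 4.0, 1.0, 1.5, 0.5),
    ]
    print([r.downtime for r in history])
    print(empirical_repairability(history, allowed_active_hours=4.0))
```

Keeping logistic and administrative time separate from active repair time makes it visible whether long outages stem from the equipment design and crew skill or from spares and organization.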

Statistically speaking, the uptimes and downtimes are random variables with their own distributions. Based on these distributions, one can compute the mean uptime (MUT) and mean downtime (MDT). Mean uptime reflects how good the inherent design or built-in reliability is, and mean downtime reflects how good the maintainability is. There are other measures of performance of the maintained equipment, such as point availability and interval availability.
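These availability measures can be computed in closed form in the simplest case. The sketch below assumes exponentially distributed uptimes and downtimes and a system that starts in the operating state; this two-state model is an assumption made for illustration, since the editorial does not fix the distributions. Function names and numbers are hypothetical.

```python
import math


def steady_state_availability(mut: float, mdt: float) -> float:
    """Long-run fraction of time in the up state: MUT / (MUT + MDT)."""
    return mut / (mut + mdt)


def point_availability(t: float, mut: float, mdt: float) -> float:
    """Probability of being up at time t, starting up at t = 0 (exponential up/down times)."""
    lam, mu = 1.0 / mut, 1.0 / mdt          # failure rate and repair rate
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def interval_availability(T: float, mut: float, mdt: float) -> float:
    """Average of the point availability over the interval [0, T]."""
    if T <= 0:
        raise ValueError("T must be positive")
    lam, mu = 1.0 / mut, 1.0 / mdt
    s = lam + mu
    return mu / s + (lam / (s * s * T)) * (1.0 - math.exp(-s * T))


if __name__ == "__main__":
    mut, mdt = 950.0, 50.0                   # hypothetical hours
    print(steady_state_availability(mut, mdt))
    print(point_availability(100.0, mut, mdt))
    print(interval_availability(100.0, mut, mdt))
```

In this model the point availability starts at 1 and decays toward MUT / (MUT + MDT), while the interval availability, being an average over the early part of the mission, lies above the point availability at the end of the same interval.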

Reliability Centered Maintenance (RCM) is an approach that helps in deciding what maintenance tasks must be performed at any given point of time. The value of RCM lies in the fact that it recognizes that the consequences of failures are far more important than their technical characteristics. In fact, it recognizes that the only reason for doing any kind of proactive maintenance is not to avoid failures per se, but to avoid or at least minimize the consequences of failures.

Total productive maintenance is a proactive equipment maintenance strategy designed to improve overall equipment effectiveness. It breaks the barrier between the maintenance department and the production department of a company, and is an approach to optimizing the effectiveness of production means in a structured manner.

A Computerized Maintenance Management System (CMMS), sometimes also called Enterprise Asset Management (EAM), is a stand-alone computer program for managing maintenance work, labour and inventory in a company. Strictly speaking, an EAM system not only performs all the functions of a CMMS but also integrates with the company's financial, human resource, material management and other ERP (Enterprise Resource Planning) applications. In the past, stand-alone CMMS had an advantage over EAM in terms of features, ease of use and functionality. A detailed description of maintenance models, strategies and analysis is given in [1].

1. K. B. Misra (Ed.), Handbook of Performability Engineering, 76 chapters, 1315 pp., Springer, 2008.

Original articles

Methods of Uncertainty Analysis in Prognostics

PIERO BARALDI, IRINA CRENGUTA POPESCU, and ENRICO ZIO

2010, 6(4): 303-330. doi:10.23940/ijpe.10.4.p303.mag

Abstract

The goal of prognosis on a structure, system or component (SSC) is to predict whether the SSC can perform its function up to the end of its life and, in case it cannot, to estimate the Time to Failure (TTF), i.e., the lifetime remaining between the present and the instant when it can no longer perform its function. Such a prediction of the loss of functionality changes dynamically as time goes by and is typically based on measurements of parameters representative of the SSC state. Uncertainties from two different sources affect the prediction: randomness due to variability inherent in the SSC degradation behavior (aleatory uncertainty) and imprecision due to incomplete knowledge and information on the SSC failure mechanisms (epistemic uncertainty). Such uncertainties must be adequately represented and propagated in order for the prognostic results to have operational significance, e.g., in terms of maintenance and renovation decisions. This work addresses the problem of predicting the reliability and TTF of an SSC as measurements of parameters representative of its state become available in time. The representation and propagation of the uncertainties associated with the prediction are done alternatively by a pure probabilistic method and a hybrid Monte Carlo and possibilistic method. A case study is considered, regarding a component which is randomly degrading in time according to a stochastic fatigue crack growth model from the literature; the maximum level of degradation beyond which failure occurs is affected by epistemic uncertainty.
Received on February 11, 2009 and revised on April 11, 2010
References: 29

On Augmented OBDD and Performability for Sensor Networks

JOHANNES U. HERRMANN, SIETENG SOH, SURESH RAI, and GEOFF WEST

2010, 6(4): 331-342. doi:10.23940/ijpe.10.4.p331.mag

Abstract

A method to model component reliability is presented, where the life is measured by the number of load applications. With static strength failure and fatigue failure as the scenario, dynamic models are derived for calculating the reliability and failure rate of components under random repeated loads, both without and with strength degradation. The relationship between reliability and the number of load applications, and that between failure rate and the number of load applications, are discussed in different cases. The results show that, even when strength does not degrade, both the reliability and the failure rate of components decrease as the number of load applications increases, and the failure rate curve has the partial characteristics of a bathtub curve, with an early failure period and a random failure period. When strength degrades with the number of load applications, the reliability of components decreases more markedly, and the failure rate curve of components is bathtub-shaped.
Received on July 10, 2009, revised on September 2, 2009
References: 15

Two Sequential Attacks of a Parallel System when Defense and Attack Resources are Expendable

KJELL HAUSKEN and GREGORY LEVITIN

2010, 6(4): 343-354. doi:10.23940/ijpe.10.4.p343.mag

Abstract

The paper compares the efficiency of single and double attacks against a system consisting of identical parallel elements (1-out-of-N system). An attacker tries to maximize the system vulnerability (probability of total destruction), and the defender tries to minimize it. The attacker and the defender distribute their constrained resources optimally across two attacks. The attacker attacks all elements in the first attack, and all surviving elements in the second attack. The defender protects all elements before the first attack, and protects all surviving elements before the second attack. Both agents decide how to distribute their resources between the two attacks before the first attack. Both agents' resources are expendable and last only one attack. Both agents observe which elements are destroyed and not destroyed in the first attack, and apply their remaining resources to attacking and protecting the remaining elements in the second attack. First, the optimal attack and defense strategy against a system with a fixed number of elements is analyzed. Thereafter, a minmax two-period game between the attacker and the defender is considered, in which the defender distributes its constrained resource between the two attacks as well as between deploying redundant elements and protecting them against the attacks.
Received on March 18, 2009, revised on April 7, 2010
References: 17

Condition-based Vehicle Fleet Retirement Decision: A Case Study

R. JIANG and GUANGCAI SHI

2010, 6(4): 355-362. doi:10.23940/ijpe.10.4.p355.mag

Abstract

The vehicle replacement decision is a multi-objective optimization problem subject to acquisition budget and government regulation constraints. This necessitates a condition-based replacement policy. In this paper, we carry out a case study of the vehicle fleet replacement problem. Based on the field data in a vehicle fleet management system, we derive a set of condition parameters for representing the health level of a vehicle. A correlation analysis is carried out to identify key condition parameters. The identified parameters are then combined into a health index. The index is useful for condition-based vehicle replacement decisions.
Received on August 18, 2009, revised April 10, 2010
References: 9

Change Processes towards Flexible Lean Manufacturing: A Framework

GULSHAN CHAUHAN, T. P. SINGH, and S. K. SHARMA

2010, 6(4): 363-372. doi:10.23940/ijpe.10.4.p363.mag

Abstract

To be successful in today's increasingly time-sensitive and competitive markets, businesses need manufacturing processes that are fast, flexible, and quick to adapt to change. Based on a survey conducted in Indian manufacturing industries, this paper discusses the most important parameters of lean manufacturing and flexibility and the significant benefits accrued in flexible and lean manufacturing operations. Flexibility is recognized as an important feature in manufacturing. However, the extent of such technology applications varies from industry to industry and has met with varying degrees of success. Manufacturing companies that are changing processes towards hybrid flexible lean manufacturing (FLM) and focusing on a company strategy of cost reduction through eliminating waste are the ones that will survive in this competitive world. This paper describes the development and comparison of a characteristics framework by means of which organizations can effectively integrate and change processes towards FLM. The data for this survey were collected through interviews, questionnaires and archival sources.
Received on October 22, 2008, revised September 09, 2009
References: 24

An Empirical Expression for Reliability Index of Flanged RC Beams in Limit State of Deflection

RAVINDRA P. PATIL and K. MANJUNATH

2010, 6(4): 373-380. doi:10.23940/ijpe.10.4.p373.mag

Abstract

Reliability indices of reinforced concrete flanged beams with respect to the limit state of deflection, designed as per the provisions of IS 456:2000, are found to be nonuniform. In this paper, an attempt is made to propose an empirical expression for the reliability index of a flanged RC beam in the limit state of deflection. This equation will be useful in deciding the depth of a member for a given span and load with a target reliability index in the limit state of deflection, and also in obtaining an estimate of the reliability index of the designed RC beam.
Received on August 9, 2009, revised April 7, 2010
References: 17

Dependability Analysis in a Two node Cluster System

C. CHELLAPPAN and G. VIJAYALAKSHMI

2010, 6(4): 381-388. doi:10.23940/ijpe.10.4.p381.mag

Abstract

In this paper, we present a Markov reward model for the two node active/standby cluster system with multistage assuming minor failures, software rejuvenation and minor repairs. The transient analysis is provided. Some important dependability measures such as availability, reliability and mean time to failure are obtained.
Received on December 12, 2008, revised on August 3, 2009
References: 11

Multi-objective Offshore Safety System Design Optimization

L. M. BARTLETT and J. RIAUKE

2010, 6(4): 389-399. doi:10.23940/ijpe.10.4.p389.mag

Abstract

The objective of this paper is to present a multi-objective approach to the design optimization process applied to systems that require a high likelihood of functioning on demand. In the real world it is common that there are several objectives to be met, not just maximising the system availability, and hence an approach is required to deal with these issues. A method is presented that integrates the latest advantages of the fault tree analysis technique and the binary decision diagram method to model the availability issue, along with a multi-objective optimization approach (the Improved Strength Pareto Evolutionary Approach) to cater for the multiple assessment criteria. The end product is a mechanism to yield the best design option. The paper presents the principles of the method and a case study to illustrate how the method is applied, along with the results produced. The case study relates to a high integrity protection system of an offshore platform. The optimization criteria involve unavailability, cost, spurious trip frequency and maintenance down time. Several enhancements to the optimization strategy to improve the efficiency of the approach are discussed.
Received on July 09, 2009 and revised on January 25, 2010
References: 12

Online ISSN 2993-8341
Print ISSN 0973-1318