Samoa Digital Library

Proactive management of software aging


dc.contributor.author Castelli, V.
dc.contributor.author Harper, R. E.
dc.contributor.author Heidelberger, P., et al.
dc.date.accessioned 2020-12-02T02:11:35Z
dc.date.available 2020-12-02T02:11:35Z
dc.date.issued 2001
dc.identifier.uri ${sadil.baseUrl}/handle/123456789/154
dc.description data, tables, diagrams ; 22 p. (includes bibliographical references) en_US
dc.description.abstract Software failures are now known to be a dominant source of system outages. Several studies and much anecdotal evidence point to “software aging” as a common phenomenon, in which the state of a software system degrades with time. Exhaustion of system resources, data corruption, and numerical error accumulation are the primary symptoms of this degradation, which may eventually lead to performance degradation of the software, crash/hang failure, or other undesirable effects. “Software rejuvenation” is a proactive technique intended to reduce the probability of future unplanned outages due to aging. The basic idea is to pause or halt the running software, refresh its internal state, and resume or restart it. Software rejuvenation can be performed by relying on a variety of indicators of aging, or on the time elapsed since the last rejuvenation. In response to the strong desire of customers to be provided with advance notice of unplanned outages, our group has developed techniques that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reaches a critical level, and automatically perform proactive software rejuvenation of an application, process group, or entire operating system, depending on the pervasiveness of the resource exhaustion and our ability to pinpoint the source. This technology has been incorporated into the IBM Director for xSeries servers. To quantitatively evaluate the impact of different rejuvenation policies on the availability of cluster systems, we have developed analytical models based on stochastic reward nets (SRNs). For time-based rejuvenation policies, we determined the optimal rejuvenation interval based on system availability and cost. We also analyzed a rejuvenation policy based on prediction, and showed that it can further increase system availability and reduce downtime cost. These models are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures, which we are just beginning to explore. en_US
dc.language.iso en en_US
dc.publisher International Business Machines Corporation en_US
dc.relation.ispartofseries IBM J. RES. & DEV. VOL. 45 NO. 2 MARCH 2001;
dc.title Proactive management of software aging en_US
dc.type Article en_US
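
The abstract mentions estimating the time remaining until resource exhaustion reaches a critical level, then rejuvenating proactively. It does not say how the estimate is computed. The sketch below is one simple possibility, assuming a least-squares linear trend fitted to sampled resource usage; it is not the method used in the article or in IBM Director. The names time_to_exhaustion and should_rejuvenate, and the lead_time parameter, are hypothetical.

```python
from typing import Optional, Sequence


def time_to_exhaustion(
    times: Sequence[float],
    usage: Sequence[float],
    capacity: float,
) -> Optional[float]:
    """Estimate the time from the last sample until usage reaches capacity.

    Assumption: usage grows roughly linearly, so an ordinary least-squares
    line is extrapolated. Returns None when no upward trend is found.
    """
    n = len(times)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two samples of equal length")

    mean_t = sum(times) / n
    mean_u = sum(usage) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")

    # Least-squares slope and intercept of usage against time.
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, usage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # usage is flat or shrinking: no exhaustion predicted

    intercept = mean_u - slope * mean_t
    t_hit = (capacity - intercept) / slope  # time at which the line hits capacity
    return max(0.0, t_hit - times[-1])


def should_rejuvenate(remaining: Optional[float], lead_time: float) -> bool:
    """Hypothetical policy: rejuvenate once the predicted remaining time
    falls within the advance notice (lead_time) an operator needs."""
    return remaining is not None and remaining <= lead_time


if __name__ == "__main__":
    # Illustrative samples only (hours, megabytes); not data from the article.
    hours = [0.0, 1.0, 2.0, 3.0]
    used_mb = [10.0, 20.0, 30.0, 40.0]
    remaining = time_to_exhaustion(hours, used_mb, capacity=100.0)
    print(remaining, should_rejuvenate(remaining, lead_time=8.0))
```

A straight-line fit keeps the sketch short. Real resource usage is often noisy or seasonal, so a robust trend estimator would likely be preferable in practice.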

