The following example illustrates the modeling of complex failure and repair interactions among the components of a system. The system consists of three types of components: two processors, a front end, and a database. A processor can fail in two different modes: mode A with probability PROB and mode B with probability 1-PROB. The repair rates are different in each mode. A processor failure may affect the system in different ways. In failure mode A no other component is affected. In failure mode B the database is affected. Using the usual concept of coverage, with probability COVERAGE the database is successfully recovered and no manual repair is needed. The recovery is assumed to be instantaneous (instantaneous coverage). With probability 1-COVERAGE the database is corrupted and a repair is necessary. The database may also fail spontaneously. To repair the database, at least one processor must be operational.
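The branching described above can be checked with a small calculation. The following Python sketch is only an illustration and not part of the model itself: the function name and the outcome labels are hypothetical, and the numeric values passed for PROB and COVERAGE are arbitrary examples.

```python
def processor_failure_outcomes(prob, coverage):
    """Return the probability of each outcome of a single processor failure.

    prob     -- probability that the failure is of mode A (PROB in the text)
    coverage -- probability that the database recovers instantaneously
                after a mode B failure (COVERAGE in the text)
    """
    if not (0.0 <= prob <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("prob and coverage must lie in [0, 1]")
    return {
        # Mode A: no other component is affected.
        "mode_A": prob,
        # Mode B, covered: the database recovers without manual repair.
        "mode_B_db_recovered": (1.0 - prob) * coverage,
        # Mode B, not covered: the database is corrupted and needs repair.
        "mode_B_db_corrupted": (1.0 - prob) * (1.0 - coverage),
    }


# Arbitrary example values, chosen only to make the arithmetic easy to follow.
outcomes = processor_failure_outcomes(prob=0.5, coverage=0.5)
```

The three outcomes are mutually exclusive and exhaustive, so their probabilities add up to one for any valid pair of parameters.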
The system is considered operational when at least one of each type of component is operational. No component can fail once the system is down. Finally, there is a single repair center and the highest repair priority is assigned to the front end followed by the database and then by the processors.
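The operational condition can be written as a simple predicate. This Python sketch assumes a hypothetical representation (two booleans and a count of working processors); it is meant only to make the rule concrete, not to reproduce the model's own syntax.

```python
def system_is_up(front_end_up, database_up, processors_up):
    """True when at least one component of each type is operational.

    front_end_up  -- True if the front end is working
    database_up   -- True if the database is working
    processors_up -- number of working processors (0, 1 or 2)
    """
    return front_end_up and database_up and processors_up >= 1


def failures_enabled(front_end_up, database_up, processors_up):
    """No component can fail once the system is down."""
    return system_is_up(front_end_up, database_up, processors_up)
```

With one processor failed the system is still up and further failures remain possible; once the second processor, the database, or the front end fails, the system is down and all failure events are disabled.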
To model this database system, we use four objects: the front end, the database, the processor, and the repair center.
In the database object, the single event is Fail. When this event triggers, the database sends a message to the repair center through the FAILURE_REPAIR port (named failure_repair_port). This port is used by all objects to send a message when they fail.
The message received through the affected port (named affected_port) is sent by the processor object when the processor fails in mode B. The message received through the FAILURE_REPAIR port (named failure_repair_port) is sent by the repair center to indicate that an object was repaired. The message received through the status port (named sys_status_port) is sent by the repair center both when the system goes down, in which case the event Fail becomes disabled, and when the system comes back up, in which case the event Fail becomes enabled.
In the processor object, the single event is Fail, which represents a failure of the processor. With probability PROB the failure is of type A, and with probability 1-PROB it is of type B.
The messages received from the repair center are similar for all objects. The only difference is that in the case of the processor object, the message data field is checked to verify the type of failure that was repaired.
The only state variable of the repair center object is a vector that stores, in each position, the status of one component. The first position is reserved for the object with the highest repair priority (in this case, the front end object). The second position is reserved for the object that has the second priority level (the database object), and the third and fourth positions are reserved for the processors (each position corresponding to a different failure mode).
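Because the positions are ordered by repair priority, the next component to repair is simply the first position that records a failure. The Python sketch below illustrates this under two assumptions that the text does not state: each position holds a count of failed components, and the index names are hypothetical. It also ignores the rule that the database can only be repaired while a processor is operational.

```python
# Assumed layout of the status vector, highest repair priority first.
FRONT_END, DATABASE, PROC_MODE_A, PROC_MODE_B = range(4)


def next_repair_position(failed):
    """Return the index of the highest-priority failed entry, or None.

    failed -- sequence of four counts, one per position of the vector
    """
    for position, count in enumerate(failed):
        if count > 0:
            return position
    return None
```

For example, with the database failed and one processor failed in mode B, the database position is returned first because it comes earlier in the vector.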
The events in the repair center object correspond to a repair performed in a system component. When the repair center repairs an object, it sends a message to the respective object (FRONT_END, DATABASE, PROCESSOR) through the failure_repair_port. The message that represents the system status is sent if and only if the specified condition is true (i.e. when the system is operational or not).
In the Messages attribute, the repair object receives the messages sent by the other objects in the database model. When a message arrives, the repair object checks which object sent it by calling objcmp(msg_source,object), a boolean function that compares two objects. The keyword msg_source identifies the object that sent the message. The state variable, which indicates the failed objects, can then be updated. If the database or the front end fails, a message is sent to all objects and the system goes down. If the number of failed processors is equal to 2, the same message is sent to all objects in the model.
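The handling of a failure message can be sketched as follows. This is an illustrative Python approximation, not the model's own code: plain string comparison stands in for objcmp(msg_source,object), the four-position list layout (front end, database, processors failed in mode A, processors failed in mode B) and the use of counts are assumptions, and the function name is hypothetical.

```python
def on_failure_message(status, source, mode=None):
    """Record a failure and report whether the system is now down.

    status -- list of four counts: front end, database,
              processors failed in mode A, processors failed in mode B
    source -- "front_end", "database" or "processor" (stand-in for msg_source)
    mode   -- "A" or "B", only used when source is "processor"
    Returns (new_status, system_down); the input list is not modified.
    """
    status = list(status)
    if source == "front_end":
        status[0] = 1
    elif source == "database":
        status[1] = 1
    elif source == "processor":
        if mode not in ("A", "B"):
            raise ValueError("processor failures need mode 'A' or 'B'")
        status[2 if mode == "A" else 3] += 1
    else:
        raise ValueError("unknown message source: %r" % (source,))

    # The system goes down if the front end or the database has failed,
    # or if both processors have failed (in any combination of modes).
    system_down = status[0] > 0 or status[1] > 0 or status[2] + status[3] == 2
    return status, system_down
```

A first processor failure updates the vector without bringing the system down; a second one, or any failure of the database or the front end, returns system_down as true, which corresponds to the point where the status message is sent to all objects.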
The Processor object and the Repair object are shown in the corresponding figures.
Guilherme Dutra Gonzaga Jaime 2010-10-27