The special pseudo-event REWARD_REACHED

The special pseudo-event REWARD_REACHED monitors the values of cumulative rewards (c.r.). With this event, actions can be taken depending on the level of a c.r.

The event triggers when a certain c.r. reaches a given value. The condition is

  get_cr(reward) symbol limit
where get_cr(reward) represents the accumulated value of c.r. reward, symbol is one of ``\/'' or ``/\'', and limit represents the value to be reached. The symbol ``\/'' indicates that the trigger should occur when the c.r. crosses the limit from above, and ``/\'' that it should occur when the c.r. crosses the limit from below; see the figure below.

Figure: Triggering on a c.r. (image file: figuras/REW_R_delta_t.eps)

The occurrence time of the next trigger is calculated by the following expression:

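\begin{displaymath}
\Delta t = \left\{
\begin{array}{ll}
\displaystyle
\frac{L - CR}{IR}, & \mbox{if $IR \neq 0$ and the c.r. is moving toward $L$ in the direction given by the symbol}, \\
\displaystyle
\infty, & \mbox{otherwise}.
\end{array} \right.
\end{displaymath} (2.1)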

where $CR$ and $IR$ represent, respectively, the accumulated reward and the instantaneous reward value of the reward specified in the condition, and $L$ is the limit.
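
The trigger-time calculation can be sketched in Python. This is only an illustration, under the assumption that the c.r. changes linearly at the constant rate IR until the next event, so that a limit at distance L - CR is reached after (L - CR)/IR time units. The function name time_to_limit and the from_below flag (True for ``/\'', False for ``\/'') are hypothetical and are not part of the tool. Treating a c.r. that already sits exactly on the limit as "no crossing" is also an assumption.

```python
import math


def time_to_limit(cr, ir, limit, from_below):
    """Time until the cumulative reward crosses `limit`, or math.inf.

    cr         -- accumulated reward (CR)
    ir         -- instantaneous reward rate (IR), assumed constant
    limit      -- value to be reached (L)
    from_below -- True for the "/\\" symbol, False for the "\\/" symbol
    """
    if from_below and ir > 0 and cr < limit:
        # Reward is rising toward the limit: crossing from below.
        return (limit - cr) / ir
    if not from_below and ir < 0 and cr > limit:
        # Reward is falling toward the limit: crossing from above.
        return (limit - cr) / ir
    # Zero rate, wrong direction, or already past the limit: no crossing.
    return math.inf
```

For example, a buffer holding 8 units that drains at rate 2 reaches the limit 0 from above after time_to_limit(8.0, -2.0, 0.0, False) = 4.0 time units.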

Multiple expressions like this can be used in the condition of the event, and the trigger time will be set to the minimum of all the values.

Besides these special expressions, ordinary expressions, such as comparisons between state variables and constants, can be used in the condition. All expressions are evaluated; if the ordinary ones return TRUE, the trigger times are calculated and the smallest one determines when the event triggers; otherwise the event is disabled. NOTE: if the condition is TRUE but no reward will cross its limit, the trigger time is infinite, that is, the event is not scheduled at all.

  event = t0 (REWARD_REACHED)
  condition = 
    ((get_cr(fluid1) \/ 0) || (get_cr(fluid2) \/ 0))
  action = { ... }

  event = FullBuffer (REWARD_REACHED)
  condition = ( (Status==1) && (get_cr(buffer) /\ B) )
  action = { ... }
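
The evaluation rule for a REWARD_REACHED condition can be summarized with a short Python sketch. It is illustrative only: the function schedule_reward_reached and its arguments are hypothetical names, the trigger times of the individual get_cr expressions are assumed to be computed beforehand, and the return value None is an arbitrary choice used here both for "disabled" and for "not scheduled".

```python
import math


def schedule_reward_reached(ordinary_ok, trigger_times):
    """Return the delay until the event triggers, or None if it does not.

    ordinary_ok   -- result of the ordinary expressions (e.g. Status==1)
    trigger_times -- one trigger time per get_cr expression;
                     math.inf means that reward never crosses its limit
    """
    if not ordinary_ok:
        # Ordinary expressions returned FALSE: the event is disabled.
        return None
    # The smallest trigger time determines when the event triggers.
    delta = min(trigger_times, default=math.inf)
    # An infinite trigger time means the event is not scheduled at all.
    return None if math.isinf(delta) else delta
```

With two get_cr expressions whose trigger times are 6.0 and 4.0 and ordinary expressions that return TRUE, the event is scheduled 4.0 time units ahead.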
WARNING: To use the REWARD_REACHED event with a reward, that reward must accumulate only through the set_ir command. For rewards that accumulate as
  rate_reward=buffer
  bounds=0,B
  condition=(SourceStage==1)
  value=lambda-C;
  condition=(SourceStage==0)
  value=-C;
the REWARD_REACHED event cannot be used. The declaration of a reward to use with a REWARD_REACHED event must be as follows:
  rate_reward = buffer
  bounds = 0,B
  condition = (FALSE)
  value = 0;
and its instantaneous value must be set with set_ir inside an event action:
  event = Activate_Source(EXP,alpha)
  condition = (SourceStage==0)
  action = {
    float accum;
    ...
    accum = lambda-C;
    set_ir(buffer,accum);
    ...
  };

  event = BufferFull(REWARD_REACHED)
  condition = (get_cr(buffer) /\ B)
  action = {...};
Another limitation arises when two events occur simultaneously, at least one of them is a REWARD_REACHED event, and both update the same reward. The REWARD_REACHED event is processed after all other events, so the reward keeps the value assigned by the REWARD_REACHED event. Selecting which value is assigned (mean, maximum, or minimum) is not implemented yet.
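
The effect of this ordering can be illustrated with a small Python sketch. It is not the tool's implementation: the function resolve_simultaneous and the (event type, value) pairs are hypothetical, and the only behavior modeled is that REWARD_REACHED events are processed last, so their assignment overwrites the others.

```python
def resolve_simultaneous(updates):
    """Return the value a reward ends up with after simultaneous events.

    updates -- list of (event_type, value) pairs, one per event that
               assigns a value to the same reward at the same instant
    """
    # Stable sort: ordinary events keep their relative order and
    # REWARD_REACHED events are moved to the end (False sorts before True).
    ordered = sorted(updates, key=lambda u: u[0] == "REWARD_REACHED")
    value = None
    for _event_type, new_value in ordered:
        # Each assignment overwrites the previous one; the last one wins.
        value = new_value
    return value
```

If an EXP event assigns 3.5 and a REWARD_REACHED event assigns 0.0 at the same instant, resolve_simultaneous([("REWARD_REACHED", 0.0), ("EXP", 3.5)]) returns 0.0 regardless of the order in which the pairs are listed.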

Also, a given reward should appear in the condition of only one REWARD_REACHED event.

Guilherme Dutra Gonzaga Jaime 2010-10-27