Adding a reorder buffer

Next: Proving liveness Up: An out-of-order instruction processor Previous: Proving non-interference Contents

Adding a reorder buffer

Now, let's modify the design to use a ``reorder buffer''. This means that instead of writing results to the register file when they are produced by an execution unit, we store them in a buffer, and write them back to the register file in program order. This is usually done so that the processor can be returned to a consistent state after an ``exceptional'' condition occurs, such as an arithmetic overflow. The simplest way to do this in the present implementation is to store the result in an extra field res of the reservation station, and then modify the allocation algorithm so that reservation stations are allocated and freed in round-robin order. The result of an instruction is written to the register file when its reservation station is freed.

To effect this change, add the following fields to the reservation station structure st:

      completed : boolean;
      res : WORD;

Also add a variable complete_st to indicate which reservation station should be deallocated:

   complete_st : TAG;

Now, change the instruction completion logic, so that, when a result appears on the the bus pout, instead of storing it in the register file, we store it in the res field of the reservation station and set the completed bit. If the reservation station indicated by complete_st has its completed bit set, we store its result from the res field into the register file. Thus, we replace the instruction completion logic with the following:

  default {
    /* result writeback logic */

    if(st[complete_st].valid & st[complete_st].completed){
      forall(i in REG)
        if(ir[i].resvd & ir[i].tag = complete_st){
          next(ir[i].resvd) := 0;
          next(ir[i].val) := st[complete_st].res;
        }
      next(st[complete_st].valid) := 0;
    }
  } in default {
    /* instruction completion logic */

    if(pout.valid){
      forall(i in TAG){
        if(~st[i].opra.valid & st[i].opra.tag = pout.tag){
          next(st[i].opra.valid) := 1;
          next(st[i].opra.val) := pout.val;
        }
        if(~st[i].oprb.valid & st[i].oprb.tag = pout.tag){
          next(st[i].oprb.valid) := 1;
          next(st[i].oprb.val) := pout.val;
        }
        if(st[i].issued && pout.tag = i){
          next(st[i].completed) := 1;
          next(st[i].res) := pout.val;
        }
      }
    }

  } in ...

Finally, we have to make sure that a result sitting in the res field of a completed instruction, but not yet written back to the register file, gets forwarded to any new instructions that might need it. Thus, for example, we change the operand fetch logic for the opra operand to the following:

          /* fetch the a operand (with bypass) */
          
          if(pout.valid & ir[srca].resvd & pout.tag = ir[srca].tag){
            next(st[st_choice].opra.valid) := 1;
            next(st[st_choice].opra.tag) := ir[srca].tag;
            next(st[st_choice].opra.val) := pout.val;
          } else if(ir[srca].resvd & st[ir[srca].tag].completed){
            next(st[st_choice].opra.valid) := 1;
            next(st[st_choice].opra.tag) := ir[srca].tag;
            next(st[st_choice].opra.val) := st[ir[srca].tag].res;
          } else {
            next(st[st_choice].opra.valid) := ~ir[srca].resvd;
            next(st[st_choice].opra.tag) := ir[srca].tag;
            next(st[st_choice].opra.val) := ir[srca].val;
          }

Here, we have inserted a clause so that, if register srca is holding a tag, pointing to a completed reservation station, then we forward the res field of that reservation station as opra operand. Change the oprb logic correspondingly.

Finally, we introduce logic for choosing the reservation station to allocate (st_choice) and free (complete_st), so that reservation stations are used in round-robin order:

#define NUM_RS 32
  breaking(TAG){
    init(st_choice) := 0;
    init(complete_st) := 0;
    
    if(~stallout & opin = ALU)
      next(st_choice) := st_choice + 1 mod NUM_RS;
    if(st[complete_st].valid & st[complete_st].completed)
      next(complete_st) := complete_st + 1 mod NUM_RS;

Note, we chose here, arbitrarily, to use reservations stations numbered from 0 to 31 in the round-robin. Also note that since this logic breaks the symmetry of the type TAG, we have to put it in a breaking clause. If a new instruction is stored in a reservation station, we increment st_choice modulo 32. Similarly, if a reservation station is freed (i.e., the station chosen to be freed is marked completed), then we increment complete_st module 32. This is done so that results of instructions are written to the register file in the same order that the instructions are received.

Now, open the new version and choose ``Prop|Verify all''. You should find that all of the properties are still true. That is, after this design change, the processor can be verified without modifying one line of the proof! This is because our three lemmas (for operands, results and noninterference) are not affected by the design change. Now, select an instance of lemma1, and look in the cone pane. You will notice that the signals st_choice and complete_st are free. This is because the assignments to these signals break the symmetry of type TAG, and thus cannot be used to verify this property, as we are using a symmetry reduction on type TAG. Thus, we have in fact verified the correctness of our design's data output independent of the definition of these signals, and have not used in any way the fact that these signals obey a round-robin policy. This should not be too surprising, as in the previous version of the design, no particular ordering was used. If we were to introduce some form of ``exception'', however, that interrupts the instruction stream, we would presumably need to use the round-robin policy to show that a consistent state is obtained after an exception.

Nonetheless, the fact that our proof was unaffected by the design change illustrates an important general point about compositional proofs. That is, our proof has the virtue that it only specifies the values of three key signals: the source operands in the reservation stations (lemma1), the value on the result bus (lemma2) and the tag on the result bus (lemma3). Since the function of these signals was not changed in adding the reorder buffer, our proof remained valid. In general, when designing a proof decomposition, it is best to do it in such a way that as few signals as possible are referenced. In this way, the proof will be less likely to be invalidated by localized design changes.

Next: Proving liveness Up: An out-of-order instruction processor Previous: Proving non-interference Contents

2003-01-07