next up previous contents
Next: Synchronous Verilog Up: Conditional proof statements, and Previous: Conditional proof statements, and   Contents

The bakery algorithm

We will now apply the above techniques to the ``bakery'' mutual exclusion algorithm of Lamport (see ``A New Solution of Dijkstra's Concurrent Programming Problem'' by Leslie Lamport [Communications of the ACM 17, 8, August 1974, pp. 453-455]). The purpose of this algorithm is to provide mutually exclusive access to some common resource by a collection of N processes. The proof presented here is derived from a proof due to Shaz Qadeer and Jim Saxe.

Each process i executes two phases before entering its ``critical section'', in which it can access the resource. First, it chooses a ``ticket number'' greater than zero. To do this, it first sets a bit called choosing[i], to tell other processes it is in the choosing phase. Then it loops through all process indices and finds the maximum ticket number held by any process (of course, the ticket numbers can be changing while this is going on!). This done, it picks a number[i] for itself that is greater the the maximum value it saw, and clears its choosing bit. Process i's ``ticket'' now consists of the pair (number[i],i). Tickets are compared in lexicographic order. Thus, a ticket (n1,i1) is less than (n2,i2) when either n1 < n2 or n1 = n2 and i1 < i2. The idea is that processes with lower tickets get to enter their critical section first.

Now process i enters the waiting phase. It loops through all process indices, and waits for any process that is either in its choosing phase, or holds a lower ticket. After waiting in turn for all the other processes, process i can enter its critical section.

The algorithm can be described by the following ``pseudo-code'':

  L1:  {noncritical section; nondeterministically goto L1 or L2;}
  L2:  {choosing[i] := 1; max[i] := 0;}
  L3:  count[i] := 1;
  L4:    max[i] := maximum(max[i], number[count[i]]);
  L5:    if count[i] < NPROCS then {count[i] := count[i]+1; goto L4;}
  L6:  number[i] := max[i]+1;
  L7:  choosing[i] := 0;
  L8:  count[i] := 1;
  L9:    if choosing[count[i]] = 1 then goto L9;
  L10:   if number[count[i]] \neq 0) and
            (number[count[i]] < number[i] or
             (number[count[i]] = number[i] & count[i] < i)) then goto L10;
  L11:   if count[i] < NPROCS then {count[i] := count[i]+1; goto L9;
  L12: critical section;
  L13: {number[i] := 0; goto L1;}
Note, the loop in L3-L5 and the assignment at L6 together correspond to the statement:
       number[i] := 1 + maximum(number[1], ..., number[N]);
in Lamport's version, but break the action into steps to make it clear that process i does not simultaneously read number[j] and write number[i].

Also, Lamport writes:

... The algorithm has the remarkable property that if a read and a write operation to a single memory location occur simultaneously, then only the write must be performed correctly. The read may return any arbitrary value.
and
A processor may fail at any time. We assume that when it fails, it immediately goes to its noncritical section and halts. There may be a period when reading from its memory gives arbitrary values. Eventually any read from its memory must give a value of zero.
The pseudo-code above does not model these two aspects of the algorithm (non-atomic memory accesses and crashing). However, both aspects are modeled in the following SMV program:
#define NPROCS 10000
ordset INDEX 1..NPROCS;
ordset NAT 0..;
typedef PC {DOWN, L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13};

module main() {

/* The implementation of the bakery algorithm starts here. */

  pc: array INDEX of PC;
  count: array INDEX of INDEX;
  choosing: array INDEX of boolean;
  number, max: array INDEX of NAT;

  act: INDEX;     /* the "active process" */
  crash: boolean; /* The active process crashes if crash = 1. */

  /* These variables take on arbitrary values of their types at each
     time step: */

  any_nat: NAT;
  any_boolean: boolean;
  any_index: INDEX;

  /* The following variable are used to model the nonatomicity of
     accesses to the shared variables number[...] and choosing[...]: */

  writing_choosing, writing_number: array INDEX of boolean;
  v_choosing: array INDEX of boolean;
  v_number, max: array INDEX of NAT;

  forall (i in INDEX) {

    /* Process i is considered to be writing to be writing one of its
       shared variables (choosing[i] or number[i]) during the entire
       time that pc[i] is at any line contains an assignment to
       that variable, not just when act = i (which happens on the
       final time step of each write by process i). */

    writing_choosing[i] := pc[i] = DOWN |pc[i] = L2 | pc[i] = L7;
    writing_number[i] := pc[i] = DOWN | pc[i] = L6 | pc[i] = L13;

    /* A process reading the shared variable choosing[i] (or number[i])
       sees the value v_choosing[i] (resp., v_number[i]), which is
       arbitrary if the shared variable is currently being written, and
       is the actual value of the variable otherwise. */

    v_choosing[i] := writing_choosing[i] ? any_boolean : choosing[i];
    v_number[i] := writing_number[i] ? any_nat : number[i];
  }

  /* The initial conditions for each process: */
  
  forall (i in INDEX) {
    init(number[i]) := 0;
    init(choosing[i]) := 0;
    init(pc[i]) := L1;
  }
  
  /* Some useful temporary variables */
  
  red: NAT;             /* the number read by the active process */
  countmax : boolean;   /* have we reached the maximum count? */
  
  /* The action taken by the active process on a step: */

  if (crash) {
    next(max[act]) := any_nat;
    next(count[act]) := any_index;
    next(pc[act]) := DOWN;
  } else {
    switch (pc[act]) {
      DOWN: { next(choosing[act]) := 0;
              next(number[act]) := 0;
              next(pc[act]) := L1;
            }
      L1: { /* noncritical section */
            next(pc[act]) := {L1, L2}; 
            /* a process can stay in the noncritical section indefinitely */
          }
      L2: { next(choosing[act]) := 1; 
            next(pc[act]) := L3; 
          }
      L3: { next(count[act]) := 1;
            next(max[act]) := 0; 
            next(pc[act]) := L4; 
          }
      L4: { red := v_number[count[act]];
            next(max[act]) := max[act] < red ? red : max[act];
            next(pc[act]) := L5;
          }
      L5: { countmax := count[act] = NPROCS;
            if (countmax) 
              next(pc[act]) := L6;
            else {
              next(count[act]) := count[act] + 1;
              next(pc[act]) := L4;
            } 
          }
      L6: { next(number[act]) := max[act] + 1; 
            next(pc[act]) := L7;
          }
      L7: { next(choosing[act]) := 0; 
            next(pc[act]) := L8; 
          }
      L8: { next(count[act]) := 1;
            next(pc[act]) := L9; 
          }
      L9: { next(pc[act]) := v_choosing[count[act]] ? L9 : L10; }
      L10: { red := v_number[count[act]];
              next(pc[act]) :=
               red~= 0 & 
               (red < number[act] | 
                (red = number[act] &
                 count[act] < act ))
             ? L10 
             : L11; 
           }
      L11: { 
             countmax := count[act] = NPROCS;
             if (countmax) 
               next(pc[act]) := L12;
             else {
               next(count[act]) := count[act] + 1;
               next(pc[act]) := L9;
             } 
           }
      L12: /* critical section */
           { next(pc[act]) := L13; }
      L13: { next(number[act]) := 0; 
             next(pc[act]) := L1; 
           }
    }
  }
We would first like to show that our algorithm is safe, in that no two processes can be in the critical section (statement L12) at the same time. This is specified as follows:
  forall (i in INDEX) forall (j in INDEX)
    safe[i][j]: assert G(i ~= j ->  ~(pc[i] = L12 & pc[j] = L12));
The proof of this does not require induction. We simple split cases on the ticket number of one of the two processes (say number[i]). this is necessary so that when we compare ticket numbers at line L10, we will get a concrete truth value and not bottom. Note, however, that we don't have to split cases on both values to get this effect. This gives us the following proof:
  forall (i in INDEX) forall (j in INDEX) forall (n in NAT) {
    subcase safe[i][j][n] of safe[i][j]
      for number[i] = n;
    using enum(red), NAT -> {0, n} prove safe[i][j][n];
  }
Note, we also tell SMV two addition things about the abstraction to use. The first is to enumerate the values of the variable red, the number being read by the active process. If this variable got the value bottom it would cause the computation of max to be corrupted (in fact, you might want to try running the proof without the enum(red) to see what happens). In addition, we have to tell SMV to include the constant 0 in the abstraction of the type NAT. This is because the value zero is special, representing the highest possible ticket number (you might also try removing NAT -> {0, n} to see what happens). SMV can now show, using just two processes i and j in its abstraction that both cannot be in the critical section at the same time.

However, let us continue on to the liveness proof, since this is the pert that involves induction. To show liveness, we first have to make a few assumptions about fairness. First, we assume that every process executes a step infinitely often, that is, infinitely often act = i, for all i. In particular, this means that no process can stay at line L12, the critical section, forever. Further, we have to assume that only a finite number of crashes occur. These assumptions are expressed as follows:

  forall (i in INDEX) {
    fair[i]: assert G F(act = i);
    assume fair[i];
  }

  finite_crashes: assert F G ~crash;
  assume finite_crashes;

Now we can state our liveness property. That is, we would like every process that gets to line L2 to eventually reach the critical section at L12 (or possibly to crash, since this possibility can't be avoided):

  forall (i in INDEX)
    live[i]: assert G(pc[i] = L2 -> F(pc[i] = L12 | pc[i] = DOWN));

To prove this, we first split the problem into two lemmas, stating that each of the two loops eventually terminates. The most convenient way to say this is to say that if the process enters the first statement of the loop, it eventually reaches the next statement after the loop. This gives us the following two lemmas, which we use to prove live[i]:

  forall (i in INDEX) {
    reaches_L6[i]: assert G(pc[i] = L4 -> F(pc[i] = L6 | pc[i] = DOWN));
    reaches_L12[i]: assert G(pc[i] = L9 -> F(pc[i] = L12 | pc[i] = DOWN));

    using reaches_L6[i], reaches_L12[i], fair[i],
          max//free, v_choosing//free, number//free, v_number//free
      prove live[i];
}
Note we free a few irrelevant variables to make the model checker run faster, and also made use of our fairness assumption to guarantee progress outside the loops.

Now we attack the first lemma, which says that the loop computing the maximum of the numbers eventually terminates. This is done by induction over the loop variable. We split cases on the current value and assume by induction that the loop terminates for all higher values (using course-of-values induction). Thus we have:

  forall (i in INDEX) forall (j in INDEX) {
    subcase reaches_L6_j[i][j] of reaches_L6[i] for count[i] = j;
    forall(k in INDEX){
      using reaches_L6_j[i][k] if k > j,
            fair[i],
            count//free, count[i], 
            number//free,
            pc//free, pc[i],
            red//free,
            v_choosing//free
            prove reaches_L6_j[i][j];
    }
  }
Note, we also freed some irrelevant variables here to speed up the model checking.

Finally to show termination of the second loop that waits for all processes with lower tickets, we use the same technique that we used in the simple example of the previous section - we assume termination for all processes with lower ticket values. The definition of a lower ticket value is a bit more complicated here, since we are using a lexicographic order over number and process indices. However this does not complicate the proof. We case split on n, the current ticket number, and on j the index of the process we are currently waiting for. If the ticket (number[j],j) is less than the ticket (n,i), we get to assume j eventually terminates. Of course, we also get to assume that the loop always terminates for count values greater than j. This is how we do it:

  forall(i in INDEX){
    forall(n in NAT) forall(j in INDEX){
      subcase reaches_L12_n[i][n] of reaches_L12[i]
        for number[i] = n;
      subcase reaches_L12_nj[i][n][j] of reaches_L12_n[i][n] for count[i] = j;
    }
    forall(n in NAT) forall(j in INDEX) forall(m in NAT) forall(k in INDEX){
      using
        reaches_L12_nj[i][n][k] if k > j,
        reaches_L12_n[j][m] if m < n | (m = n & j < i),
        fair[i], fair[j], reaches_L6[j],
        finite_crashes,
        NAT -> {0, n},
        max//free, max[j],
        v_choosing//free, v_choosing[j],
          enum(red)
      prove reaches_L12_nj[i][n][j];
    }
  }
}
That is, we use reaches_L12_nj[i][n][k] if k > j, meaning all iterations k > j terminate, and also reaches_L12_n[j][m] if m < n | (m = n & j < i), meaning that process j terminates for all numbers m = number[j] such that ticket (m,j) is less that (n,i). The reason this works is as follows. Suppose we are waiting for process j. If j is holding a higher ticket, we continue, incrementing count, which means we eventually terminate. On the other hand, if j has a lower ticket, it must eventually terminate its loop. The fairness constraint fair[j] guarantees that it must then exit its critical section and go back to the beginning. However, there is still the possibility that before we can continue, process j will reenter its choosing loop and get a new ticket. Fortunately, if it does this, it must read our number, and hence its new ticket will be higher than ours, hence we will be able to continue. This assumes, of course, that it does not stay forever in its choosing loop (which would block us) which is why we use the property reaches_L6[j]. We also need the finite crashing property to insure that process j won't keep crashing and reentering its choosing loop forever, and thus block us forever.

Fortunately, we don't have to do all that reasoning ``by hand'' - the model checker does it for us. We did have to give the model checker a few clues, however. In particular, we once again had to make sure the values of red were enumerated, and the zero was included in the abstraction of type NAT.

That concludes the proof of our model of the bakery algorithm. The entire file can be found here.


next up previous contents
Next: Synchronous Verilog Up: Conditional proof statements, and Previous: Conditional proof statements, and   Contents
2003-01-07