Funciton calculating sum of modified array

Aug 16, 2016 at 7:21 PM
I am trying to implement a greedy algorithm for load balancing. The method Stacking is given an array N of integers and tries to partition its elements into arrays A and B to minimize the maximum sum of the two arrays. To do this, it loops through array N and adds each element to either array A or B, depending on which has the smallest sum so far. It obviously does not always give the optimal solution.

The problem I am having is to verify that the sum of A and the sum of B are being computed correctly. I use variable M1 for the sum of A and M2 for the sum of B. A function Sum will calculate the sum of an array. The Sum function works correctly, because I can verify that M1+M2 is the sum of the elements of N.

However, it cannot verify that M1 is the sum of the elements of A and M2 is the sum of the elements of B. The problem is that those arrays are being modified. When I add a new element to array A, for example, all the elements before that element in array A are unchanged, so the previous sum that I calculated has not changed. Therefore it is easy to see that the new sum is also correct. But I think Dafny cannot do this, because Dafny knows I modified the array, and Dafny does not realize that the previous sum was not changed.

The code is below. I include the invariant M1 + M2 == Sum(N,nindex), just to show that the Sum function works correctly. But the invariants M1 == Sum(A,aindex) and M2 == Sum(B,bindex) are not being computed correctly.

Any ideas on how to make this work?
function Sum(A: array<int>, size: int): int
requires A != null
reads A
requires 0 <= size <= A.Length
{
  if size == 0 then 0
  else A[size-1] + Sum(A,size-1)
}

method Stacking(N: array<int>) returns(A: array<int>, B: array<int>, aindex: int, bindex: int, M1: int, M2: int) 
    
  requires N != null
      
   ensures A != null
   ensures B != null
   ensures 0 <= aindex <= A.Length
   ensures 0 <= bindex <= B.Length
   ensures M1 == Sum(A,aindex)
   ensures M2 == Sum(B,bindex)
   ensures M1 + M2 == Sum(N,N.Length)
      
{
   A := new int[N.Length];
   B := new int[N.Length];
   bindex := 0;
   aindex := 0;
   M1 := 0;
   M2 := 0;
   var nindex := 0;
   
   while(nindex < N.Length)
   invariant 0 <= nindex <= N.Length
   invariant 0 <= aindex <= nindex
   invariant 0 <= bindex <= nindex
   invariant M1 == Sum(A,aindex)
   invariant M2 == Sum(B,bindex)
   invariant M1 + M2 == Sum(N,nindex)
   
   {
     if(M1 <= M2)
     {
       M1 := M1 + N[nindex];
       A[aindex] := N[nindex];
       aindex := aindex + 1;
     } 
     else 
     {
       M2 := M2 + N[nindex];
       B[bindex] := N[nindex];
       bindex := bindex + 1;
     }
     nindex := nindex + 1;
   }    
}
Thanks,
Chris
Aug 17, 2016 at 8:44 AM
You can use ghost variables. Ghost variables do not appear in the compiler output, but are just used by the verifier.

http://rise4fun.com/Dafny/naKS
function Sum(A: seq<int>): int
{
  if A == [] then 0
  else A[|A|-1] + Sum(A[..|A|-1])
}

method Stacking(N: array<int>) returns(A: array<int>, B: array<int>, aindex: int, bindex: int, M1: int, M2: int) 
   requires N != null
   ensures A != null
   ensures B != null
   ensures 0 <= aindex <= A.Length
   ensures 0 <= bindex <= B.Length
   ensures M1 == Sum(A[..aindex])
   ensures M2 == Sum(B[..bindex])
   ensures M1 + M2 == Sum(N[..])  
{
   A := new int[N.Length];
   B := new int[N.Length];
   bindex := 0;
   aindex := 0;
   M1 := 0;
   M2 := 0;
   var nindex := 0;
   
   ghost var A' := [];
   ghost var B' := [];
   ghost var N' := [];
   
   while(nindex < N.Length)
     invariant 0 <= nindex <= N.Length
     invariant 0 <= aindex <= nindex
     invariant 0 <= bindex <= nindex
     invariant A' == A[..aindex]
     invariant B' == B[..bindex]
     invariant N' == N[..nindex]
     invariant M1 == Sum(A')
     invariant M2 == Sum(B')
     invariant M1 + M2 == Sum(N')
   {
     if(M1 <= M2)
     {
       M1 := M1 + N[nindex];
       A[aindex] := N[nindex];
       A' := A' + [N[nindex]];
       aindex := aindex + 1;
     } 
     else 
     {
       M2 := M2 + N[nindex];
       B[bindex] := N[nindex];
       B' := B' + [N[nindex]];
       bindex := bindex + 1;
     }
     N' := N' + [N[nindex]];
     nindex := nindex + 1;
   }    
   assert N' == N[..];
}
Aug 17, 2016 at 8:53 AM
Given your problem statement though, perhaps you would prefer this specification:

http://rise4fun.com/Dafny/TfXz
function Sum(A: seq<int>): int
{
  if A == [] then 0
  else A[|A|-1] + Sum(A[..|A|-1])
}

method Stacking(N: array<int>) returns(A: array<int>, B: array<int>, aindex: int, bindex: int) 
   requires N != null
   ensures A != null
   ensures B != null
   ensures 0 <= aindex <= A.Length
   ensures 0 <= bindex <= B.Length
   ensures multiset(A[..aindex]) + multiset(B[..bindex]) == multiset(N[..])
{
   A := new int[N.Length];
   B := new int[N.Length];
   bindex := 0;
   aindex := 0;
   var M1 := 0;
   var M2 := 0;
   var nindex := 0;
   
   ghost var A' := [];
   ghost var B' := [];
   ghost var N' := [];
   
   while(nindex < N.Length)
     invariant 0 <= nindex <= N.Length
     invariant 0 <= aindex <= nindex
     invariant 0 <= bindex <= nindex
     invariant A' == A[..aindex]
     invariant B' == B[..bindex]
     invariant N' == N[..nindex]
     invariant M1 == Sum(A')
     invariant M2 == Sum(B')
     invariant multiset(A') + multiset(B') == multiset(N')
   {
     if(M1 <= M2)
     {
       M1 := M1 + N[nindex];
       A[aindex] := N[nindex];
       A' := A' + [N[nindex]];
       aindex := aindex + 1;
     } 
     else 
     {
       M2 := M2 + N[nindex];
       B[bindex] := N[nindex];
       B' := B' + [N[nindex]];
       bindex := bindex + 1;
     }
     N' := N' + [N[nindex]];
     nindex := nindex + 1;
   }    
   assert N' == N[..];
}
Aug 18, 2016 at 4:37 AM
Thank you for your quick and helpful answer. It makes sense. I had considered using ghost variables, but I couldn't think of how. I also had written the Sum function for sequences as you did, but that didn't work for me because I wasn't using ghost variables.

I agree with your second solution. I actually had the multiset postcondition working, but I removed it from my question to put the focus on where my problem was. I do need the multiset condition for correctness. But I also need to know that the sums are correct, because I want to further use them to prove that this algorithm approximates the load balancing solution within a factor of 2, which will require lots of other things to be added.

Thanks again for your idea to use ghost variable sequences. That is exactly what I need.