Instability of verifications

Nov 30, 2015 at 4:39 PM
Hello:

I have some questions about differences in verification results when changing some little things or some environmental circumstances. Trying to reproduce VisualStudio conditions when running Dafny in console, I copied the options from here: /timeLimit:10 /autoTriggers:1 /compile:0, omitting /vcsCores:x (where x is one less than the number of processors in the machine) because I thought that was not relevant.

It was relevant indeed. That was what made the results differ (I had always been using /timeLimit:10). There are other apparently innocent modifications which change the results substantially.

An example: http://rise4fun.com/Dafny/Ai2g. This program gets verified in console with the options above (except /vcsCores:7) (Windows or Linux) and in rise4fun.
$ time dafny /compile:0 /timeLimit:10 /autoTriggers:1 er5.7.dfy 
Dafny program verifier version 1.9.6.21012, Copyright (c) 2003-2015, Microsoft.

Dafny program verifier finished with 3 verified, 0 errors

real    0m12.863s
user    0m12.712s
sys 0m0.064s
However, it does not work if I...
  1. Add /vcsCores:7 (or try in VisualStudio):
    $ time dafny /compile:0 /timeLimit:10 /autoTriggers:1 /vcsCores:7 er5.7.dfy 
    Dafny program verifier version 1.9.6.21012, Copyright (c) 2003-2015, Microsoft.
    er5.7.dfy(9,6): Verification of 'Impl$$_module.__default.LemaProdEscalar' timed out after 10 seconds
    er5.7.dfy(17,6): Timed out on BP5003: A postcondition might not hold on this return path.
    er5.7.dfy(13,27): Related location: This is the postcondition that might not hold.
    Execution trace:
        (0,0): anon0
        (0,0): anon3_Else
    er5.7.dfy(25,49): Timed out on: assertion violation
    Execution trace:
        (0,0): anon0
        (0,0): anon3_Else
    
    Dafny program verifier finished with 2 verified, 0 errors, 1 time out
    
    real    0m32.153s
    user    0m32.128s
    sys     0m0.104s
    
  2. Execute Dafny at the same time as a video transcoding program (or any other CPU demanding software). The same output.
  3. Translate the identifiers to English (well, change them in general): http://rise4fun.com/Dafny/uJ96.
    $ time dafny /compile:0 /timeLimit:10 /autoTriggers:1 er5.7.dfy 
    Dafny program verifier version 1.9.6.21012, Copyright (c) 2003-2015, Microsoft.
    er5.7.dfy(9,6): Verification of 'Impl$$_module.__default.DotProductLemma' timed out after 10 seconds
    er5.7.dfy(25,49): Timed out on: assertion violation
    Execution trace:
        (0,0): anon0
        (0,0): anon3_Else
    
    Dafny program verifier finished with 2 verified, 0 errors, 1 time out
    
    real    0m22.673s
    user    0m22.468s
    sys     0m0.072s
    
    
I suppose that the first two changes come from the difference between the timeout, measured in real time, and the effective computing time for each prover instance, which may be shorter due to CPU sharing. I fear that is inherent to the concept of timeout and so unavoidable. No idea about the third problem.

This program is not a difficult one. Is there anything in the file I submit that could explain its quite long duration?
Dec 1, 2015 at 9:35 AM
I don't think that that size of the file is particularly relevant. What Dafny is doing is searching for a proof by starting from one set of facts and then applying axioms to deduce more facts, hoping eventually to produce a proof. The time it takes depends on the order it chooses to apply the axioms, and the number of axioms that can be applied (among other things).

One thing to note is that arithmetic operations do not play well with the matching algorithm that the SMT solver uses. I believe that Dafny usually works better if you create a function to encapsulate any arithmetic operations.
Dec 1, 2015 at 9:49 AM
Here is a version that has the lemmas split out and the sequence operations given names inside functions. It seems to verify quickly. I hope I have preserved the meaning of your example.

http://rise4fun.com/Dafny/aZEs
Dec 1, 2015 at 10:07 AM
Edited Dec 1, 2015 at 10:16 AM
(The codeplex edit function incorrectly escapes "+" symbols)
Dec 1, 2015 at 10:15 AM
There is also another style of proof supported by Dafny, "calc", which I find has more predictable performance. I think this is because the scope of the axioms is very restricted.
function ProdEscalar(v : seq<int>, w : seq<int>) : int
    requires |v| == |w|
{
    if v == []
    then 0
    else v[0] * w[0] + ProdEscalar(v[1..], w[1..])
}

function last(v:seq<int>) : int
  requires v != []
{ v[|v|-1] }

function first(v:seq<int>) : int
  requires v != []
{ v[0] }

function rest(v:seq<int>) :seq<int>
  requires v != []
{ v[..|v|-1] }

function tail(v:seq<int>) :seq<int>
  requires v != []
{ v[1..] }

lemma LastOfTailIsLast(v:seq<int>)
  requires |v| > 1
  ensures last(tail(v)) == last(v)
{}

lemma FirstOfRestIsFirst(v:seq<int>)
  requires |v| > 1
  ensures first(rest(v)) == first(v) 
{}

lemma RestTailCommute(v:seq<int>)
  requires |v| > 1
  ensures rest(tail(v)) == tail(rest(v));
{}

lemma LemaProdEscalar(v : seq<int>, w : seq<int>)
    requires |v| == |w|
    requires v != []
    ensures ProdEscalar(v, w) == ProdEscalar(rest(v), rest(w)) + last(v) * last(w)
{
    if (|v| == 1) { }   else {
    calc ==
    {
        ProdEscalar(v, w);
        // definition ProdEscalar 
        first(v)*first(w)+ProdEscalar(tail(v), tail(w));
        {LemaProdEscalar(tail(v), tail(w));}
        first(v)*first(w)+ProdEscalar(rest(tail(v)), rest(tail(w)))+last(tail(v))*last(tail(w));
        {LastOfTailIsLast(v); LastOfTailIsLast(w);}
        first(v)*first(w)+ProdEscalar(rest(tail(v)), rest(tail(w)))+last(v)*last(w);
        {RestTailCommute(v); RestTailCommute(w);}
        first(v)*first(w)+ProdEscalar(tail(rest(v)), tail(rest(w)))+last(v)*last(w);
            {FirstOfRestIsFirst(v); FirstOfRestIsFirst(w);}
        first(rest(v))*first(rest(w))+ProdEscalar(tail(rest(v)), tail(rest(w)))+last(v)*last(w);
        // definition ProdEscalar
        ProdEscalar(rest(v), rest(w))+last(v)*last(w);
    }}
}
http://rise4fun.com/Dafny/MhPqH
Dec 7, 2015 at 7:19 PM
Thanks a lot. Your version follows the same idea and it's stable against all the circumstances I had written in the post (verification time doesn't change). It's important to know Dafny's preferences.

I didn't find the original example easy because of its size but because it only involves a definition and the induction hypothesis, apart from the subsequences, which seem to be the tricky part.

With that variation of the calc syntax and without extra definitions, I have rewritten my original example. I verifies quickly; however, the translation of identifiers makes the verification time increase from 0.86 seconds to 6.709. I don't know why but at least it confirms that giving names and splitting lemmas really helps.