[Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure

Ben Coman btc at openinworld.com
Sun Dec 11 03:07:41 EST 2016


For Case [1] I got an unrelated CI failure [2] for
NeoUUIDGeneratorTests>>#testTwoSameGenerator

Checking this locally by running this a dozen times...
1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp
testTwoSameGenerator ]

Two distinct failures are apparent...

1. Takes longer than expected.
    self assert: (time2 - time1) abs < 100
last=1 ==> a time difference of 109
last=1 ==> a time difference of 102
last=1 ==> a time difference of 155
last=1 ==> a time difference of 105
Interesting that whenever this happens its on the first iteration.

2. Invalid consecutive ids.
    self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10).
last=127 ==> TestFailure: Got 256 instead of 0.
last=828 ==> TestFailure: Got 256 instead of 0.
last=128 ==> TestFailure: Got 256 instead of 0.
last=316 ==> TestFailure: Got 256 instead of 0.
Maybe a problem with rollover?

So instrumenting a bit:
  testTwoSameGenerator: n
    | uuid1 uuid2 consecutiveOk time1 time2 timeOk |
    ....
    "same generator, consequetive counters"
    consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10).
    "same generators, similar clock, within 100 milliseconds"
    time1 := self timeFromUUID: uuid1.
    time2 := self timeFromUUID: uuid2.
    timeOk := (time2 - time1) abs < 100.
    ( timeOk and: [ consecutiveOk ]) ifFalse: [
        Transcript cr;
           show: 'fail n=', (n printPaddedWith: $0 to: 3)  ; tab;
           show: ' time ', ((time2 - time1) printString) , '<100'; tab;
           show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2
at: 10) printString).
    ].
    self assert: consecutiveOk.
    self assert: timeOk.

>From playground running...
1 to: 1000 do: [:n|
    NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n].
Transcript crShow: 'x'.

Note the 'x' was not output due to one of the consecutiveOk or timeOk
asserts triggered each run.
fail n=001 time 104<100 uuid 160=160
fail n=269 time 2<100 uuid 256=0
fail n=001 time 152<100 uuid 166=166
fail n=117 time 3<100 uuid 256=0
fail n=001 time 123<100 uuid 128=128
fail n=278 time 3<100 uuid 256=0
fail n=001 time 145<100 uuid 230=230
fail n=673 time 1340<100 uuid 141=141
fail n=707 time 3<100 uuid 256=0
fail n=001 time 132<100 uuid 73=73
fail n=001 time 117<100 uuid 206=206
fail n=063 time 2<100 uuid 256=0
fail n=345 time 3<100 uuid 256=0

Now a strange thing is that after commenting out those two asserts for
consecutiveOk and timeOk, the fails at n=001 stop happening.

fail n=001 time 141<100 uuid 232=232
x
x
fail n=201 time 3<100 uuid 256=0
fail n=268 time 3<100 uuid 256=0
fail n=359 time 3<100 uuid 256=0
fail n=408 time 2<100 uuid 256=0
fail n=559 time 3<100 uuid 256=0
x
fail n=139 time 3<100 uuid 256=0
fail n=854 time 7<100 uuid 256=0
x
fail n=389 time 3<100 uuid 256=0
fail n=669 time 3<100 uuid 256=0
fail n=876 time 3<100 uuid 256=0
fail n=916 time 3<100 uuid 256=0
fail n=953 time 2<100 uuid 256=0
fail n=954 time 3<100 uuid 256=0
x
fail n=148 time 3<100 uuid 256=0
fail n=741 time 3<100 uuid 256=0
x
fail n=116 time 2<100 uuid 256=0
fail n=480 time 3<100 uuid 256=0
x

After treating the rollover like this...
  consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10).
running the last playground script gives...

(Note the n=001 fail reoccurs after waiting a while between running the
playground script again.)
fail n=001 time 123<100 uuid 110=110
x
fail n=345 time 1287<100 uuid 24=24
fail n=404 time 1313<100 uuid 125=125
x
x
x
x
x
fail n=449 time 1290<100 uuid 24=24
x
x
x
x
x


Thats looking better, so upping to 10000 iterations...
fail n=0001 time 151<100        uuid 160=160
fail n=2977 time 1437<100 uuid 221=221
fail n=5498 time 1454<100 uuid 224=224
fail n=9007 time 1351<100 uuid 239=239
x
fail n=3389 time 1381<100 uuid 223=223
fail n=7768 time 1365<100 uuid 82=82
fail n=9773 time 1395<100 uuid 164=164
x
fail n=3325 time 1448<100 uuid 67=67
fail n=9930 time 1456<100 uuid 103=103
x
fail n=0060 time 2914<100 uuid 159=159
fail n=1829 time 1438<100 uuid 32=32
fail n=7918 time 1436<100 uuid 182=182
x

So the rollover looks cured.

Now adding some more instrumentation to view GC Time and Time To Run
and  (see attached ST file)

0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR
0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00
x
fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00
fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00
x
fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0
x
fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0
fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1
x
x
x
x
x
fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0
x
x
fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1
x


Now I wonder at the class comment saying...
  "the first 8 bytes are the *millisecond* clock value"
but #clockOn: is using *microsecond* clock.
Perhaps "self assert: (time2 - time1) abs < 100"
is meant to be 100ms not 100us ??

cheers -ben

[1] https://pharo.fogbugz.com/f/cases/19438
[2]
https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifact/validationReport.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20161211/47b7cd4f/attachment-0002.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NeoUUIDGeneratorTests-testTwoSameGenerator.st
Type: application/squeak-changeset
Size: 1442 bytes
Desc: not available
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20161211/47b7cd4f/attachment.st>


More information about the Pharo-dev mailing list