[Pharo-dev] WorkingSession UUID looks "sketchy"

Sven Van Caekenberghe sven at stfx.eu
Mon Feb 6 15:00:51 EST 2017


First, thanks for the discussion, I think this is good.

The idea/reason behind NeoUUIDGenerator was to get an algorithm that is fully documented and clearly implemented with proper unit tests, all at the Smalltalk level, so that we can get rid of the plugin.

> On 6 Feb 2017, at 18:54, Peter Uhnak <i.uhnak at gmail.com> wrote:
> 
> 1. Regarding WorkingSession
> 
> The WS' comment claims "On each image startup the current session is invalidated and a new session is created.",
> but in reality WS is reset only on save&quit, and not on startup... isn't that odd?
> So if image crashes or I am running it headlessly without saving I am actually still on the same session.

That is a bug. Either the documentation has to be changed or, more likely, the implementation.

The idea behind the old, simple Session object was crystal clear: create a new, unique (in the sense of #==) Session object on each run of an image (and not when saving).

WorkingSession got more complicated and less clear. People seem to have different expectations of it, and we keep on having trouble with it, which is not a good sign.

Your point (3) also indicates an important issue in the division of responsibility.

> 2. Regarding NeoUUID
> 
> My apologies for stating that it was made worse than the /dev/urandom one, now I know why it doesn't really matter due to the other rolling factors.

> However I found some things that you may or may not find interesting (especially on Windows):
> 
> On Windows (10) (with latest 6 image and VM) `Time microsecondClockValue` returns microseconds, but (presumably the system) cannot give precision beyond 1 second - this will imho need a VM fix;

I find that quite hard to believe and I did not know that. Are you really sure? That would be terrible. A solution might be to use another clock primitive.

> I can also generate about 1.2M UUIDs per second (limited by single core I guess), which means that about every 600-1200 consecutive UUIDs will have the same clock value.

I can't follow your reasoning here: if the clock precision were 1 second, you would get 1.2M consecutive UUIDs with the same clock value, right?
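
As a rough check of the arithmetic (a sketch using only the 1.2M per second figure quoted above; the three clock resolutions are assumptions chosen for illustration), the number of consecutive UUIDs sharing one clock value is the generation rate multiplied by the clock resolution:

```smalltalk
| rate sharing |
rate := 1200000.	"UUIDs per second, the figure quoted above"
"Clock resolutions in seconds: 1 s, 1 ms and 0.5 ms (illustrative assumptions)"
sharing := { 1. 1/1000. 1/2000 } collect: [ :resolution | rate * resolution ].
sharing.	"UUIDs per clock value for each resolution"
```

Under these assumptions, 600-1200 UUIDs per clock value would correspond to a resolution of roughly 0.5 to 1 millisecond, while a resolution of 1 second would give 1.2M.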

In the unit tests this is verified (that the time value goes forward on consecutive calls).

> On Linux the microseconds are fine, also I generate only 0.8M UUIDs (it is an older machine, so with >1M UUID/sec you will still have time clash),
> now there's about 1-2% probability that the immediately next UUID will have the same clock, but this is countered by the counter. :)

Yes, the clock + the counter + the machine id + random (this is described in detail in the class comment of NeoUUIDGenerator).

> You are also taking 8 bytes of microsecondClockValue, but the value has only 7 bytes... so 8th byte is fixed to 0.

True.

> 9 & 10 bytes are counter, but 9th byte is rewritten with variant, so the counter is actually 0-255 and not 0-65536.

Yes, but more correctly, the top 2 bits are set to 10, which leaves 14 bits for the counter, so 2^14 (16384) distinct values (instead of 2^16).
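
A minimal sketch of that masking, assuming the variant is applied by overwriting the top two bits of the byte that holds the high half of the counter (the variable names are illustrative and not taken from NeoUUIDGenerator):

```smalltalk
| counter highByte variantByte |
counter := 16rFFFF.	"largest 16-bit counter value, chosen for illustration"
highByte := counter bitShift: -8.	"upper 8 bits of the counter"
"Keep the low 6 bits and force the top two bits to 10"
variantByte := (highByte bitAnd: 2r00111111) bitOr: 2r10000000.
variantByte printStringBase: 2.	"'10111111'"
1 bitShift: 14.	"16384 distinct counter values remain"
```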

> And finally 11 & 12 are random bits (assuming the seeding isn't broken).

The seeding is one aspect (the quality of the seed), the algorithm is another.
Note that in the current Random class, the seed is initialised using the clock as well.

> So on Windows, the conditional probability that nth and n+256th UUIDs will be identical is imho 1/65536 (assuming they are in the same second, which is easy).

I am pretty/quite sure this is not correct but it would take me much more time to come up with a more correct calculation.

BTW, you also have to define the context of being the same: the same instance of one NeoUUIDGenerator generating the same UUID, or several different generator instances, in the same image, in different images, on different runs, the same or different machines/networks, ... 

> On Linux my understanding is that clash can only happen if NTP adjusts my clock during UUID generation (at which point it is same as Windows).
> 
> Can UUID clash be achieved on Linux if you deploy copies of the same image and let them all generate UUIDs? (It should be again 1/65536).

Same remark as above.

> Regarding the poor seed at startup:
> 
> 1k outside runs of 'NeoUUIDGenerator new nextRandom16' (on a fresh image) gives me only 116 unique values, compared to the expected 990-1000
> In the above it's already second run of the generator, for (random initial) counter, there were only 69 unique values out of 1000.

But that is not proper use of a random generator: you create a new instance every time. You basically test seeding, which is different from testing the random number generator proper.
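
For comparison, a sketch of drawing 1000 values from a single generator instance, so that seeding happens only once. Random and #nextInt: are used here as a stand-in; the test quoted above used 'NeoUUIDGenerator new nextRandom16' with a new instance per run:

```smalltalk
| random values |
random := Random new.	"one instance, seeded once"
"Draw 1000 integers in the interval [1, 65536] from the same instance"
values := (1 to: 1000) collect: [ :each | random nextInt: 65536 ].
values asSet size.	"number of unique values among the 1000 draws"
```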

It seems that is a chicken-egg problem too ;-)

BTW, you can currently provide your own seed too, for example:

Random seed: ('/dev/random' asFileReference binaryReadStreamDo: [ :in | in next: 4 ]) asInteger.

> 3. Chicken and egg question
> 
> How would one bootstrap session's id initialized with just NeoUUID? :) (WorkingSession wants UUID new in initialize, but UUID needs WorkingSession to generate a UUID)

That is correct. I don't understand why WorkingSession needs a UUID, it was not like that before. It also does not seem to be really used.

One solution I can think of would be to make id lazily initialized.
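
A minimal sketch of what lazy initialization could look like, written so it can be run in a playground. LazySessionSketch and its package name are hypothetical stand-ins, not WorkingSession itself, and the class creation message assumes a Pharo 6 style image:

```smalltalk
| sketchClass session |
"LazySessionSketch is a hypothetical stand-in for WorkingSession"
sketchClass := Object subclass: #LazySessionSketch
	instanceVariableNames: 'id'
	classVariableNames: ''
	package: 'Sketch-LazyId'.
sketchClass compile: 'id
	"Create the UUID on first access instead of in #initialize"
	^ id ifNil: [ id := UUID new ]'.
session := sketchClass new.	"no UUID is generated here"
session id = session id.	"the same UUID is answered on every access"
```

With this shape, creating the session object does not need a UUID, so the generator is only called once something actually asks for the id.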

In any case, I am going to try integrating/activating NeoUUIDGenerator again in the latest Pharo 6.

> Peter
> 
>> 
>> Reading from /dev/random is not portable to Windows and tricky too (because it sometimes hangs until there is enough entropy).
>> 
>>>> Still, UUIDs are not 100% guaranteed to be unique, they are a (very good) best effort.
>>> 
>>> For all intents and purposes they are considered 100% to be unique.
>>> If you generate two identical V4 UUIDs then either PRNG or seeding is broken (seeding in Pharo's case).
>>> 
>>> Peter
>> 
>> According to https://en.wikipedia.org/wiki/Universally_unique_identifier
>> 
>> <<
>> When generated according to the standard methods, UUIDs are for practical purposes unique, without depending for their uniqueness on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is so close to zero as to be negligible.
>> >>
>> 
>> Read the last sentence.
>> 
>> So IMO it is certainly not 'broken'. 
>> 
>> Note also that NeoUUID uses different elements, the random part is only one of them.
>> 
>>> On Mon, Feb 06, 2017 at 02:35:37PM +0100, Sven Van Caekenberghe wrote:
>>>> 
>>>>> On 6 Feb 2017, at 14:17, Yuriy Tymchuk <yuriy.tymchuk at me.com> wrote:
>>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I’m using the session id (Smalltalk session id) for my data recording, so I can distinguish if the recorded events came from the same session. The idea is that each time an image is started a new session is created and assigned a new UUID. Now when I started to look at the data I noticed that I have some cases where I have the same session IDs with different session creation times (yes, a new session is initialized with the current timestamp). The time difference for the sessions with the same UUID and a different timestamp is within 2 hours. Then another thing that I did is to group the data by the timestamp and there are no cases where I have a different ID for the same timestamp, which shows that the timestamp is a more reliable ID. Now I will deal with my data just fine, but maybe we need to look in the implementation to see why we get sessions with the same IDs?
>>>>> 
>>>>> Cheers.
>>>>> Uko
>>>> 
>>>> I would be very surprised if it happened with NeoUUIDGenerator (NeoUUIDGenerator next). The idea was to replace UUIDGenerator and the VM plugin by it. That got stalled when there was unforeseen interaction with WorkingSession. I believe that should be solved by now.
>>>> 
>>>> Still, UUIDs are not 100% guaranteed to be unique, they are a (very good) best effort.
>>>> 
>>>> But I agree that if they repeat in such a short time frame, that should be considered a bug.
>>>> 
>>>> Sven
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 



