[Pharo-dev] 11635: Race condition in SequenceableCollection>>shuffle

Max Leske maxleske at gmail.com
Tue Oct 1 10:57:24 EDT 2013


bump...


> Thanks for the feedback guys.
> 
> Based on the discussion I propose a different set of changes:
> 
> 1.
> shuffle
> 	^ self shuffleBy: Random new
> 
> 2.
> shuffled
> 	^ self copy shuffle
> 
> 3. remove #shuffledBy: (if you're specific enough to use a custom Random instance you can also create a copy of the collection yourself if you want to)
> 
> 
> I leave the matter of renaming the methods for further discussion. Personally I'm for more intention revealing selectors; I quite like #shuffle and #shuffleInPlace. If we can agree on a different naming scheme we should then apply it to other methods aswell of course (e.g. #sort, #sortInPlace).
> 
> Also open for discussion is the use of Collection>>randomForPicking and Collection>>mutexForPicking in other methods (such as #atRandom). I think it shouldn't be too big a problem to make those methods use individual Random instances and to remove the two class variables from Collection.
> 
> Cheers,
> Max
> 
> 
> On 24.09.2013, at 02:33, btc at openInWorld.com wrote:
> 
>> Nicolas Cellier wrote:
>>> 
>>> The reason why I don't like the explicit "copy", is because this is the
>>> default behaviour.
>>> In place mutation is an (evil) exception.
>>> I'm personnally fine with current conventions:
>>> - sort, shuffle, reverse, replace: are imperative, thus you are telling
>>> collection, sort yourself !
>>> - sorted clearly sounds as a sorted copy for me.
>>> If ever a change was to be made, I would much prefer this to happen on
>>> opposite naming, sort -> inPlaceSort, or sortYourself :)
>>> It's like telling the programmer, hey - a stateful mutation is really what
>>> you want, is it?
>>>   
>> 
>> Conventions should encourage good programming practice.  Although it makes perfect sense on hearing it, I was explicitly not aware of the convention between 'sort' and 'sorted'.  For me, the small difference in letters used between the two doesn't provide a wide enough distinction.  Also, regarding not mutating state, I guess I knew that in the back of my head, but it has not generally been an explicit concern of mine while programming.  Convention should encourage good programming.  So I like the suggestion of inPlaceSort & sortYourself.  As well as being more intention revealing, the less desirable methods are made less attractive by using a longer name. Perhaps another naming convention could be mutatedSort, or mutSort as a hint that the programmer should consider using these in conjunction with mutexes, or at own risk.
>> 
>> cheers -ben
>> 
>> 
>> 
>>> 2013/9/23 Stéphane Ducasse <stephane.ducasse at inria.fr>
>>> 
>>>   
>>>> +1
>>>> Now I understand the point of max with the name sort/sorted because this
>>>> is not explicit to me too :)
>>>> I laways have to think.
>>>> 
>>>> On Sep 23, 2013, at 6:28 PM, Nicolas Cellier <
>>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>> 
>>>> 100% agree.
>>>> Do it right > do it fast.
>>>> We must not turn usage of the library into something fragile for the sake
>>>> of speed.
>>>> We already make the code itself fragile more often than not in term of
>>>> complexity (harder to understand/test/change).
>>>> 
>>>> Especially, introduction of shared mutable states (global, class vars,
>>>> singleton or any other form) should ring an alarm in reviewers head
>>>> (This is some very old Squeak code in this case, so Squeakers are to
>>>> blame, but we're all in same bath).
>>>> 
>>>> 
>>>> 
>>>> 2013/9/23 Henrik Johansen <henrik.s.johansen at veloxit.no>
>>>> 
>>>>     
>>>>> On Sep 23, 2013, at 3:33 , Max Leske <maxleske at gmail.com> wrote:
>>>>> 
>>>>>       
>>>>>>> If a user is going to modify the same object concurrently, he/she
>>>>>>>           
>>>>> takes care of mutual exclusion.
>>>>>       
>>>>>> Agreed. BUT: the Random object used by these methods is the same one
>>>>>>         
>>>>> that is used by #atRandom for instance, hence the race condition. There is
>>>>> no way anyone can safely use these methods without the mutex, single
>>>>> threaded or not. Calls to methods using that same Random object can be all
>>>>> over the place and also in the base system.
>>>>> 
>>>>> 
>>>>> It seems to me an existing Random instance is used in this case mostly*
>>>>> for performance.
>>>>> One could argue that since the Random in this case is used for a bulk
>>>>> operation, for which the object creation cost is largely amortized for
>>>>> collection sizes > 20, it's acceptable to instead use Random new by
>>>>> default, which wouldn't suffer from the same race conditions.
>>>>> While still slower than a mutex-protected version for single-threaded
>>>>> code, it would also scale correctly if the users (and vm) are actually
>>>>> multi-threaded.
>>>>> 
>>>>> [#(1 2  3 4 5 6 7 8 9 10 11 12) shuffle] bench '208,000 per second.'
>>>>>  '222,000 per second.' '223,000 per second.'
>>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleWithMutex] bench '188,000 per
>>>>> second.'  '186,000 per second.''184,000 per second.'
>>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleNewRandom] bench '167,000 per
>>>>> second.' '166,000 per second.'  '167,000 per second.'
>>>>> 
>>>>> Cheers,
>>>>> Henry
>>>>> 
>>>>> * Low seed entropy is another issue, but if purely random shuffling is a
>>>>> critical requirement, one shouldn't use the default Random generator
>>>>> anyways...
>>>>> 
>>>>> 
>>>>> 
>>>>>       
>>>>     
>>> 
>>>   
>> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20131001/c8251727/attachment-0002.html>


More information about the Pharo-dev mailing list