[Pharo-dev] external semaphores…again

Norbert Hartl norbert at hartl.name
Fri Oct 11 09:02:52 EDT 2013



On 11.10.2013 at 10:53, Sven Van Caekenberghe <sven at stfx.eu> wrote:

> 
> On 11 Oct 2013, at 10:24, Norbert Hartl <norbert at hartl.name> wrote:
> 
>> I can report that the behavior is different now. There were two new VM releases in the PPA this week. The first one didn't work, but the second changed something. My application has never run this long before. It has been up for more than a day now, with an actual external objects table size of 623, which was never reached before. So I would say there is a chance that this particular problem is gone. I will keep monitoring this, and I think this wasn't the only problem. But then it is another problem.
> 
> Yeah, but not knowing your application load, 623, which would be about 200 sockets (3 semaphores per socket), is still a lot to be active at the same time. Can you in some way invoke a full GC externally, for example using ZnReadEvalPrintDelegate, and see if it eventually drops due to finalization? It should; at least that is what I see.
> 
Yes, that's what I meant. There is always only one outgoing connection at a time. Every 15 seconds one request is issued. So you see why I expect there is more to find.
I'm travelling right now and will have a deeper look once I'm back.
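
For reference, a minimal workspace sketch of the check Sven describes: count the occupied slots of the external objects table, force a full GC, and count again. It assumes that slot 39 of the special objects array holds the external objects table, which is an assumption to verify against your own image. It counts occupied slots, not the table size itself, and finalization may run slightly after the GC returns, so a second look a moment later can show a lower number.

```smalltalk
"Workspace sketch. Assumption: slot 39 of the special objects array is the
 external objects table; check this in your image before relying on it."
| countInUse before after |
countInUse := [ ((Smalltalk specialObjectsArray at: 39)
	select: [ :each | each notNil ]) size ].
before := countInUse value.
Smalltalk garbageCollect.	"full GC; finalization may complete slightly later"
after := countInUse value.
Transcript
	show: 'External objects in use before GC: ', before printString; cr;
	show: 'External objects in use after GC: ', after printString; cr;
	show: 'Roughly ', (after // 3) printString, ' sockets at 3 semaphores each'; cr.
```

If the second number stays high with only one connection open at a time, that would point at entries that are never unregistered rather than at a missing GC.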

Norbert
>> Thanks to all of you who've helped solve this. When the VM turns out to be the source of a problem it is always extra annoying, because it is much harder to change something there.
>> 
>> Norbert
>> 
>> 
>> On 08.10.2013 at 11:27, Igor Stasenko <siguctua at gmail.com> wrote:
>> 
>>> 
>>> 
>>> 
>>> On 7 October 2013 18:36, Norbert Hartl <norbert at hartl.name> wrote:
>>> 
>>> On 07.10.2013 at 16:36, Igor Stasenko <siguctua at gmail.com> wrote:
>>> 
>>>> One thing.
>>>> 
>>>> Can you tell me what the following expression yields for your VM/image:
>>>> 
>>>> Smalltalk vm maxExternalSemaphores
>>>> 
>>>> (If it gives you a number less than 10000000, then I think I know what your problem is :)
>>> It is 10000000
>>> 
>>> What would the problem be if it were smaller?
>>> 
>>> 
>>> That just means your VM doesn't have an external object size cap.
>>> I changed the implementation to not have a hard limit (the arbitrarily large number
>>> is there just to be "compatible" with the previous implementation).
>>> 
>>> This means that you can actually change the check in your image to ignore the limit completely
>>> and just keep growing if necessary.
>>> 
>>> Now, since you are using a VM which doesn't have a limit, but the problem still persists,
>>> it seems the cause is somewhere else.. :/
>>>> I just found that after one merge, my changes got lost.
>>>> We've just plugged them back in, and they should be back again in newer VMs..
>>>> But the problem could be more than just semaphores.. if the merge broke this, it may have broken
>>>> many other things, so we need time to check.
>>> I will try to look at it some more. I'm using the pharo-vm from the Launchpad build. Are the changes supposed to be in this one?
>>> 
>>> Norbert
>>> 
>>> Launchpad? You mean the PPA? I can't say I remember all the details of how changes to the VM source
>>> get into the PPA distro, and how fast they get there. @Damien, can you enlighten us?
>>> 
>>> 
>>> Well, the VM which I downloaded recently using the zero-conf script has the limit back at 256. Just a merge mistake, which is now fixed.. it means that a couple of builds will use the limit-based implementation.. but then
>>> it will be back to my implementation.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 7 October 2013 12:31, Norbert Hartl <norbert at hartl.name> wrote:
>>>> 
>>>> On 07.10.2013 at 11:28, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
>>>> 
>>>>> 
>>>>> On Oct 7, 2013, at 11:16 , Norbert Hartl <norbert at hartl.name> wrote:
>>>>> 
>>>>>> As I need an image that runs longer than 24 hours, I'm looking at some stuff and wondering. Can anybody explain to me the rationale for code like this:
>>>>>> 
>>>>>> maxExternalSemaphores: aSize 
>>>>>>    "This method should never be called as result of normal program
>>>>>>    execution. If it is however, handle it differently:
>>>>>>    - In development, signal an error to prompt user to set a bigger size
>>>>>>    at startup immediately.
>>>>>>    - In production, accept the cost of potentially unhandled interrupts,
>>>>>>    but log the action for later review.
>>>>>>    
>>>>>>    See comment in maxExternalObjectsSilently: why this behaviour is
>>>>>>    desirable, "
>>>>>>    "Can't find a place where development/production is decided.
>>>>>>    Suggest Smalltalk image inProduction, but use an overridable temp
>>>>>>    meanwhile. "
>>>>>>    | inProduction |
>>>>>>    self maxExternalSemaphores
>>>>>>        ifNil: [^ 0].
>>>>>>    inProduction := false.
>>>>>>    ^ inProduction
>>>>>>        ifTrue: [self maxExternalSemaphoresSilently: aSize.
>>>>>>            self crTrace: 'WARNING: Had to increase size of semaphore signal handling table due to many external objects concurrently in use';
>>>>>>                 crTrace: 'You should increase this size at startup using #maxExternalObjectsSilently:';
>>>>>>                 crTrace: 'Current table size: ' , self maxExternalSemaphores printString]
>>>>>>        ifFalse: ["Smalltalk image"
>>>>>>            self error: 'Not enough space for external objects, set a larger size at startup!'
>>>>>>            "Smalltalk image"]
>>>>>> 
>>>>>> I have reported this once but got no feedback, so I'd like to have a few opinions.
>>>>>> 
>>>>>> The report is here: https://pharo.fogbugz.com/f/cases/10839/
>>>>>> 
>>>>>> Norbert
>>>>> 
>>>>> The rationale is that inProduction would be some global setting, not yet in place when the code was written…
>>>>> Excessive simultaneous Semaphore usage is something that should be caught during development, in which case it's better to get an active notification than to have it logged somewhere.
>>>> 
>>>> Agreed. But that didn't work in my case, because it needed roughly 20 hours and an unstable remote backend to trigger the problem. And somehow I forgot to install my logger as Transcript, so there is no warning message. I saw only dead images in the morning.
>>>> This is not satisfactory, but on the other hand this type of problem is hard to solve anyway. My feeling tells me there is more to discover. Socket resources get unregistered at finalization time, but this didn't work either. I would have said that the unlikely situation that no garbage collection ran could be the cause. But it can't be, because in ExternalSemaphoreTable>>#freedSlotsIn:ratherThanIncreaseSizeTo: there is an explicit garbage collection.
>>>> 
>>>>> If I've understood correctly, it's moot on newer Pharo VMs, where there's no limit on the semtable size, but for legacy code a startup item setting the size using maxExternalObjectsSilently: (as suggested in the Warning text) is still a more proper fix than setting inProduction to true and crossing your fingers, hoping no signals will be lost during table growth.
>>>> 
>>>> Ah, I didn't know about the risk of losing signals while resizing the table. Thanks for that. Don't get me wrong, I wasn't proposing to actually set inProduction. I don't think that automatically growing resource management is a proper way to design a system. There is always a range of resources you need for your use case. Not setting an upper bound for this just covers up leaking behavior.
>>>> 
>>>> Norbert
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Best regards,
>>>> Igor Stasenko.
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Best regards,
>>> Igor Stasenko.
> 
> 



