[Pharo-dev] external semaphores…again

Igor Stasenko siguctua at gmail.com
Sat Oct 12 16:13:57 EDT 2013


On 12 October 2013 11:08, Norbert Hartl <norbert at hartl.name> wrote:

> So, finally it turned out that the culprit is in my own code. I was
> logging exception objects that have a signaler context pointing to the
> socket. This way, on every connection timeout I added the exception to a
> collection, preventing the unregistering of external resources.
>
>
Congratulations on finding the bug! :)
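
For anyone hitting the same thing, here is a minimal sketch of a safer logging pattern, assuming the leak works the way Norbert describes: the stored exception keeps its signaler context reachable, and through it the socket. The names errorLog (an OrderedCollection instance variable), #safeRequest and #performRequest are made up for illustration and are not part of Pharo. Only the exception's description string is kept:

```smalltalk
safeRequest
    "Sketch only: errorLog and #performRequest are hypothetical names."
    ^ [ self performRequest ]
        on: ConnectionTimedOut
        do: [ :ex |
            "Keep a plain String. Adding ex itself would also retain
             ex signalerContext and everything that context references."
            errorLog add: ex description.
            nil ]
```

A string holds no reference back to the context, so the socket should become collectable and its semaphores can be unregistered at finalization time.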



> Norbert
>
> Am 11.10.2013 um 15:02 schrieb Norbert Hartl <norbert at hartl.name>:
>
> >
> >
> > Am 11.10.2013 um 10:53 schrieb Sven Van Caekenberghe <sven at stfx.eu>:
> >
> >>
> >> On 11 Oct 2013, at 10:24, Norbert Hartl <norbert at hartl.name> wrote:
> >>
> >>> I can report that the behavior is different now. There were two new vm
> releases this week in ppa. The first one didn't work but the second changed
> something. My application was never running that long. It is more than a
> day now having an actual external objects table size of 623 which wasn't
> ever reached before. So I would say that there is a chance that this
> particular problem is gone. I will monitor this further, and I think that this
> wasn't the only problem. But then it is another problem.
> >>
> >> Yeah, but not knowing your application load, 623, which would be about
> 200 sockets (3 semaphores per socket), is still a lot to be active at the
> same time. Can you in some way invoke a full GC externally, like using
> ZnReadEvalPrintDelegate, and see if it eventually drops due to finalization?
> It should, at least that is what I see.
> >>
> > Yes, that's what I meant. There is always only one outgoing connection
> at a time. Every 15 seconds one request is issued. So you see why I expect
> to find more.
> > I'm travelling right now and will have a deeper look once I'm back.
> >
> > Norbert
> >>> Thanks to all of you who've helped solve this. If it comes to the VM
> being the source of problems it is always extra annoying because it is way
> harder to change something there.
> >>>
> >>> Norbert
> >>>
> >>>
> >>> Am 08.10.2013 um 11:27 schrieb Igor Stasenko <siguctua at gmail.com>:
> >>>
> >>>>
> >>>>
> >>>>
> >>>> On 7 October 2013 18:36, Norbert Hartl <norbert at hartl.name> wrote:
> >>>>
> >>>> Am 07.10.2013 um 16:36 schrieb Igor Stasenko <siguctua at gmail.com>:
> >>>>
> >>>>> 1 thing.
> >>>>>
> >>>>> Can you tell me what the following expression yields for your VM/image:
> >>>>>
> >>>>> Smalltalk vm maxExternalSemaphores
> >>>>>
> >>>>> (If it gives you a number less than 10000000, then I think I know what
> >>>>> your problem is :)
> >>>> It is 10000000
> >>>>
> >>>> What would be the problem if it were smaller?
> >>>>
> >>>>
> >>>> That just means your VM doesn't have an external object size cap.
> >>>> I changed the implementation to not have a hard limit (the arbitrary
> >>>> large number is there just to be "compatible" with the previous
> >>>> implementation).
> >>>>
> >>>> This means that you can actually change the check in your image,
> >>>> completely ignore limits, and just keep growing if necessary.
> >>>>
> >>>> Now, since you are using a VM which doesn't have a limit, but the
> >>>> problem still persists, it seems like it is somewhere else.. :/
> >>>>> I just found that after one merge, my changes got lost.
> >>>>> We've just plugged them back in, and it should be back again with
> >>>>> newer VMs..
> >>>>> But the problem could be more than just semaphores.. if the merge broke
> >>>>> this, it may have broken many other things, so we need time to check.
> >>>> I'll try to look at it some more. I'm using the pharo-vm from the
> >>>> launchpad build. Are the changes supposed to be in this one?
> >>>>
> >>>> Norbert
> >>>>
> >>>> Launchpad? You mean ppa? I can't say I remember all the details of how
> >>>> changes to the VM source get into the ppa distro, and how fast they get
> >>>> there. @Damien, can you enlighten us?
> >>>>
> >>>>
> >>>> Well, the VM which I downloaded recently using the zero-conf script
> >>>> has the limit back at 256. Just a merge mistake, which is now fixed..
> >>>> It means that a couple of builds will use the limit-based implementation..
> >>>> but then it will be back to my implementation.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 7 October 2013 12:31, Norbert Hartl <norbert at hartl.name> wrote:
> >>>>>
> >>>>> Am 07.10.2013 um 11:28 schrieb Henrik Johansen <
> henrik.s.johansen at veloxit.no>:
> >>>>>
> >>>>>>
> >>>>>> On Oct 7, 2013, at 11:16 , Norbert Hartl <norbert at hartl.name>
> wrote:
> >>>>>>
> >>>>>>> As I need an image that runs longer than 24 hours I'm looking at
> >>>>>>> some stuff and wonder. Can anybody explain to me the rationale for
> >>>>>>> code like this:
> >>>>>>>
> >>>>>>> maxExternalSemaphores: aSize
> >>>>>>>   "This method should never be called as result of normal program
> >>>>>>>   execution. If it is however, handle it differently:
> >>>>>>>   - In development, signal an error to prompt user to set a bigger size
> >>>>>>>   at startup immediately.
> >>>>>>>   - In production, accept the cost of potentially unhandled interrupts,
> >>>>>>>   but log the action for later review.
> >>>>>>>
> >>>>>>>   See comment in maxExternalObjectsSilently: why this behaviour is
> >>>>>>>   desirable, "
> >>>>>>>   "Can't find a place where development/production is decided.
> >>>>>>>   Suggest Smalltalk image inProduction, but use an overridable temp
> >>>>>>>   meanwhile. "
> >>>>>>>   | inProduction |
> >>>>>>>   self maxExternalSemaphores
> >>>>>>>       ifNil: [^ 0].
> >>>>>>>   inProduction := false.
> >>>>>>>   ^ inProduction
> >>>>>>>       ifTrue: [self maxExternalSemaphoresSilently: aSize.
> >>>>>>>           self crTrace: 'WARNING: Had to increase size of semaphore signal handling table due to many external objects concurrently in use';
> >>>>>>>                crTrace: 'You should increase this size at startup using #maxExternalObjectsSilently:';
> >>>>>>>                crTrace: 'Current table size: ' , self maxExternalSemaphores printString]
> >>>>>>>       ifFalse: ["Smalltalk image"
> >>>>>>>           self error: 'Not enough space for external objects, set a larger size at startup!'
> >>>>>>>           "Smalltalk image"]
> >>>>>>>
> >>>>>>> I have reported this once but got no feedback, so I would like to have a
> >>>>>>> few opinions.
> >>>>>>>
> >>>>>>> The report is here: https://pharo.fogbugz.com/f/cases/10839/
> >>>>>>>
> >>>>>>> Norbert
> >>>>>>
> >>>>>> The rationale is that inProduction would be some global setting,
> not yet in place when the code was written…
> >>>>>> Excessive simultaneous Semaphore usage is something that should be
> caught during development, in which case it's better to get an active
> notification, than having it logged somewhere.
> >>>>>
> >>>>> Agreed. But it didn't work in my case because it needed roughly 20
> >>>>> hours and an unstable remote backend to trigger the problem. And somehow I
> >>>>> forgot to install my logger as Transcript so there is no warning message. I
> >>>>> saw only dead images in the morning.
> >>>>> This is not satisfactory, but on the other hand this type of problem
> >>>>> is hard to solve anyway. My feeling tells me there is more to discover.
> >>>>> Socket resources get unregistered at finalization time but this didn't
> >>>>> work either. I would have said that the unlikely situation that no garbage
> >>>>> collection ran could be the case. But it can't because in
> >>>>> ExternalSemaphoreTable>>#freedSlotsIn:ratherThanIncreaseSizeTo: there is
> >>>>> an explicit garbage collection.
> >>>>>
> >>>>>> If I've understood correctly, it's moot on newer Pharo VMs, where
> there's no limit on the semtable size, but for legacy code a startup item
> setting size using maxExternalObjectsSilently: (as suggested in the Warning
> text), is still a more proper fix than setting inProduction to true and
> crossing your fingers hoping no signals will be lost during table growth.
> >>>>>
> >>>>> Ah, I didn't know about the risk of losing signals while resizing
> the table. Thanks for that. Don't get me wrong I wasn't proposing to set
> inProduction in effect. I don't think that automatically growing resource
> management is a proper way to design a system. There is always a range of
> resources you need for your use case. Not setting an upper bound for this
> just covers leaking behavior.
> >>>>>
> >>>>> Norbert
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Igor Stasenko.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Igor Stasenko.
> >>
> >>
> >
>
>
>


-- 
Best regards,
Igor Stasenko.