[Pharo-dev] External Semaphores leaking in Socket

Norbert Hartl norbert at hartl.name
Mon Oct 7 13:15:49 EDT 2013


Am 07.10.2013 um 14:16 schrieb Henrik Johansen <henrik.s.johansen at veloxit.no>:

> 
> On Oct 6, 2013, at 12:51 , Norbert Hartl <norbert at hartl.name> wrote:
> 
>> I took some time to analyze my current problem with external semaphores. I was just reluctant to raise the limit in my image because I want the problem solved. I logged the management of external semaphores and discovered that the table fills if a connection times out. 
>> 
>> The problem turns out to be in 
>> 
>> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>> 	"Initiate a connection to the given port at the given host 
>> 	address. Waits until the connection is established or time outs."
>> 	self connectNonBlockingTo: hostAddress port: port.
>> 	self
>> 		waitForConnectionFor: timeout
>> 		ifTimedOut: [ConnectionTimedOut signal: 'Cannot connect to '
>> 					, (NetNameResolver stringFromAddress: hostAddress) , ':' , port asString]
>> 
>> When a socket is created three external semaphores are registered in the ExternalSemaphoreTable. If a connection times out the exception is thrown but the Socket still has his resources attached. 
>> 
>> So e.g. in 
>> 
>> SocketStream class>>#openConnectionToHost: hostIP port: portNumber timeout: timeout
>> 	| socket |
>> 	socket := Socket new.
>> 	socket connectTo: hostIP port: portNumber waitForConnectionFor: timeout.
>> 	^self on: socket
>> 
>> it holds locally a socket (with semaphores registered) but on exception time the reference to the socket gets lost and the semaphores stay registered. The only way to unregister is on finalization time but I think it should work better. So I would add a destroy before the exception is raised.
>> 
>> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>> 	"Initiate a connection to the given port at the given host 
>> 	address. Waits until the connection is established or time outs."
>> 	self connectNonBlockingTo: hostAddress port: port.
>> 	self
>> 		waitForConnectionFor: timeout
>> 		ifTimedOut: [
>> 			self destroy.
>> 			ConnectionTimedOut signal: 'Cannot connect to '
>> 					, (NetNameResolver stringFromAddress: hostAddress) , ':' , port asString]
>> 
>> I opened a ticket at https://pharo.fogbugz.com/f/cases/11797 but I'm not sure how I am supposed to provide fixes made against a pharo2.0 image. Probably I should fix this againt 3.0 but then I'm still a 2.0 user :) 
>> 
>> Norbert
> Destroying the socket manually there would only be an optimization, right? 

Yes. Even if it is not necessary I think freeing resources at the earliest possible time is a good thing. The problem with my change is that I have this gut feeling it is not a good one. At least I didn't see my image hang with this change applied. 

> It's still cleaned eventually by a finalization action?

Yes, that is what makes me wonder.  Why do you say "eventually"? What would be the conditions for finalize not being called on an object at garbage collection time?

I log some details and print the external objects size into a lock file. The time of the last hangs the last printed lines where about an external objects size between 230 and 240. I had one test where I interacted with the image and did a manual garbage collect but the external object size was much higher than expected. This is all not very detailed but you get confused very easily hunting problems that are hard to reproduce. 

So I don't have a glue what it is. Need more time to look into it. But I'm running out of ideas what it could be. I'm not even sure it is the external semaphores directly.

Norbert

> Finalization all of a sudden stopping to work is an unlikely culprit of running out of external object space, load above what the size can handle, or actual leak(s) are much more probable. 
> 
> Thankfully, there are a limited number of places leaks could occur, for Socket you have improper uses of acceptFrom:/initialize: (multiple sends to the same socket).
> 
> In a stock image, there's an obscure way that might occur afaict, if class methods acceptFrom:/newXXX are called, and the primitive calls in the new instance's acceptFrom:/initialize: returns a handle with status InvalidSocket instead of nil.
> Then, the external objects will not be unregistered before repeatWithGCIf: clears them by calling them again.
> (No idea if this can ever really happen though...)
> 
> To flush out if such leaks are the cause, it might be useful to adjust initialize:/acceptFrom: with helpful logging/errors, as per
> https://pharo.fogbugz.com/f/cases/5465/Prevent-accidental-semaphore-leaks-when-using-Socket-acceptFrom
> 
> There's a few other users of registerExternalObject:,
> - InputEventFetcher >> #startUp (Used to leak when saving but not quitting due to missing symmetric unregister in #shutDown, should be fixed in 2.0 )
> -  AsynchFile (none uses this anyways right, so didn't bother checking)
> - NetNameResolver >> initializeNetwork - (This potentially leaked if network could ever be in an uninitialized state 2x in a session, but should be fixed in 2.0)
> 
> As long as both the above are fixed in your image, checking if there are non-stock users of registerExternalObject:/initialize:/acceptFrom: is probably the best bet to find trouble spots without waiting for another crash.
> 
> Cheers,
> Henry

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20131007/97a2c7af/attachment-0002.html>


More information about the Pharo-dev mailing list