[Pharo-users] Voluntarily cancelling requests ("applying an expiration date")

Sven Van Caekenberghe sven at stfx.eu
Mon Feb 10 09:13:21 EST 2020


Hi Holger,

That is a complicated story ;-)

But running out of external semaphores means that you are using too many sockets, are not closing/releasing them (in time), and/or your GC does not run often enough to keep up (it is easy to deplete the external semaphore table without the GC kicking in).

You must have a loop somewhere that goes too fast and maybe does not clean up properly while doing so.
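
As a minimal sketch of what cleaning up properly can look like (the URL and the timeout value are made up, and your code will differ), wrapping the network call in ensure: releases the socket even when the call times out or fails:

```smalltalk
| client |
client := ZnClient new.
[ client
	timeout: 2; "seconds; hypothetical value"
	get: 'http://localhost:8080/ping' "hypothetical URL" ]
		ensure: [ client close ]
```

This only illustrates the pattern; whether it matches the place where your sockets leak is something the logging should tell you.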

YMMV, but I have been doing similar things for years -- implementing/offering REST services that call other REST/network services, all with timeouts, in several variations -- and I do not have the problems you describe.

I would suggest enabling logging so that you can see better where the allocations happen and whether your cleanup code does its work.

Sven

PS: Zinc logging is easy, just do 

  ZnLogEvent logToTranscript

> On 9 Feb 2020, at 16:31, Holger Freyther <holger at freyther.de> wrote:
> 
> tl;dr: I am searching for a pattern (later code) to apply expiration to operations.
> 
> 
> 
> Introduction:
> 
> One nice aspect of MongoDB is that it has built-in data distribution[1] and configurable retention[2]. The upstream project has a document called "Server Discovery and Monitoring (SDAM)", defining how a client should behave. Martin Dias is currently implementing SDAM in MongoTalk/Voyage and I took it for a test drive.
> 
> 
> Behavior:
> 
> My software stack is using Zinc, Zinc-REST, Voyage and Mongo. When a new REST request arrives I am using Voyage (e.g. >>#selectOne:) which will use MongoTalk. The MongoTalk code needs to select the right server. It's currently done by waiting for a result.
> 
> Next I started to simulate database outages. The REST clients retried when not receiving a result within two seconds (no back-off/jitter). What happened was roughly the following:
> 
> 
> [
> 	1.) ZnServer accepts a new connection
> 	2.) MongoTalk waits for a server longer than 2s
> 	"nothing.. the above waits..."
> ] repeat.
> 
> 
> 
> 
> Problem:
> 
> What happened next surprised me. I expected to have a bad time when the database recovered and all the stale requests (remember the REST clients already gave up and closed the socket) would be answered. Instead my image crashed early in my test as the ExternalSemaphoreTable was full.
> 
> Let's focus on the timeout behavior and discuss the existence of the ExternalSemaphoreTable and the number of entries separately at a different time.
> 
> 
> 
> 
> The two main problems I see are:
> 
> 
> 1.) Lack of back-pressure for ZnManagingMultiThreadedServer
> 
> 2.) A disconnect between how long the application layer handling REST is allowed to take and, further down the stack, how long MongoTalk may sleep and wait for a server.
> 
> 
> The first item is difficult. Even answering HTTP 500 when we are out of space in the ExternalSemaphoreTable is difficult... Let's ignore this for now as well.
> 
> 
> 
> 
> 
> 
> What I look for:
> 
> 
> 1.) Voluntary Timeout
> 
> Inside my application code I would like to tag an operation with a timeout. This means everything that is done should complete within X seconds. It can be used on a voluntary basis.
> 
> 
> >>#lookupPerson
> 
>   "We expect all database operations to complete within two seconds"
>   person := ComputeContext current withTimeout: 2 seconds during: [
> 	repository selectOne: Person where: [:each name | ...]
>   ].
> 
> 
> 
> MongoTalk>>stuff
>  "See if the outer context timeout has expired and signal. E.g. before writing
>  something into the socket to keep consistency."
>  ComputeContext current checkExpired.
> 
> 
> MongoTalk>>other
>  "Sleep for up to the remaining timeout"
>  (someSemaphore waitTimeoutContext: ComputeContext current) ifFalse: [
>     SomethingExpired signal.
>  ]
> 
> 
> 
> 2.) Cancellation
> 
> 
> More difficult to write in pseudo code (without TaskIt?). In my case above we are waiting for the database to be ready while the client has already closed the file descriptor. We are not able to see this until much later.
> 
> The idea is that, in addition to the timeout, we can pass a block that is called when an operation should be cancelled, and that the ComputeContext can be asked whether it has been cancelled?
> 
> 
> 
> 
> The above takes inspiration from Go's context package[3]. In Go the context should be passed as a parameter but we could make it a Process variable?
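> 
> A minimal sketch of the process-variable idea, assuming Pharo's DynamicVariable is an acceptable carrier. CurrentDeadline and the package name are hypothetical names used only for illustration; this is not an existing ComputeContext API:
> 
> ```smalltalk
> "Evaluate the class definition first. CurrentDeadline is a hypothetical name."
> DynamicVariable subclass: #CurrentDeadline
> 	instanceVariableNames: ''
> 	classVariableNames: ''
> 	package: 'Sketch-Expiration'.
> 
> "Then, in application code, bind a deadline for the dynamic extent of a block."
> CurrentDeadline
> 	value: DateAndTime now + 2 seconds
> 	during: [
> 		"Deeper in the stack, e.g. before writing to a socket:"
> 		(CurrentDeadline value notNil
> 			and: [ DateAndTime now > CurrentDeadline value ])
> 				ifTrue: [ Error signal: 'Operation expired' ] ]
> ```
> 
> The binding is only visible inside the process that evaluates the block, so work handed to another process would need the deadline passed along explicitly.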
> 
> 
> 
> 
> 
> Question:
> 
> How do you handle this in your systems? Is this something we can consider for Pharo9? 
> 
> 
> 
> thanks
> 	holger
> 
> 
> 
> 
> 
> 
> 
> 
> [1] It has the concept of a "replica set" and works by having a primary, secondaries and arbiters running.
> [2] For every write one can configure if the write should succeed immediately (before it is even on disk) or when it has been written to multiple stores (e.g. majority, US and EMEA)
> [3] https://golang.org/pkg/context/
> 
> 
> 



