[Pharo-project] [GS/SS Beta] serializing object graphs / ordering graph anchors

Henrik Sperre Johansen henrik.s.johansen at veloxit.no
Fri Jun 3 05:03:35 EDT 2011


On 03.06.2011 11:00, Henrik Sperre Johansen wrote:
> On 03.06.2011 10:38, Norbert Hartl wrote:
>>
>> Am 03.06.2011 um 10:20 schrieb Mariano Martinez Peck:
>>
>>>
>>>
>>>>         If I look at the serializers I know the mechanics are simple:
>>>>
>>>>         1. take an object and look it up in a table.
>>>>         2. if the object is in the table it has already been
>>>>         serialized. We take a reference from the table and write it
>>>>         down
>>>>         3. if it isn't in the table it means the object has not
>>>>         been serialized and the contents need to be written. After
>>>>         that the object with a reference representation is written
>>>>         to the table
>>>>         4. while writing the content of an object (inst vars) I
>>>>         encounter simple objects (direct string representation) or
>>>>         complex objects (composite). In the latter case we start
>>>>         over from 1.
>>>>
>>>>         If the object graph is well interconnected than the odds
>>>>         are high that a stack will grow high until the first
>>>>         reference can be written. With a small pier kernel I
>>>>         already matched a stack size of 1000.
>>>>
>>>>
>>>>     In Fuel we  traverse the graph but instead of using a recursion
>>>>     we use our own simple stack implementation where we push and
>>>>     pop objects.
>>>>     But Martin may explaint better than me.
>>>
>>>     I'm interested in details how it works and if it avoids the deep
>>>     stack. Is there additional information about that?
>>>
>>>
>>> In 4. you said "while writing the content of an object (inst vars) I 
>>> encounter simple objects (direct string representation) or complex 
>>> objects (composite). In the latter case we start over from 1.".
>>> So... Imagine object X with two instVars Y and Z. Imagine the method 
>>> of the traverse is called #analye:.  So you do #analize: X.  Then, 
>>> in such method you check whether X has regular pointers to regular 
>>> objects, and if true, you analize them. So in this case you would 
>>> send #analyze: Y and #analize: Z, generating the recursion. Right ?
>>
>> yes, the time between analyse: Y and analyse: Z is dependent on the 
>> graph that is attached to Y. Y can have objects that have further 
>> objects etc. So leaving X aside, Y can be the first object to 
>> serialize and Z the last.
>>>
>>> Ok, instead of sending #analyze: Y and #analize: Z  what we do is 
>>> just to do a push on a stack:   #push: Y  and #push: Z. And then we 
>>> are done with X, so we pop from the stack and we continue with the 
>>> next object (at some point in the future we will pop Y and Z) When 
>>> there are no more objects in the stack it means we are done.
>>
>> How do you write X? I can see that you turn a recursive trace into 
>> depth-by-depth trace but the difficult part is to write X. In X you 
>> need a reference representation of any sort. How do you derive that 
>> from the stack?
>>
>>> For performance (I think) we did FLSimpleStack. You can check also 
>>> FLSimpleStackTest. Both are in Fuel repo.
>>>
>>> Would something like that help for the stack in GemStone?
>>>
>> Yes. With this approach the resulting stack size is much smaller. To 
>> exceed the stack setting you need a graph containing an object which 
>> has a shortest route that is larger than the stack limit. And I think 
>> this is very unlikely :)
>> I would need to change Sixx in order to have it work like this. I 
>> need to think about it.
>>
>> Thanks,
>>
>> Norbert
> Yes, using a specific stack in the way Fuel does relies on there being 
> a separate analyze-phase before the actual serialization of  the objects.
> (Due to wanting to group objects of the same type together when 
> serializing)
>
> You could do the same in a "raw" serializer like SIXX:
> 1) pop first object from stack
> 2) if any of the instvars need to be serialized, and their ids can not 
> be found in cache, push self + missing objects
> 3) if not, serialize self, then goto 1
>
> The key difference is you can not considering an object "done" during 
> an analysis phase, but rather push it back on the stack for 
> reevaluation after its dependencies have been handled in the same way.
>
> So the resulting stack will be much larger than what you see in Fuel, 
> but still kept outside the call stack.
>
> Cheers,
> Henry
Note: This never grows the call stack.
You're basically left with:

[stack isEmpty ] whileFalse: [self serialize: stack pop]
where serialize: either pushes objects and exits, or does actual 
serialization.

Cheers,
Henry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/attachments/20110603/ca3453ba/attachment-0001.html>


More information about the Pharo-dev mailing list