[Pharo-project] Fuel - a fast object deployment tool

Martin Dias tinchodias at gmail.com
Mon Jun 20 00:48:40 EDT 2011


I think the substitution should be done during the graph trace. Following
with the example, if a proxy replaces an object, the proxy represents a
subgraph that is appended and so it should be traced.

For that we should keep track of the substitutions. I'm not sure how complex
that is, but I think it's not so difficult.
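
To make the idea concrete, here is a minimal workspace sketch of tracing
with substitutions. It is only an illustration under assumptions: it does
not use Fuel's analyser, every name in it (substitutions, substituteFor,
trace, the Array standing in for a proxy) is hypothetical, and only Arrays
are followed, to keep it short.

```smalltalk
"Workspace sketch only; every name below is hypothetical, none is Fuel API."
| substitutions traced cache proxy root substituteFor trace |
substitutions := IdentityDictionary new.  "original -> object actually serialized"
traced := IdentitySet new.                "objects reached by the trace"

cache := OrderedCollection with: 'expensive data'.
proxy := Array with: #proxyFor with: #cache.
root := Array with: cache with: cache with: 42.

"Assumed policy: the cache is replaced by a proxy, everything else by itself."
substituteFor := [:anObject |
    anObject == cache ifTrue: [proxy] ifFalse: [anObject]].

trace := nil.
trace := [:anObject | | target |
    "Decide the substitution once per object and remember it."
    target := substitutions
        at: anObject
        ifAbsentPut: [substituteFor value: anObject].
    (traced includes: target) ifFalse: [
        traced add: target.
        "Follow the references of the substitute, not of the original."
        target isArray ifTrue: [target do: [:each | trace value: each]]]].

trace value: root.
```

In this sketch the cache and its string are never reached, while the
proxy and the objects it references are traced like any other subgraph.
Both references to the cache map to the same proxy because the
substitution is looked up in the dictionary instead of being recomputed.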

Seems to be a great idea, we have to try it. I like that it avoids writing inst
var names as strings. I have no idea whether, once *slots* are implemented, we
will be able to return inst vars as first-class objects... but anyway this
looks like a nice solution.

So, we have this as a pending issue as well as the id virtualization. Thanks
for the ideas and the discussion!
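
As a rough illustration of the id virtualization, a minimal workspace
sketch follows. It assumes a plain-text header of cluster class names
separated by spaces; that layout and the variable names are made up for
the example and are not Fuel's format.

```smalltalk
"Workspace sketch only; the header layout is an assumption, not Fuel's format."
| clusterClassNames idOf stream header decodedNames |
clusterClassNames := #('FLFixedObjectCluster' 'FLVariableObjectCluster' 'FLPrunedObjectsCluster').

"An id is simply the position in this serialization's own table."
idOf := Dictionary new.
clusterClassNames doWithIndex: [:each :index | idOf at: each put: index].

"Write the table as a header: class names separated by spaces."
stream := WriteStream on: String new.
clusterClassNames do: [:each | stream nextPutAll: each] separatedBy: [stream space].
header := stream contents.

"On load, rebuild the table from the header; an id read later indexes into it."
decodedNames := header substrings.
```

With such a table the ids only have meaning inside one serialization, so
two extensions cannot collide on a hard-coded id.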

Martin

On Fri, Jun 17, 2011 at 7:09 PM, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> 2011/6/17 Eliot Miranda <eliot.miranda at gmail.com>:
> >
> >
> > On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier
> > <nicolas.cellier.aka.nice at gmail.com> wrote:
> >>
> >> 2011/6/17 Eliot Miranda <eliot.miranda at gmail.com>:
> >> >
> >> >
> >> > On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <tinchodias at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi Eliot,
> >> >> I am very happy to read your mail.
> >> >>
> >> >> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda
> >> >> <eliot.miranda at gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi Martin & Mariano,
> >> >>>     regarding filtering.  Yesterday my colleague Yaron and I
> >> >>> successfully finished our port of Fuel to Newspeak and are
> >> >>> successfully using it to save and restore our data sets; thank you,
> >> >>> it's a cool framework.  We had to implement two extensions, the first
> >> >>> of which is the ability to save and restore Newspeak classes, which is
> >> >>> complex because these are instantiated classes inside instantiated
> >> >>> Newspeak modules, not static Smalltalk classes in the Smalltalk
> >> >>> dictionary.  The second extension is the ability to map specific
> >> >>> objects to nil, to prune objects on the way out.  I want to discuss
> >> >>> this latter extension.
> >> >>> In our data set we have a set of references to objects that are
> >> >>> logically not persistent and hence not to be saved.  I'm sure that
> >> >>> this will be a common case.  The requirement is for the pickling
> >> >>> system to prune certain objects, typically by arranging that when an
> >> >>> object graph is pickled, references to the pruned objects are replaced
> >> >>> by references to nil.  One way of doing this is as described below, by
> >> >>> specifying per-class lists of instance variables whose referents
> >> >>> should not be saved.  But this can be clumsy; there may be references
> >> >>> to objects one wants to prune from e.g. more than one class, in which
> >> >>> case one may have to provide multiple lists of the relevant inst vars;
> >> >>> there may be references to objects one wants to prune from e.g.
> >> >>> collections (e.g. sets and dictionaries) in which case the instance
> >> >>> variable list approach just doesn't work.
> >> >>> Here are two more general schemes.  First, most directly, Fuel could
> >> >>> provide two filters, implemented in the default mapper, or the core
> >> >>> analyser.  One is a set of classes whose instances are not to be
> >> >>> saved.  Any reference to an instance of a class in the
> >> >>> toBePrunedClasses set is saved as nil.  The other is a set of
> >> >>> instances that are not to be saved, and also any reference to an
> >> >>> instance in the toBePruned set is saved as nil.  Why have both?  It
> >> >>> can be convenient and efficient to filter by class (in our case we had
> >> >>> many instances of a specific class, all of which should be filtered,
> >> >>> and finding them could be time consuming), but filtering by class can
> >> >>> be too inflexible; there may indeed be specific instances to exclude
> >> >>> (think for example of part of the object graph that functions as a
> >> >>> cache; pruning the specific objects in the cache is the right thing to
> >> >>> do; pruning all instances of classes whose instances exist in the
> >> >>> cache may prune too much).
> >> >>> As an example here's how we implemented pruning.  Our system is called
> >> >>> Glue, and we start with a mapper for Glue objects, FLGlueMapper:
> >> >>> FLMapper subclass: #FLGlueMapper
> >> >>>     instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
> >> >>>     classVariableNames: ''
> >> >>>     poolDictionaries: ''
> >> >>>     category: 'Fuel-Core-Mappers'
> >> >>> It accepts newspeak objects and filters instances of the classes in
> >> >>> the prunedObjectClasses set, and as a side-effect collects certain
> >> >>> classes that we need in a manifest:
> >> >>> FLGlueMapper>>accepts: anObject
> >> >>>     "Tells if the received object is handled by this analyzer.  We want to
> >> >>>      hand-off instantiated Newspeak classes to the newspeakClassesCluster,
> >> >>>      and we want to record other model classes.  We want to filter-out
> >> >>>      instances of any class in prunedObjectClasses."
> >> >>>     ^anObject isBehavior
> >> >>>         ifTrue:
> >> >>>             [(self isInstantiatedNewspeakClass: anObject)
> >> >>>                 ifTrue: [true]
> >> >>>                 ifFalse:
> >> >>>                     [(anObject inheritsFrom: GlueDataObject) ifTrue:
> >> >>>                         [modelClasses add: anObject].
> >> >>>                     false]]
> >> >>>         ifFalse:
> >> >>>             [prunedObjectClasses includes: anObject class]
> >> >>> It prunes by mapping instances of the prunedObjectClasses to a special
> >> >>> cluster.  It can do this in visitObject: since any newspeak objects it
> >> >>> is accepting will be visited in its visitClassOrTrait: method (i.e.
> >> >>> it's implicit that all arguments to visitObject: are instances of
> >> >>> classes in the prunedObjectClasses set).
> >> >>> FLGlueMapper>>visitObject: anObject
> >> >>>     analyzer
> >> >>>         mapAndTrace: anObject
> >> >>>         to: FLPrunedObjectsCluster instance
> >> >>>         into: analyzer clustersWithBaselevelObjects
> >> >>> FLPrunedObjectsCluster is a specialization of the nil,true,false
> >> >>> cluster that maps its objects to nil:
> >> >>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
> >> >>>     instanceVariableNames: ''
> >> >>>     classVariableNames: ''
> >> >>>     poolDictionaries: ''
> >> >>>     category: 'Fuel-Core-Clusters'
> >> >>> FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
> >> >>>     super serialize: nil on: aWriteStream
> >> >>>
> >> >>> So this would generalize by the analyser having an e.g.
> >> >>> FLPruningMapper as the first mapper, and this having a prunedObjects
> >> >>> and a prunedObjectClasses set and going something like this:
> >> >>> FLPruningMapper>>accepts: anObject
> >> >>>     ^(prunedObjects includes: anObject)
> >> >>>         or: [prunedObjectClasses includes: anObject class]
> >> >>> FLPruningMapper>>visitObject: anObject
> >> >>>     analyzer
> >> >>>         mapAndTrace: anObject
> >> >>>         to: FLPrunedObjectsCluster instance
> >> >>>         into: analyzer clustersWithBaselevelObjects
> >> >>> and then one would provide accessors in FLSerializer and/or FLAnalyser
> >> >>> to add objects and classes to the prunedObjects and
> >> >>> prunedObjectClasses sets.
> >> >>> For efficiency one could arrange that the FLPruningMapper was not
> >> >>> added to the sequence of mappers unless and until objects or classes
> >> >>> were added to the prunedObjects and prunedObjectClasses sets.
> >> >>
> >> >> Excellent. I love the botanical metaphor of pruning! Of course we can
> >> >> include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
> >> >>
> >> >> We are also interested in pruning objects, not necessarily replacing
> >> >> them by nil but by other user-defined objects, for example proxies.
> >> >> We can extend the pruning stuff to do that.
> >> >
> >> > That was an idea Yaron came up with.  That instead of
> >> > using fuelIgnoredInstanceVariableNames one uses e.g.
> >> > Object>>objectToSerialize
> >> >     ^self
> >> > and then if one wants to prune specific inst vars in MyClass one
> >> > implements
> >> > MyClass>>objectToSerialize
> >> >     ^self shallowCopy prepareForSerialization
> >>
> >> Hi Eliot,
> >>
> >> I'm not convinced by the shallowCopy solution, except for simple
> >> structures.
> >> If the object graph is complex (has shared nodes, loops, ...) then you
> >> are going to end up with a replication problem equivalent to the one
> >> Fuel is trying to solve.
> >
> > The assumption is that the analyser would create a maximum of one proxy
> > per object in the graph (default, no proxy) and that it would map objects
> > with proxies to their proxies.  So if proxies only nilled out inst vars I
> > don't see a problem.  What's attractive about this is that it provides a
> > general solution to a couple of problems, a) how to replace a class of
> > objects by some substitute (e.g. nil), b) how to prune state that needn't
> > be saved.  It is also conceptually simple; one just creates a proxy
> > instance; no defining metadata, such as inst var names, and hence the code
> > is always up-to-date (e.g. a class redefine won't automatically uncover
> > renamed inst vars in serialization metadata).
>
> Ah, OK, it occurs after the graph analysis, which I did not catch at first
> read.
> Now I understand better.
>
> Nicolas
>
> >>
> >> Nicolas
> >>
> >> > MyClass>>prepareForSerialization
> >> >     instVarIDontWantToSerialize := nil.
> >> >     ^self
> >> > and for objects one doesn't want to serialize one implements
> >> > MyNotToBeSerializedClass>>objectToSerialize
> >> >     ^nil
> >> > So it's more general.  But I would pass the analyser in as an argument,
> >> > which would allow things like
> >> > MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
> >> >     ^(anFLAnalyser shouldPrune: self)
> >> >         ifFalse: [self]
> >> >         ifTrue: [nil]
> >> > which would of course be the default in Object:
> >> > Object>>objectToSerializeIn: anFLAnalyser
> >> >     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
> >> >
> >> >>
> >> >>
> >> >>>
> >> >>> I think both Yaron and I feel the Fuel framework is comprehensible and
> >> >>> flexible.  We enjoyed using it and while we took two passes at coming
> >> >>> up with the pruning scheme we liked (our first was based on not
> >> >>> serializing specific inst vars and was much more complex than our
> >> >>> second, based on pruning instances of specific classes) we got there
> >> >>> quickly and with very little frustration along the way.  Thank you
> >> >>> very much.
> >> >>
> >> >> :-) thank you!
> >> >>
> >> >>>
> >> >>> Finally, a couple of things.  First, it may be more flexible to
> >> >>> implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is
> >> >>> trying to override certain parts of the mapping framework an
> >> >>> implementation can access the analyser to find existing clusters, e.g.
> >> >>> MyClass>>fuelClusterIn: anFLAnalyser
> >> >>>     ^self shouldBeInASpecialCluster
> >> >>>         ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
> >> >>>         ifFalse: [super fuelClusterIn: anFLAnalyser]
> >> >>> This makes it easier to find a specific unique cluster to handle a
> >> >>> group of objects specially.
> >> >>
> >> >> I can't imagine a concrete example but I see that it is more
> >> >> flexible...
> >> >> the cluster obtained via double dispatch can be anything polymorphic
> >> >> with
> >> >> MySpecialCluster... that's the point?
> >> >
> >> > To be honest I'm not sure.  But passing in the analyser in things like
> >> > fuelCluster or objectToSerialize is I think a good idea as it provides a
> >> > convenient communication path which in turn provides considerable
> >> > flexibility.
> >> >>
> >> >>
> >> >>>
> >> >>> Lastly, the class-side cluster ids are a bit of a pain.  It would be
> >> >>> nice to know a) are these byte values or general integer values, i.e.
> >> >>> can there be more than 256 types of cluster?, and b) is there any
> >> >>> meaning to the ids?  For example, are clusters ordered by id, or is
> >> >>> this just an integer tag?  Also, some class-side code to assign an
> >> >>> unused id would be nice.
> >> >>> You might think of virtualizing the id scheme.  For example, if
> >> >>> FLCluster maintained a weak array of all its subclasses then the id of
> >> >>> a cluster could be the index in the array, and the array could be
> >> >>> cleaned up occasionally.  Then each fuel serialization could start
> >> >>> with the list of cluster class names and ids, so that specific values
> >> >>> of ids are specific to a particular serialization.
> >> >>
> >> >> I do agree, these ids are a heritage from the first prototypes of
> >> >> Fuel; they should be revised. a) yes, it is encoded in only one byte;
> >> >> b) just an integer tag, the only purpose of the id was for decoding
> >> >> fast: read a byte and then look in a dictionary for the corresponding
> >> >> cluster instance.  We could even store the cluster class name but
> >> >> that's inefficient.
> >> >
> >> > Yes, but how inefficient?  What's the size of all the cluster names?
> >> >     FLCluster allSubclasses inject: 0 into: [:t :c| t + c name size + 1]
> >> > 670
> >> >
> >> > So you'd add less than a kilobyte to the size of each serialization and
> >> > get complete freedom from ids.  Something to think about.
> >> >>
> >> >> Virtualizing the id scheme is a good idea. Much more elegant and
> >> >> extensible. The current mechanism not only limits the number of
> >> >> possible clusters, but "user defined" extensions can also collide, for
> >> >> example if your Glue cluster id is the same as the Moose cluster id.
> >> >>
> >> >> I added an issue in our tracker.
> >> >>
> >> >> If it makes sense, maybe the weak array you suggest can be also used
> >> >> to avoid instantiating lots of FLObjectCluster like we are doing in
> >> >> Object:
> >> >>
> >> >> fuelCluster
> >> >>     ^ self class isVariable
> >> >>         ifTrue: [ FLVariableObjectCluster for: self class ]
> >> >>         ifFalse: [ FLFixedObjectCluster for: self class ]
> >> >>
> >> >> the second time you send fuelCluster to an object, it can reuse the
> >> >> cluster instance.
> >> >
> >> > Right.  I think that's important, and is one reason why I think passing
> >> > in the analyser is important, because it allows certain objects to
> >> > discover existing clusters in the analyzer and join them if they want
> >> > to, instead of having to invent and maintain their own cluster uniquing
> >> > solution.
> >> >>>
> >> >>> again thanks for a great framework.
> >> >>
> >> >> Thanks for your words and the feedback. Is Glue published somewhere?
> >> >
> >> > No, and it's extremely proprietary :)  Newspeak however is available
> >> > and we may end up maintaining a port of Fuel for Newspeak.
> >> > best regards,
> >> > Eliot
> >> >
> >> >>
> >> >> regards
> >> >> Martin
> >> >>
> >> >>
> >> >>>
> >> >>> best,
> >> >>> Eliot
> >> >>
> >> >>
> >> >>>
> >> >>> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck
> >> >>> <marianopeck at gmail.com> wrote:
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda
> >> >>>> <eliot.miranda at gmail.com>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Hi Martin and Mariano,
> >> >>>>>     a couple of questions.  What's the right way to exclude certain
> >> >>>>> objects from the serialization?  Is there a way of excluding certain
> >> >>>>> inst vars from certain objects?
> >> >>>>
> >> >>>>
> >> >>>> Eliot and the rest....Martin implemented this feature in
> >> >>>> Fuel-MartinDias.258. For the moment, we decided to put
> >> >>>> #fuelIgnoredInstanceVariableNames at class side.
> >> >>>>
> >> >>>> Behavior >> fuelIgnoredInstanceVariableNames
> >> >>>>     "Indicates which variables have to be ignored during serialization."
> >> >>>>
> >> >>>>     ^#()
> >> >>>>
> >> >>>>
> >> >>>> MyClass class >> fuelIgnoredInstanceVariableNames
> >> >>>>   ^ #('instVar1')
> >> >>>>
> >> >>>>
> >> >>>> The impact on speed is nothing, so this is good. Now... we were
> >> >>>> wondering whether it is common for 2 different instances of the same
> >> >>>> class to need different instVars ignored. Is this common? Do you
> >> >>>> usually need this?  We checked in SIXX and it is at instance side.
> >> >>>> Java uses the modifier 'transient' so it is at class side...
> >> >>>>
> >> >>>> thanks
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Mariano
> >> >>>> http://marianopeck.wordpress.com
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >
> >
> > --
> > best,
> > Eliot
> >
>
>