[Pharo-project] Streams. Status and where to go?

Stéphane Ducasse stephane.ducasse at inria.fr
Sun Feb 28 04:51:27 EST 2010


If you feel that this is important, we can take some time and port the code from VW.
Let us know; I can ask Cyrille to give it a try.

Stef

On Feb 27, 2010, at 10:56 PM, Nicolas Cellier wrote:

> 2010/2/27 Levente Uzonyi <leves at elte.hu>:
>> On Sat, 27 Feb 2010, Richard Durr wrote:
>> 
>>> So what speaks against using VisualWorks' Xtreams?
>>> 
>>> http://www.cincomsmalltalk.com/blog/blogView?entry=3444278480&printTitle=Smalltalk_Daily_02/22/10:_Introducing_Xtreams&showComments=true
>> 
>> 1. Someone has to port it.
>> 2. It's optimized for VW, so the ported code's performance will probably be
>> bad.
>> 
>> 
>> Levente
>> 
> 
> Licensing was not clear when I began, so I just picked a few ideas and
> re-implemented them from scratch.
> Now it would be interesting to try porting VW Xtreams (I should say the
> original Xtreams; I just hijacked the name...).
> Concerning performance, VW Xtreams uses exceptions extensively, which I
> tried to avoid.
> 
> Nicolas
> 
>>> 
>>> On Fri, Feb 26, 2010 at 11:01 PM, Igor Stasenko <siguctua at gmail.com>
>>> wrote:
>>>> 
>>>> On 26 February 2010 21:30, Nicolas Cellier
>>>> <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>> 
>>>>> 2010/2/26 Igor Stasenko <siguctua at gmail.com>:
>>>>>> 
>>>>>> On 26 February 2010 18:59, Nicolas Cellier
>>>>>> <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>>>> 
>>>>>>> 2010/2/26 Igor Stasenko <siguctua at gmail.com>:
>>>>>>>> 
>>>>>>>> Hello, Nicolas.
>>>>>>> 
>>>>>>> Hi Igor.
>>>>>>> You should load it in trunk.
>>>>>>> 
>>>>>> Ah, I think my image is a bit outdated then.
>>>>>> 
>>>>>>>> I want to try it out.
>>>>>>>> I tried to load it (XTream-Core) into my image, and it bugged me about
>>>>>>>> unresolved dependencies:
>>>>>>>> ----
>>>>>>>> This package depends on the following classes:
>>>>>>>>  ByteTextConverter
>>>>>>>> You must resolve these dependencies before you will be able to load
>>>>>>>> these definitions:
>>>>>>>>  ByteTextConverter>>nextFromXtream:
>>>>>>>>  ByteTextConverter>>nextPut:toXtream:
>>>>>>>>  ByteTextConverter>>readInto:startingAt:count:fromXtream:
>>>>>>>> ----
>>>>>>>> I ignored these warnings, pressing continue, and here is what it warns
>>>>>>>> about in my trunk image:
>>>>>>>>
>>>>>>>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Map is Undeclared)
>>>>>>>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Encodings is Undeclared)
>>>>>>>> TextConverter>>readInto:startingAt:count:fromXtream: (latin1Map is Undeclared)
>>>>>>>> 
>>>>>>>> Is ByteTextConverter a Pharo-specific class?
>>>>>>>> 
>>>>>>> 
>>>>>>> This is a refactoring of TextConverter I made in trunk.
>>>>>>> Pharo did the same before me (it comes from Sophie), but I missed it
>>>>>>> unfortunately...
>>>>>>> 
>>>>>>>> If you have seen my previous message, I think you noticed that the
>>>>>>>> XXXTextConverters are abominations (IMO), and should be reimplemented
>>>>>>>> as wrapping streams instead.
>>>>>>>> Would you be willing to change that in XStreams? I mean implementing a
>>>>>>>> conversion streams model, which can wrap around any other stream,
>>>>>>>> like:
>>>>>>>> 
>>>>>>>> myStream := UTFReaderStream on: otherStream.
>>>>>>>> myString := myStream contents.
>>>>>>>> 
>>>>>>>> or another way:
>>>>>>>> 
>>>>>>>> myString := (someBaseStream wrapWith: UTFReaderStream) contents.
>>>>>>>> 
>>>>>>>> or..
>>>>>>>> myDecodedString := (someBaseStream wrapWith: (DecodingStreams decoderFor: myEncoding)) contents.
>>>>>>>>
>>>>>>>> That would be much nicer than using converters.
>>>>>>> 
>>>>>>> Currently, I have a ConverterReadXtream and a ConverterWriteXtream
>>>>>>> which are stream wrappers.
>>>>>>> They use old TextConverter to do the real job, but I agree, a full
>>>>>>> rewrite of this one is needed.
>>>>>>> However, I would like to keep these two layers for Stream composition:
>>>>>>> - the generic converter stream
>>>>>>> - the conversion algorithm
>>>>>>> 
>>>>>> 
>>>>>> Why?
>>>>>> In your implementation you already added the
>>>>>> readInto: aCollection startingAt: startIndex count: anInteger
>>>>>> and
>>>>>> next: count into: aString startingAt: startIndex
>>>>>> into converters, which makes them even more like streams.
>>>>>> 
>>>>> 
>>>>> Yes, you may be right.
>>>>> Maybe my ratio innovating/reusing was a bit low :)
>>>>> 
>>>>>> So, what is stopping you from making an abstract, generic XtreamWrapper
>>>>>> class, and then a number of subclasses (LatinConversionStream,
>>>>>> UnicodeConversionStream, etc.),
>>>>> 
>>>>> Yes, that's possible. But it's already what ConverterReadXtream and
>>>>> ConverterWriteXtream are.
>>>>> 
>>>> 
>>>> I suggest using the following hierarchy:
>>>> 
>>>> Xtream -> XtreamWrapper -> ConverterXtream -> (bunch of subclasses)
>>>> 
>>>> or just
>>>> 
>>>> Xtream -> XtreamWrapper -> (bunch of subclasses)
>>>> 
>>>> since I don't think there is a lot of specific behavior in
>>>> ConverterXtream worth creating a separate class.
>>>> But maybe I'm wrong.
>>>> 
>>>>>> as well as BufferedWrapper?
>>>>>> 
>>>>> 
>>>>> BufferedReadXtream and BufferedWriteXtream already are generic. It's
>>>>> just that I have separated read and write...
>>>>> So you will find ReadXtream>>buffered
>>>>>        ^(BufferedReadXtream new)
>>>>>                contentsSpecies: self contentsSpecies bufferSize: self preferredBufferSize;
>>>>>                source: self
>>>>> 
>>>>> Or do you mean a single buffer for read/write?
>>>>> That would look more like original VW I think.
>>>>> 
>>>> 
>>>> Obviously, if you are reading and writing to the same stream, you should
>>>> take care to keep any buffered I/O in sync.
>>>> The #buffered method can decide what kind of stream to create:
>>>> self isWritable ifTrue: [ create R/W wrapper ] ifFalse: [ create R/O wrapper ]
>>>>
>>>> This is, of course, if you promote #buffered to the Xtream class, which I
>>>> think is a worthwhile thing.
>>>> 
>>>>>> So it will cost us one less message dispatch in
>>>>>> a := stream next.
>>>>>> 
>>>>>> In your model you have:
>>>>>> 
>>>>>> (converter stream) -> (converter) -> basic stream
>>>>>> 
>>>>>> while with a wrapper it will be just:
>>>>>> (converter wrapper) -> basic stream
>>>>>> 
>>>>> 
>>>>> I must re-think why I made this decision of additional indirection...
>>>>> Maybe it was just reusing...
>>>>> 
>>>> 
>>>> I think this is just about reuse. But as I showed in
>>>> UTF8TextConverter>>nextFromStream:, in addition to the extra dispatch,
>>>> it uses characters instead of bytes, which can be avoided
>>>> if you wrap the stream to be converted and tell it to work in binary
>>>> mode, since your wrapper is in control.
>>>> 
>>>>>> 
>>>>>>> Though the current XTream is a quick hack reusing Yoshiki's TextConverter,
>>>>>>> it already demonstrates the possible gains from buffering.
>>>>>>> The speed comes from applying the utf8ToSqueak/squeakToUtf8 trick: copy
>>>>>>> large ASCII-encoded portions verbatim.
>>>>>>> This works very well with Squeak sources because 99.99% of the characters
>>>>>>> are ASCII.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Wrappers are more flexible than TextConverters, since they are
>>>>>>>> not obliged to convert to/from text-based collections only.
>>>>>>>> For example, we can use same API for wrapping with ZIP stream:
>>>>>>>> 
>>>>>>>> myUnpackedData := (someBaseStream wrapWith: ZIPReaderStream) contents.
>>>>>>>> 
>>>>>>>> and many other (ab)uses, like reading changeset chunks:
>>>>>>>> 
>>>>>>>> nextChunk := (fileStream wrapWith: ChunkReaderStream) next.
>>>>>>>> 
>>>>>>> 
>>>>>>> Yes, that fits my intentions.
>>>>>>> What I want is to preserve buffered operations along the chain, and
>>>>>>> avoid byte-by-byte conversions when possible.
>>>>>>> 
>>>>>> 
>>>>>> Buffering is just a wrapper. Btw, again, why don't you provide a
>>>>>> generic wrapper class which everyone can subclass from?
>>>>>> 
>>>>>> bufferedStream := anyStreamClass buffered
>>>>>> 
>>>>>> (buffered wrapper) -> (anyStreamClass)
>>>>>> 
>>>>> 
>>>>> See above, it's just split into BufferedRead/WriteXtream.
>>>>> 
>>>>> Or see the example (a bit heavy)
>>>>>  tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>>>>>      readXtream ascii buffered
>>>>>          decodeWith: (UTF8TextConverter new installLineEndConvention: nil)) buffered.
>>>>> 
>>>> 
>>>> Yes, it's a bit heavy, but this is how one should build chains
>>>> of streams.
>>>> Except that there should be only streams in the chain, no non-stream
>>>> converters in between :)
>>>> 
>>>>> 
>>>>>> I don't see where else you should have to care about buffering
>>>>>> explicitly in anyStreamClass.
>>>>>>
>>>>>> And how can you avoid byte-by-byte conversion in UTF-8? It has to
>>>>>> iterate over the bytes to determine the characters anyway.
>>>>> 
>>>>> True; it is faster because you scan quickly with a primitive,
>>>>> then copy a whole chunk with the replaceFrom:to:with:startingAt: primitive.
>>>>>
>>>>> Of course, if you handle Cyrillic files, then this strategy won't
>>>>> be efficient. It only works on ASCII-dominated files.
>>>>> UTF-8 itself would not be an optimal choice for Cyrillic anyway...
>>>>> 
>>>> I prefer to use UTF-8 nowadays, instead of the old rubbish encodings, of
>>>> which there are many :)
>>>> 
>>>>>> But sure thing, nothing prevents you from buffering things in a way
>>>>>> like:
>>>>>> 
>>>>>> reader := anyStream buffered wrapWith: UTF8Reader.
>>>>>> 
>>>>> 
>>>>> My above example is just equivalent to:
>>>>> 
>>>>> reader := (anyStream buffered wrapWith: UTF8Reader) buffered.
>>>>> 
>>>>> Then even if I use reader next, a whole buffer of UTF-8 is converted
>>>>> (presumably in large chunks).
>>>>> 
>>>> 
>>>> Right, nobody says that it's not possible to do double-buffering:
>>>> first by wrapping the original stream (presumably file-based),
>>>> and second by wrapping the output of the UTF-8 converter.
>>>> 
>>>> [snip]
>>>> 
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>> 
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> Pharo-project at lists.gforge.inria.fr
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

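The generic wrapper Igor proposes in the thread (Xtream -> XtreamWrapper -> subclasses, plus a #wrapWith: convenience) might look roughly like the sketch below in Squeak/Pharo Smalltalk. It is written in the method notation common in Pharo documentation, where each `Class >> selector` header introduces a method to enter in a browser. The class names XtreamWrapperSketch and Utf8ReaderSketch and the Stream>>wrapWith: extension are hypothetical. They are not part of Nicolas's XTream or of VW Xtreams. The decoder reads byte values straight from the wrapped stream (the "binary mode" point above), but it does not validate malformed UTF-8.

```smalltalk
"Hypothetical abstract wrapper: holds a source stream and answers
 elements computed from it. Not part of XTream or VW Xtreams."
Object subclass: #XtreamWrapperSketch
	instanceVariableNames: 'source'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'StreamSketches'

XtreamWrapperSketch class >> on: aStream
	^ self new setSource: aStream

XtreamWrapperSketch >> setSource: aStream
	source := aStream

XtreamWrapperSketch >> atEnd
	^ source atEnd

XtreamWrapperSketch >> next
	"Subclasses decode one element from source."
	^ self subclassResponsibility

XtreamWrapperSketch >> upToEnd
	"Collect every remaining element into a String
	 (so this suits character-producing subclasses)."
	^ String streamContents: [:out |
		[self atEnd] whileFalse: [out nextPut: self next]]

"Hypothetical UTF-8 decoding wrapper over a stream of byte values."
XtreamWrapperSketch subclass: #Utf8ReaderSketch
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'StreamSketches'

Utf8ReaderSketch >> next
	"Answer the next Character, or nil at end. Malformed input is not checked."
	| byte extra code |
	byte := source next ifNil: [^ nil].
	byte < 16r80 ifTrue: [^ Character value: byte].
	byte >= 16rF0
		ifTrue: [extra := 3. code := byte bitAnd: 16r07]
		ifFalse: [byte >= 16rE0
			ifTrue: [extra := 2. code := byte bitAnd: 16r0F]
			ifFalse: [extra := 1. code := byte bitAnd: 16r1F]].
	extra timesRepeat: [code := (code bitShift: 6) + (source next bitAnd: 16r3F)].
	^ Character value: code

"Hypothetical convenience extension, following Igor's #wrapWith: idea."
Stream >> wrapWith: aWrapperClass
	^ aWrapperClass on: self
```

Usage, in the style of the thread: `(#[104 105] readStream wrapWith: Utf8ReaderSketch) upToEnd` answers 'hi'. There is one dispatch from the wrapper straight to the byte stream, with no separate converter object in between, which is the indirection Igor argues against.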

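The ASCII fast path Nicolas describes in the thread works like this: find a run of ASCII bytes, copy the whole run in one operation, and decode only the non-ASCII sequences one at a time. The workspace snippet below sketches that idea. It assumes a complete, well-formed UTF-8 ByteArray as input. It finds each run with a plain loop, where XTream reportedly scans with a primitive, and it does the bulk copy with ByteArray>>asString. It is an illustration of the idea, not XTream's code.

```smalltalk
"Sketch: decode a well-formed UTF-8 ByteArray, copying ASCII runs in bulk.
 A plain loop finds each run; a real implementation would scan with a primitive."
| bytes decoded |
bytes := #[104 105 32 16rC3 16rA9 33].	"'hi', space, U+00E9, '!' in UTF-8"
decoded := String streamContents: [:out |
	| i runEnd byte extra code |
	i := 1.
	[i <= bytes size] whileTrue: [
		"Find where the current run of ASCII bytes (< 16r80) stops."
		runEnd := i.
		[runEnd <= bytes size and: [(bytes at: runEnd) < 16r80]]
			whileTrue: [runEnd := runEnd + 1].
		"Copy the whole run at once instead of byte by byte."
		runEnd > i ifTrue: [out nextPutAll: (bytes copyFrom: i to: runEnd - 1) asString].
		i := runEnd.
		"Decode a single multi-byte sequence, if the run stopped on one."
		i <= bytes size ifTrue: [
			byte := bytes at: i.
			byte >= 16rF0
				ifTrue: [extra := 3. code := byte bitAnd: 16r07]
				ifFalse: [byte >= 16rE0
					ifTrue: [extra := 2. code := byte bitAnd: 16r0F]
					ifFalse: [extra := 1. code := byte bitAnd: 16r1F]].
			1 to: extra do: [:k |
				code := (code bitShift: 6) + ((bytes at: i + k) bitAnd: 16r3F)].
			out nextPut: (Character value: code).
			i := i + extra + 1]]].
decoded
```

On mostly-ASCII input such as Squeak sources, the inner scan loop does nearly all the work and the per-character decoding branch runs rarely. On Cyrillic text almost every character takes the multi-byte branch, which matches Nicolas's remark that the strategy only pays off on ASCII-dominated files.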


