[Pharo-project] Streams. Status and where to go?

Igor Stasenko siguctua at gmail.com
Fri Feb 26 17:01:40 EST 2010


On 26 February 2010 21:30, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:
> 2010/2/26 Igor Stasenko <siguctua at gmail.com>:
>> On 26 February 2010 18:59, Nicolas Cellier
>> <nicolas.cellier.aka.nice at gmail.com> wrote:
>>> 2010/2/26 Igor Stasenko <siguctua at gmail.com>:
>>>> Hello, Nicolas.
>>>
>>> Hi igor.
>>> You should load it in trunk.
>>>
>> Ah, i think my image is a bit outdated then.
>>
>>>> I want to try it out.
>>>> I tried to load it (XTream-Core) into my image, and it complained about
>>>> unresolved dependencies:
>>>> ----
>>>> This package depends on the following classes:
>>>>  ByteTextConverter
>>>> You must resolve these dependencies before you will be able to load
>>>> these definitions:
>>>>  ByteTextConverter>>nextFromXtream:
>>>>  ByteTextConverter>>nextPut:toXtream:
>>>>  ByteTextConverter>>readInto:startingAt:count:fromXtream:
>>>> ----
>>>> I ignored these warnings by pressing continue, and here is what it
>>>> warns about in my trunk image:
>>>>
>>>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Map is Undeclared)
>>>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Encodings is Undeclared)
>>>> TextConverter>>readInto:startingAt:count:fromXtream: (latin1Map is Undeclared)
>>>>
>>>> Is ByteTextConverter a Pharo-specific class?
>>>>
>>>
>>> This is a refactoring of TextConverter I made in trunk.
>>> Pharo did the same before me (it comes from Sophie), but I missed it
>>> unfortunately...
>>>
>>>> If you saw my previous message, I think you noticed that the
>>>> XXXTextConverter classes are abominations (IMO), and should be
>>>> reimplemented as wrapping streams instead.
>>>> Would you be willing to change that in XStreams? I mean implementing a
>>>> conversion streams model, which can wrap around any other stream,
>>>> like:
>>>>
>>>> myStream := UTFReaderStream on: otherStream.
>>>> myString := myStream contents.
>>>>
>>>> or using other way:
>>>>
>>>> myString := (someBaseStream wrapWith: UTFReaderStream) contents.
>>>>
>>>> or..
>>>> myDecodedString := (someBaseStream wrapWith: (DecodingStreams
>>>> decoderFor: myEncoding)) contents.
>>>>
>>>> That would be much nicer than using converters.
>>>
>>> Currently, I have a ConverterReadXtream and a ConverterWriteXtream
>>> which are stream wrappers.
>>> They use old TextConverter to do the real job, but I agree, a full
>>> rewrite of this one is needed.
>>> However, I would like to keep these two layers for Stream composition:
>>> - the generic converter stream
>>> - the conversion algorithm
>>>
>>
>> Why?
>> In your implementation you already added the
>> readInto: aCollection startingAt: startIndex count: anInteger
>> and
>> next: count into: aString startingAt: startIndex
>> into converters, which makes them even more like streams.
>>
>
> Yes, you may be right.
> Maybe my ratio innovating/reusing was a bit low :)
>
>> So, what is stopping you from making an abstract, generic XtreamWrapper class,
>> and then a number of subclasses (LatinConversionStream,
>> UnicodeConversionStream etc.),
>
> Yes, that's possible. But it's already what ConverterReadXtream and
> ConverterWriteXtream are.
>

I suggest using the following hierarchy:

Xtream -> XtreamWrapper -> ConverterXtream -> (bunch of subclasses)

or just

Xtream -> XtreamWrapper -> (bunch of subclasses)

since I don't think there is enough specific behavior in
ConverterXtream to be worth a separate class.
But maybe I'm wrong.
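
As a rough illustration of the second, flatter variant, here is a minimal
sketch. All names in it (XtreamWrapperSketch, UppercasingXtreamSketch, #on:,
#setSource:) are made up for the example and are not part of XTream, and the
base class is Object only so that the sketch does not depend on Xtream being
loaded. Methods are listed in Class>>selector form.

```smalltalk
"Hypothetical sketch, not XTream code: a generic wrapper and one subclass."
Object subclass: #XtreamWrapperSketch
    instanceVariableNames: 'source'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'XTream-Sketch'

XtreamWrapperSketch class>>on: aStream
    "Answer a new wrapper reading from aStream."
    ^self new setSource: aStream

XtreamWrapperSketch>>setSource: aStream
    source := aStream

XtreamWrapperSketch>>next
    "Default behavior: pass elements through. Subclasses override this."
    ^source next

XtreamWrapperSketch>>atEnd
    ^source atEnd

XtreamWrapperSketch>>contents
    "Collect everything up to the end. Assumes the elements are characters."
    | out |
    out := WriteStream on: String new.
    [self atEnd] whileFalse: [out nextPut: self next].
    ^out contents

XtreamWrapperSketch subclass: #UppercasingXtreamSketch
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'XTream-Sketch'

UppercasingXtreamSketch>>next
    "The only thing a conversion subclass has to redefine in this sketch."
    ^source next asUppercase
```

With these definitions, evaluating
(UppercasingXtreamSketch on: (ReadStream on: 'abc')) contents
should answer 'ABC'. The point is only that a conversion stream needs one
overridden method once the generic wrapper exists.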

>> as well as BufferedWrapper?
>>
>
> BufferedReadXtream and BufferedWriteXtream already are generic. It's
> just that I have separated read and write...
> So you will find ReadXtream>>buffered
>        ^(BufferedReadXtream new)
>                contentsSpecies: self contentsSpecies bufferSize: self preferredBufferSize;
>                source: self
>
> Or do you mean a single Buffer for read/write ?
> That would look more like original VW I think.
>

Obviously, if you are reading from and writing to the same stream, you
should take care to keep any buffered I/O in sync.
The #buffered method can decide what kind of stream to create:
self isWritable ifTrue: [ create R/W wrapper ] ifFalse: [ create R/O wrapper ]

This is, of course, if you promote #buffered to the Xtream class, which
I think is worthwhile.

>> So, it will cost us 1 less message dispatch in
>> a := stream next.
>>
>> In your model you have:
>>
>> (converter stream) -> (converter) -> basic stream
>>
>> while if using wrapper it will be just:
>> (converter wrapper) -> basic stream
>>
>
> I must re-think why I made this decision of additional indirection...
> Maybe it was just reusing...
>

I think this is just about reuse. But as I showed in
UTF8TextConverter>>nextFromStream:
in addition to the extra dispatch, it works on characters instead of
bytes, which can be avoided
if you wrap the stream to be converted and tell it to work in binary
mode, since your wrapper is in control.
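
To make the "work on bytes" point concrete, here is a minimal workspace
sketch of decoding UTF-8 straight from a byte source. It is not the
UTF8TextConverter code; it assumes the wrapped source answers integers
0..255 (as a stream in binary mode would), it answers code points rather
than Characters, and it does no validation of malformed input.

```smalltalk
"Workspace sketch under the assumptions stated above."
| source decodeNext codePoints |
"UTF-8 bytes for U+0048, U+00E9 and U+042F."
source := ReadStream on: #[72 195 169 208 175].
decodeNext := [ | byte extra code |
    byte := source next.
    byte < 128
        ifTrue: [byte]
        ifFalse: [
            "The lead byte tells how many continuation bytes follow."
            extra := byte < 224
                ifTrue: [1]
                ifFalse: [byte < 240 ifTrue: [2] ifFalse: [3]].
            "Keep the payload bits of the lead byte (5, 4 or 3 of them)."
            code := byte bitAnd: (63 bitShift: extra negated).
            "Each continuation byte contributes its low 6 bits."
            extra timesRepeat: [
                code := (code bitShift: 6) + (source next bitAnd: 63)].
            code]].
codePoints := OrderedCollection new.
[source atEnd] whileFalse: [codePoints add: decodeNext value].
codePoints asArray   "#(72 233 1071)"
```

A wrapper's #next could do the same and answer Character value: code, with
no intermediate character-based step between it and the underlying stream.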

>>
>>> Though current XTream is a quick hack reusing Yoshiki TextConverter,
>>> it already demonstrates possible gains coming from buffering.
>>> The speed comes from applying the utf8ToSqueak, squeakToUtf8 trick: copy
>>> large ASCII encoded portions verbatim.
>>> This works very well with squeak source because 99.99% of characters are ASCII.
>>>
>>>>
>>>> Wrappers are more flexible compared to TextConverters, since they are
>>>> not obliged to convert to/from text-based collections only.
>>>> For example, we can use the same API for wrapping with a ZIP stream:
>>>>
>>>> myUnpackedData := (someBaseStream wrapWith: ZIPReaderStream) contents.
>>>>
>>>> and many other (ab)uses.. Like reading changeset chunks:
>>>>
>>>> nextChunk := (fileStream wrapWith: ChunkReaderStream) next.
>>>>
>>>
>>> Yes, that fits my intentions.
>>> What I want is to preserve buffered operations along the chain, and
>>> avoid byte-by-byte conversions when possible.
>>>
>>
>> Buffering is just a wrapper. Btw, again, why don't you provide a
>> generic wrapper class which everyone can subclass from?
>>
>> bufferedStream := anyStreamClass buffered
>>
>> (buffered wrapper) -> (anyStreamClass)
>>
>
> See above, it's just split in BufferedRead/WriteXtream
>
> Or see the example (a bit heavy)
>  tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>      readXtream ascii buffered decodeWith: (UTF8TextConverter new
> installLineEndConvention: nil)) buffered.
>

Yes, it's a bit heavy, but this is how one should build chains
of streams.
Except that there should be only streams in the chain, no non-stream
converters in between :)

>
>> I don't see where else you should care about buffering explicitly in
>> anyStreamClass.
>>
>> And how can you avoid byte-by-byte conversion in utf8? It should
>> iterate over bytes to determine the characters anyway.
>
> True, it is faster because you scan fast with a primitive,
> then copy a whole chunk with replaceFrom:to:with:startingAt: primitive
>
> Of course, if you handle some cyrillic files, then this strategy won't
> be efficient. It only works in ASCII-dominated files.
> UTF8 itself would not be an optimal choice for cyrillic anyway...
>
I prefer to use UTF8 nowadays, instead of the old rubbish encodings, of
which there are many :)
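
For reference, the scan-then-copy strategy Nicolas describes above could
look roughly like the workspace sketch below. It is an illustration only:
a plain loop stands in for the scanning primitive he mentions, and the
input is assumed to be a ByteString holding raw UTF-8 bytes.

```smalltalk
"Workspace sketch: copy the leading ASCII run of a UTF-8 string in one go."
| utf8 runEnd out |
"Three ASCII bytes, a two-byte UTF-8 sequence (195 169), three more ASCII bytes."
utf8 := 'abc' ,
    (String with: (Character value: 195) with: (Character value: 169)) ,
    'def'.
"Scan: find the end of the leading run of bytes below 128."
runEnd := 0.
[runEnd < utf8 size and: [(utf8 at: runEnd + 1) asInteger < 128]]
    whileTrue: [runEnd := runEnd + 1].
"Copy: move the whole run with a single bulk operation."
out := String new: runEnd.
out replaceFrom: 1 to: runEnd with: utf8 startingAt: 1.
out   "'abc'"
```

Only the bytes after runEnd need per-character decoding, which is why the
gain depends on how much of the input is ASCII.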

>> But sure thing, nothing prevents you from buffering things in a way like:
>>
>> reader := anyStream buffered wrapWith: UTF8Reader.
>>
>
> My above example is just equivalent to:
>
> reader := (anyStream buffered wrapWith: UTF8Reader) buffered.
>
> Then even if I use reader next, a whole buffer of UTF8 is converted
> (presumably by large chunks)
>

Right, nobody says that it's not possible to do double-buffering:
first by wrapping the original stream (presumably file-based),
and second by wrapping the output of the utf8 converter.

[snip]

-- 
Best regards,
Igor Stasenko AKA sig.



