[Pharo-project] Streams. Status and where to go?

Igor Stasenko siguctua at gmail.com
Fri Feb 26 13:33:59 EST 2010


On 26 February 2010 18:59, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:
> 2010/2/26 Igor Stasenko <siguctua at gmail.com>:
>> Hello, Nicolas.
>
> Hi igor.
> You should load it in trunk.
>
Ah, i think my image is a bit outdated then.

>> I want to try it out.
>> I tried to load it (XTream-Core) into my image, and it bugged me about
>> unresolved dependencies:
>> ----
>> This package depends on the following classes:
>>  ByteTextConverter
>> You must resolve these dependencies before you will be able to load
>> these definitions:
>>  ByteTextConverter>>nextFromXtream:
>>  ByteTextConverter>>nextPut:toXtream:
>>  ByteTextConverter>>readInto:startingAt:count:fromXtream:
>> ----
>> I ignored these warnings, pressing continue, and here is what it warns
>> about in my trunk image:
>>
>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Map is Undeclared)
>> TextConverter>>next:putAll:startingAt:toXtream: (latin1Encodings is Undeclared)
>> TextConverter>>readInto:startingAt:count:fromXtream: (latin1Map is Undeclared)
>>
>> Is ByteTextConverter a Pharo-specific class?
>>
>
> This is a refactoring of TextConverter I made in trunk.
> Pharo did the same before me (it comes from Sophie), but I missed it
> unfortunately...
>
>> If you have seen my previous message, i think you noticed that the
>> XXXTextConverters are abominations (IMO), and should be reimplemented
>> as wrapping streams instead.
>> Would you be willing to change that in XStreams? I mean implementing a
>> conversion-stream model, which can wrap around any other stream,
>> like:
>>
>> myStream := UTFReaderStream on: otherStream.
>> myString := myStream contents.
>>
>> or using other way:
>>
>> myString := (someBaseStream wrapWith: UTFReaderStream) contents.
>>
>> or..
>> myDecodedString := (someBaseStream wrapWith: (DecodingStreams
>> decoderFor: myEncoding)) contents.
>>
>> That would be much nicer than using converters.
>
> Currently, I have a ConverterReadXtream and a ConverterWriteXtream
> which are stream wrappers.
> They use old TextConverter to do the real job, but I agree, a full
> rewrite of this one is needed.
> However, I would like to keep these two layers for Stream composition:
> - the generic converter stream
> - the conversion algorithm
>

Why?
In your implementation you already added the
readInto: aCollection startingAt: startIndex count: anInteger
and
next: count into: aString startingAt: startIndex
into converters, which makes them even more like streams.

So, what is stopping you from making an abstract, generic XtreamWrapper class,
and then a number of subclasses (LatinConversionStream,
UnicodeConversionStream etc.),
as well as BufferedWrapper?

That would cost us one less message dispatch in
a := stream next.

In your model you have:

(converter stream) -> (converter) -> basic stream

while with a wrapper it would be just:
(converter wrapper) -> basic stream
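
A minimal sketch of the two-link chain, assuming nothing about XTream itself: SketchWrapper and SketchLatin1Reader are made-up names (not classes from any package), the category name is arbitrary, and the Latin-1 decoding is only a stand-in for a real conversion. It is written as a single workspace do-it, using the classic subclass:...category: message and Behavior>>compile:, so the names never need to be declared ahead of time.

```smalltalk
"Hypothetical sketch: none of these classes exist in XTream."
| wrapper latin1 reader |

"Abstract wrapper: holds a source stream and forwards #next by default."
wrapper := Object subclass: #SketchWrapper
	instanceVariableNames: 'source'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Sketch-Wrappers'.
wrapper class compile: 'on: aStream
	^ self new setSource: aStream'.
wrapper compile: 'setSource: aStream
	source := aStream'.
wrapper compile: 'next
	^ source next'.
wrapper compile: 'upToEnd
	| out element |
	out := WriteStream on: (Array new: 16).
	[(element := self next) isNil] whileFalse: [out nextPut: element].
	^ out contents'.

"Concrete wrapper: the conversion lives in #next itself,
 so there is no separate converter object in the chain."
latin1 := wrapper subclass: #SketchLatin1Reader
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Sketch-Wrappers'.
latin1 compile: 'next
	| byte |
	byte := source next.
	^ byte isNil ifTrue: [nil] ifFalse: [Character value: byte]'.

"(converter wrapper) -> basic stream"
reader := latin1 on: (ReadStream on: #[72 105 233]).
reader upToEnd
```

The last expression answers an Array holding the three decoded Characters. The sketch relies on the base ReadStream answering nil past its end, as the classic Squeak/Pharo ReadStream does.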


> Though current XTream is a quick hack reusing Yoshiki TextConverter,
> it already demonstrates possible gains coming from buffering.
> The speed comes from applying the utf8ToSqueak/squeakToUtf8 trick: copy
> large ASCII-encoded portions verbatim.
> This works very well with Squeak source because 99.99% of characters are ASCII.
>
>>
>> Wrappers are more flexible than TextConverters, since they are
>> not obliged to convert to/from text-based collections only.
>> For example, we can use same API for wrapping with ZIP stream:
>>
>> myUnpackedData := (someBaseStream wrapWith: ZIPReaderStream) contents.
>>
>> and many other (ab)uses.. Like reading changeset chunks:
>>
>> nextChunk := (fileStream wrapWith: ChunkReaderStream) next.
>>
>
> Yes, that fits my intentions.
> What I want is to preserve buffered operations along the chain, and
> avoid byte-by-byte conversions when possible.
>

Buffering is just a wrapper. Btw, again, why don't you provide a
generic wrapper class which everyone can subclass?

bufferedStream := anyStreamClass buffered

(buffered wrapper) -> (anyStreamClass)

i don't see where else you should care about buffering explicitly in
anyStreamClass.

And how can you avoid byte-by-byte conversion in utf8? It has to
iterate over bytes to determine the characters anyway.
But sure thing, nothing prevents you from buffering things like this:

reader := anyStream buffered wrapWith: UTF8Reader.
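
To make the byte-iteration point concrete, here is a rough sketch of a UTF-8 decoding loop that works on integers only and never creates a Character along the way. It is an illustration, not the UTF8TextConverter or XTream code: it assumes well-formed input, does no validation, and answers code points rather than a String.

```smalltalk
"Hypothetical sketch: assumes well-formed UTF-8, no error handling."
| bytes in out byte code extra |
bytes := #[195 160 32 116 97 32 115 97 110 116 195 169].   "'à ta santé' encoded as UTF-8"
in := ReadStream on: bytes.
out := WriteStream on: (Array new: bytes size).
[in atEnd] whileFalse: [
	byte := in next.
	byte < 128
		ifTrue: [code := byte]   "plain ASCII: the byte is the code point"
		ifFalse: [
			"The lead byte tells how many continuation bytes follow."
			byte < 224
				ifTrue: [extra := 1. code := byte bitAnd: 31]
				ifFalse: [
					byte < 240
						ifTrue: [extra := 2. code := byte bitAnd: 15]
						ifFalse: [extra := 3. code := byte bitAnd: 7]].
			"Each continuation byte contributes its low 6 bits."
			extra timesRepeat: [
				code := (code bitShift: 6) + (in next bitAnd: 63)]].
	out nextPut: code].
out contents
```

Every byte is still visited once, but the ASCII branch does no arithmetic at all, which is why a buffered source underneath such a loop can hand over long ASCII runs cheaply.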

>>
>> On 25 February 2010 21:19, Nicolas Cellier
>> <nicolas.cellier.aka.nice at gmail.com> wrote:
>>> 2010/2/25 Igor Stasenko <siguctua at gmail.com>:
>>>> Hello,
>>>>
>>>> i am cross-posting, since i think it is good for all of us to agree on
>>>> some common points.
>>>>
>>>> 1. Streams need to be rewritten.
>>>> 2. What do you think is a good replacement for the current Streams?
>>>>
>>>> personally, i currently need a fast and concise UTF8 reader.
>>>> The UTF8TextConverter is the closest thing to what i would take, but i
>>>> don't understand why it is implemented as a non-stream.
>>>>
>>>> The #nextFromStream:
>>>> and #nextPut:toStream:
>>>> are crying out loud to be just
>>>> #next
>>>> and
>>>> #nextPut:
>>>>
>>>> Another thing which makes me sad is this line:
>>>>
>>>> nextFromStream: aStream
>>>>
>>>>        | character1 value1 character2 value2 unicode character3 value3
>>>> character4 value4 |
>>>>        aStream isBinary ifTrue: [^ aStream basicNext].   <<<<<<<
>>>>
>>>>
>>>> All external streams are initially binary, but UTF8TextConverter wants
>>>> to play with characters, instead of octets..
>>>> But hey... UTF8 encoding is exactly about encoding unicode characters
>>>> into binary form..
>>>> I'm not even mentioning that operating with bytes (smallints) is many times
>>>> more efficient than operating with characters (objects), because the first
>>>> thing it does:
>>>>
>>>>        character1 := aStream basicNext.  " a #basicNext, obviously, reads a
>>>> byte from somewhere and then converts it to instance of Character.
>>>> 'Bonus' overhead here. "
>>>>        character1 isNil ifTrue: [^ nil].
>>>>        value1 := character1 asciiValue.  " and... what a surprise, we are
>>>> converting a character back to an integer value.. What a waste! "
>>>>        value1 <= 127 ifTrue: [
>>>>
>>>> I really hope, that eventually we could have a good implementation,
>>>> where horse runs ahead of cart, not cart ahead of horse :)
>>>> Meanwhile i think i have no choice but make yet-another implementation
>>>> of utf8 reader in my own package, instead of using existing one.
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>>
>>> Obviously right. encoded in bytes, decoded in Characters.
>>>
>>> There are also ideas experimented at http://www.squeaksource.com/XTream.html
>>> Sorry I hijacked VW name...
>>> You can download it, it coexists peacefully with Stream.
>>>
>>> - use endOfStreamAction instead of Exception... That means abandoning
>>> primitives next nextPut: (no real performance impact, and expect a
>>> boost in future COG).
>>> - separate CollectionReadStream=concrete class, ReadStream=abstract class
>>> - use a wrapper rather than a subclass for MultiByteFileStream
>>> - implement sequenceable collection API
>>> - buffer I/O (mostly in Squeak, thanks to Levente)
>>>
>>> Of course, alternate ideas to pick from Nile, VW XTream, gst generators etc...
>>>
>>> I think mutating existing library is doable (just a bit tricky because
>>> both Compiler and source code management use Stream extensively...).
>>>
>>> Nicolas
>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> Pharo-project at lists.gforge.inria.fr
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
> XTream-Tests gives some usage patterns.
> Here are also some timings on various machines just to check efficiency:
> Though XTream does not use any next/nextPut: primitive, it competes quite well.
>
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [str readStream upToEnd] bench.
> [str readXtream upToEnd] bench.
> }
> #('583247.75044991 per second.' '597688.862227554 per second.')
> #('221266.5466906619 per second.' '221899.4201159768 per second.')
> #('218044.1911617676 per second.' '220044.1911617676 per second.')
> #('190631.7473010796 per second.' '192736.452709458 per second.')
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [str readStream upTo: $b] bench.
> [str readXtream upTo: $b] bench.
> }
> #('125180.9638072386 per second.' '126922.0155968806 per second.')
> #('120683.8632273545 per second.' '123071.1857628474 per second.')
> #('105943.4113177364 per second.' '107742.851429714 per second.')
>
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [str readStream upToAnyOf: (CharacterSet crlf)] bench.
> [str readXtream upToAnyOf: (CharacterSet crlf)] bench.
> }
> #('112977.2045590882 per second.' '112393.3213357328 per second.')
> #('108469.9060187962 per second.' '108042.9914017197 per second.')
> #('91692.0615876825 per second.' '92319.1361727654 per second.')
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [| tmp | tmp := str readStream. [tmp next==nil] whileFalse] bench.
> [| tmp | tmp := str readXtream. [tmp next==nil] whileFalse] bench.
> [| tmp | tmp := str readXtream. tmp do: [:e | ]] bench.
> }
> #('10452.10957808438 per second.' '6419.11617676465 per second.'
> '2384.323135372925 per second.')
> #('9799.2401519696 per second.' '6436.712657468506 per second.'
> '2171.765646870626 per second.')
> #('10475.7048590282 per second.' '4569.08618276345 per second.'
> '1989.202159568086 per second.')
>
> | str |
> str := String new: 80000 withAll: $a.
> {
> [| tmp | tmp := str readStream. [tmp next==nil] whileFalse] bench.
> [| tmp | tmp := str readXtream. [tmp next==nil] whileFalse] bench.
> [| tmp | tmp := str readXtream. tmp do: [:e | ]] bench.
> }
> #('131.1737652469506 per second.' '81.1026767878546 per second.'
> '29.96404314822213 per second.')
> #('132.388178913738 per second.' '81.701957650819 per second.'
> '27.44084310996222 per second.')
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [str readStream upToAll: 'ab'] bench.
> [str readXtream upToAll: 'ab'] bench.
> }
> #('514.297140571886 per second.' '633.473305338932 per second.')
> #('511.795281887245 per second.' '561.487702459508 per second.')
> #('513.497300539892 per second.' '557.48850229954 per second.')
>
> | str |
> str := String new: 1000 withAll: $a.
> {
> [str readStream upToAll: 'aab'] bench.
> [str readXtream upToAll: 'aab'] bench.
> }
> #('892.021595680864 per second.' '1427.914417116577 per second.')
> #('388.122375524895 per second.' '521.991203518593 per second.')
> #('394.5632620427743 per second.' '539.892021595681 per second.')
> #('384.6461415433827 per second.' '476.2095161935226 per second.')
> #('382.846861255498 per second.' '475.9048190361927 per second.')
>
> {
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name). [tmp next==nil] whileFalse. tmp close] timeToRun.
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream buffered. [tmp next==nil] whileFalse. tmp
> close] timeToRun.
> }
> #(1639 1491)
> #(3121 2892)
> #(3213 2799)
> #(2591 2115)
> #(2146 2030) #(2153 1988) #(2770 2574) #(2319 2089) #(2141 1927) #(27008 1947)
>
> {
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii.
>        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream ascii buffered.
>        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
> }
> #(8779 566)
> #(6418 1182)
> #(6084 1076)
> #(4647 856)
> #(4742 881) #(4332 818) #(4859 855) #(4503 1563) #(4347 816) #(4026
> 835) #(4285 821)
>
> {
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii.
>        [tmp nextLine == nil] whileFalse. tmp close] timeToRun.
> MessageTally spyOn: [[| tmp | tmp := (StandardFileStream
> readOnlyFileNamed: (SourceFiles at: 2) name) readXtream ascii
> buffered.
>        [tmp nextLine == nil] whileFalse. tmp close] timeToRun].
> }
> #(2088 1996) #(1920 1814) #(1589 1537) #(1631 1514) #(1587 1449)
> #(1490 1434) #(1567 1667) #(1807 1777) #(1785 2159) #(1802 2147)
>
> MessageTally spyOn: [| tmp | tmp := (StandardFileStream
> readOnlyFileNamed: (SourceFiles at: 2) name) readXtream ascii
> buffered.
>        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close]
> .
> {
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii.
>        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse. tmp
> close] timeToRun.
> [| tmp | tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream ascii buffered.
>        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse. tmp
> close] timeToRun.
> }
> #(9153 665)
> #(6463 1251)
> #(5028 996) #(5076 1051) #(5223 949) #(4898 1073) #(5130 1610) #(5092
> 1776) #(4798 878) #(4757 956) #(5499 1405) #(14522 954) #(75895 1003)
>
>
> {
> [| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii; wantsLineEndConversion: false; converter:
> UTF8TextConverter new.
>        1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> [| tmp atEnd | tmp := (StandardFileStream readOnlyFileNamed:
> (SourceFiles at: 2) name) readXtream ascii buffered decodeWith:
> UTF8TextConverter new.
>        1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> }
> #(332 183)
> #(558 422) #(678 421) #(686 420) #(675 423) #(673 423) #(662 410)
> #(681 558) #(674 550) #(674 928) #(694 1043) #(1668 1112)
>
>
> {
> MessageTally spyOn: [[| tmp | tmp := (MultiByteFileStream
> readOnlyFileNamed: (SourceFiles at: 2) name) ascii;
> wantsLineEndConversion: false; converter: UTF8TextConverter new.
>        1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun].
> MessageTally spyOn: [[| tmp | tmp := (StandardFileStream
> readOnlyFileNamed: (SourceFiles at: 2) name) readXtream ascii buffered
> decodeWith: UTF8TextConverter new.
>        1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun].
> }
> #(349 189)
> #(577 458) #(595 487)
> #(574 438) #(699 444) #(714 457) #(722 449) #(724 438) #(692 572)
> #(707 698) #(707 693) #(689 670) #(691 663) #(726 957) #(714 1105)
> #(724 1150) #(1765 1098)
>
> {
> [| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii; wantsLineEndConversion: false; converter:
> UTF8TextConverter new.
>      1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> [| tmp | tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream ascii buffered decodeWith: (UTF8TextConverter
> new installLineEndConvention: nil)) buffered.
>      1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> }
> #(318 14)
> #(558 38) #(559 44) #(579 43) #(540 32)
> #(701 34) #(694 36)
>
>
> MessageTally spyOn: [
> | string1 converter |
> string1 := 'à ta santé mon brave' squeakToUtf8.
> converter := UTF8TextConverter new installLineEndConvention: nil.
> {
>        [string1 utf8ToSqueak] bench.
>        [(string1 readXtream decodeWith: converter) upToEnd] bench.
>        [(string1 readXtream decodeWith: converter) buffered upToEnd] bench.
> }
> ]
> #('99488.1023795241 per second.' '27299.1401719656 per second.'
> '17217.55648870226 per second.')
> #('106710.2579484103 per second.' '30986.6026794641 per second.'
> '21273.1453709258 per second.')
> #('108047.7904419116 per second.' '31168.56628674265 per second.'
> '21107.17856428714 per second.')
> #('96647.2705458908 per second.' '28705.25894821036 per second.'
> '19899.4201159768 per second.')
> #('95075.9848030394 per second.' '32338.5322935413 per second.'
> '20242.95140971806 per second.')
>
> MessageTally spyOn: [
> | string1 converter |
> string1 := 'This ASCII string should not be hard to decode' squeakToUtf8.
> converter := UTF8TextConverter new installLineEndConvention: nil.
> {
>        [string1 utf8ToSqueak] bench.
>        [(string1 readXtream decodeWith: converter) upToEnd] bench.
>        [(string1 readXtream decodeWith: converter) buffered upToEnd] bench.
> }
> ]
> #('810708.458308338 per second.' '15476.30473905219 per second.'
> '24907.81843631274 per second.')
> #('1.044100979804039e6 per second.' '18131.57368526295 per second.'
> '40563.0873825235 per second.')
>
>
> {
> [|ws |
>       ws := (String new: 10000) writeStream.
>       1 to: 20000 do: [:i | ws nextPut: $0]] bench.
> [| ws |
>       ws := (String new: 10000) writeXtream.
>       1 to: 20000 do: [:i | ws nextPut: $0]] bench.
> }
> #('442.7114577084583 per second.' '359.3281343731254 per second.')
> #('178.4929042574455 per second.' '130.7738452309538 per second.')
> #('182.490505696582 per second.' '131.1475409836065 per second.')
> #('85.4291417165669 per second.' '128.8453855373552 per second.')
> #('86.4789294987018 per second.' '128.374325134973 per second.')
>
>



-- 
Best regards,
Igor Stasenko AKA sig.



