[Pharo-project] Streams. Status and where to go?

Igor Stasenko siguctua at gmail.com
Sun Feb 28 05:15:03 EST 2010


On 28 February 2010 12:00, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:
> 2010/2/28 Igor Stasenko <siguctua at gmail.com>:
>> Hi, I also did some hacking. I uploaded XTream-Wrappers-sig.1 into SqS/XTream.
>>
>> There is a basic XtreamWrapper class, which should work transparently
>> with any stream (hopefully ;).
>> Next, in a subclass I created a converter. Of course I could also add a
>> buffered wrapper, but maybe later :)
>>
>> Here are some benchmarks. The file I used for testing is a UTF-8 Russian
>> document (attached).
>>
>> | str |
>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>> {
>> [ str reset. (XtreamUTF8Converter on: str readXtream) upToEnd ] bench.
>> [ str reset. (UTF8Decoder new source: str readXtream) upToEnd ] bench.
>> }
>> #('21.71314741035857 per second.' '14.0371688414393 per second.')
>>  #('22.16896345116836 per second.' '14.5186953062848 per second.')
>>
>> Next, buffered:
>>
>> | str |
>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>> {
>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered) upToEnd ] bench.
>> [ str reset. (UTF8Decoder new source: str readXtream buffered) upToEnd ] bench.
>> }
>> #('58.52976428286057 per second.' '25.44225800039754 per second.')
>> #('58.90575079872205 per second.' '25.87064676616916 per second.')
>>
>>
>> I also tried double-buffering, but neither my class nor yours
>> currently works with it:
>>
>> | str |
>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>> {
>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
>> buffered upToEnd ] bench.
>> [ str reset. (UTF8Decoder new source: str readXtream buffered)
>> buffered upToEnd ] bench.
>> }
>>
>> Please take a look. There are some quirks, and they are not caused by my
>> cleanup of the decoding/encoding code.
>> See the XtreamWrapper>>upToEnd implementation.
>>
>>
>
> Yes, I published a bit too soon and messed up, because one temp in a text
> converter method (source) had the same name as a CharacterDecoder inst var
> :(
> Here is a second attempt:
>
> | str |
> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
> {
> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
> buffered upToEnd ] bench.
> [ str reset. (UTF8Decoder new source: str readXtream buffered)
> buffered upToEnd ] bench.
> }
> #('118.0347513481126 per second.' '31.38117129722167 per second.')
>
>
> As you can see, the optimistic ASCII version is pessimistic in the case of
> non-ASCII input...
> It creates a composite stream and performs a lot of copies...
> This is known and awaiting a better algorithm :)
>

Whoops... you got more than a 3x speedup, while mine was around 2x.
But please, try it on ASCII files.

 | str |
 str := (String new: 1000 withAll: $a) asByteArray.
 {
 [ (XtreamUTF8Converter on: str readXtream binary)  upToEnd ] bench.
 [ (UTF8Decoder new source: str readXtream binary)  upToEnd ] bench.
 [ str readXtream binary upToEnd ] bench.
 }
 #('2039.392121575685 per second.' '1158.568286342731 per second.' '92143.1713657269 per second.')

So conversion is roughly 45 to 80 times slower than just copying the data :)
We need to close this gap.
One way would be to optimize #readInto:startingAt:count: using batch-mode
conversion.
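To make the batch-mode idea a bit more concrete, here is a minimal workspace sketch. It deliberately does not touch XTream at all (so it assumes nothing about XtreamUTF8Converter, UTF8Decoder or their selectors), and the sample bytes are made up for the example. It decodes a ByteArray that is assumed to be well-formed UTF-8: each run of ASCII bytes is located first and copied in one go with ByteArray>>asString, and only the non-ASCII sequences are decoded byte by byte. There is no validation of continuation bytes or malformed input.

```smalltalk
"Hypothetical sketch, not XTream code: batch-copy ASCII runs,
decode only non-ASCII sequences one at a time.
Assumes well-formed UTF-8 input; no error checking."
| bytes decoded |
"Sample input: 'abc', then U+043F and U+0440 (Cyrillic), then 'xyz'."
bytes := 'abc' asByteArray , #[16rD0 16rBF 16rD1 16r80] , 'xyz' asByteArray.
decoded := String streamContents: [:out |
	| i n |
	i := 1.
	n := bytes size.
	[i <= n] whileTrue: [
		| j b cp extra |
		"Scan forward over a run of ASCII bytes (values below 128)."
		j := i.
		[j <= n and: [(bytes at: j) < 128]] whileTrue: [j := j + 1].
		"Copy the whole run as one batch instead of char by char."
		j > i ifTrue: [out nextPutAll: (bytes copyFrom: i to: j - 1) asString].
		i := j.
		i <= n ifTrue: [
			"Decode one multi-byte sequence, sized by its lead byte."
			b := bytes at: i.
			b < 16rE0
				ifTrue: [cp := b bitAnd: 16r1F. extra := 1]
				ifFalse: [b < 16rF0
					ifTrue: [cp := b bitAnd: 16r0F. extra := 2]
					ifFalse: [cp := b bitAnd: 16r07. extra := 3]].
			"Append 6 payload bits from each continuation byte."
			1 to: extra do: [:k |
				cp := (cp bitShift: 6) + ((bytes at: i + k) bitAnd: 16r3F)].
			out nextPut: (Character value: cp).
			i := i + extra + 1]]].
decoded
```

On pure ASCII input the inner loop degenerates into a single scan plus one bulk copy, which is the case where the gap to plain copying should shrink the most. Mixed text still pays per-character cost for each non-ASCII sequence.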

> Nicolas
>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>> _______________________________________________
>> Pharo-project mailing list
>> Pharo-project at lists.gforge.inria.fr
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>
>



-- 
Best regards,
Igor Stasenko AKA sig.



