[Pharo-dev] Peek

Sven Van Caekenberghe sven at stfx.eu
Wed Nov 13 09:09:13 EST 2013


On 06 Nov 2013, at 13:56, Sven Van Caekenberghe <sven at stfx.eu> wrote:

> 
> On 06 Nov 2013, at 13:20, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
> 
>> 
>> On 04 Nov 2013, at 5:12 , Sven Van Caekenberghe <sven at stfx.eu> wrote:
>> 
>>> 
>>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream did not yet make use of the optimisations that I did for ZnCharacterEncoding some time ago. More specifically, they were not yet using #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: which are overwritten for ZnUTF8Encoder with (super hacked) versions that assume most of the input will be ASCII (a reasonable assumption).
>>> 
>>> I am still chasing a bug, but right now:
>>> 
>>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>>> 	next: 65536; close ] bench. 
>>> 
>>> 	"135 per second.” BEFORE
>>> 	"3,310 per second.” AFTER
>>> 
>>> But of course the input file is ASCII, so YMMV.
>>> 
>>> I’ll let you know when I commit this code.
>>> 
>>> Sven
>> 
>> Yeah… sooo, I loaded the updated version, great improvement for streams on Latin1 content :D
>> 
>> Maybe it’s just me, but I tested with actual wide source as well (it was as slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* which uses the normal Byte -> Wide become: conversion machinery.
>> Writing a ZnByteStringBecameWideString handler for every next: (and cousins) call where source may be non-latin1 is a real chore/nasty surprise for those used to dealing with legacy streams/converters... 
>> You can sort of kinda make up for the performance hit (at least on these sizes) with a faster replace:from:to:with:startingAt: in use after the fact, by using basicAt:put: to the string, thus avoiding converting the replacement value to character. 
>> 
>> Cheers,
>> Henry
>> 
>> PS: Why is ZnByteStringBecameWideString a notification and not a resumable exception? I would assume those who run into it without a handler would rather have an actual error, than a result where their string has been read up to the first wide char…
> 
> Henrik, thanks again for the feedback, it is really useful.
> 
> You are right, the ZnByteStringBecameWideString hack was introduced specifically to avoid the become, because benchmarking showed that it took too long. The original user is ZnStringEntity>>#readFrom: 
> 
> Now, if you know up front that your string will be wide anyway, then there is #decodeBytesIntoWideString:
> 
> Yes, ZnByteStringBecameWideString should probably be a resumable exception, as not handling it properly will lead to errors.
> 
> I will have to think and reconsider the generality/specificity of the optimisation hack in ZnUTF8Encoder. I got some ideas, but it will take some time.
> 
> At least the code works now, time to process the constructive criticism.

===
Name: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.28
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:04:00.925461 pm
UUID: f104e1ee-7fe5-4a23-a650-197dd9462783
Ancestors: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.27

ZnByteStringBecameWideString is now a resumable Error instead of a Notification (as suggested by henrik sperre johansen);
Added ZnByteStringBecameWideString>>#becomeForward convenience method
Fixed ZnCharacterReadStream>>#readInto:startingAt:count: to do the actual becomeForward when needed so that clients are not affected
===
Name: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.15
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:05:18.26391 pm
UUID: 8c41ee29-9906-4626-a9da-cb576980cd4c
Ancestors: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.14

Added a unit test to make sure ZnCharacterReadStream>>#readInto:startingAt:count: does an actual becomeForward when needed
===

At least the situation is correct now.

Sven





More information about the Pharo-dev mailing list