Sven Van Caekenberghe
sven at stfx.eu
Wed Nov 13 09:09:13 EST 2013
On 06 Nov 2013, at 13:56, Sven Van Caekenberghe <sven at stfx.eu> wrote:
> On 06 Nov 2013, at 13:20, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
>> On 04 Nov 2013, at 5:12 , Sven Van Caekenberghe <sven at stfx.eu> wrote:
>>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream did not yet make use of the optimisations that I did for ZnCharacterEncoding some time ago. More specifically, they were not yet using #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: which are overwritten for ZnUTF8Encoder with (super hacked) versions that assume most of the input will be ASCII (a reasonable assumption).
>>> I am still chasing a bug, but right now:
>>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>>> next: 65536; close ] bench.
>>> "135 per second.” BEFORE
>>> "3,310 per second.” AFTER
>>> But of course the input file is ASCII, so YMMV.
>>> I’ll let you know when I commit this code.
>> Yeah… sooo, I loaded the updated version, great improvement for streams on Latin1 content :D
>> Maybe it’s just me, but I tested with actual wide source as well (it was as slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* which uses the normal Byte -> Wide become: conversion machinery.
>> Writing a ZnByteStringBecameWideString handler for every next: (and cousins) call where source may be non-latin1 is a real chore/nasty surprise for those used to dealing with legacy streams/converters...
>> You can sort of kinda make up for the performance hit (at least on these sizes) with a faster replace:from:to:with:startingAt: in use after the fact, by using basicAt:put: to the string, thus avoiding converting the replacement value to character.
>> PS: Why is ZnByteStringBecameWideString a notification and not a resumable exception? I would assume those who run into it without a handler would rather have an actual error, than a result where their string has been read up to the first wide char…
> Henrik, thanks again for the feedback, it is really useful.
> You are right, the ZnByteStringBecameWideString hack was introduced specifically to avoid the become, because benchmarking showed that it took too long. The original user is ZnStringEntity>>#readFrom:
> Now, if you know up front that your string will be wide anyway, then there is #decodeBytesIntoWideString:
> Yes, ZnByteStringBecameWideString should probably be a resumable exception, as not handling it properly will lead to errors.
> I will have to think and reconsider the generality/specificity of the optimisation hack in ZnUTF8Encoder. I got some ideas, but it will take some time.
> At least the code works now, time to process the constructive criticism.
Time: 13 November 2013, 3:04:00.925461 pm
ZnByteStringBecameWideString is now a resumable Error instead of a Notification (as suggested by henrik sperre johansen);
Added ZnByteStringBecameWideString>>#becomeForward convenience method
Fixed ZnCharacterReadStream>>#readInto:startingAt:count: to do the actual becomeForward when needed so that clients are not affected
Time: 13 November 2013, 3:05:18.26391 pm
Added a unit test to make sure ZnCharacterReadStream>>#readInto:startingAt:count: does an actual becomeForward when needed
At least the situation is correct now.
More information about the Pharo-dev