Sven Van Caekenberghe
sven at stfx.eu
Wed Nov 6 07:56:48 EST 2013
On 06 Nov 2013, at 13:20, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
> On 04 Nov 2013, at 5:12 , Sven Van Caekenberghe <sven at stfx.eu> wrote:
>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream did not yet make use of the optimisations that I did for ZnCharacterEncoding some time ago. More specifically, they were not yet using #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream:, which are overridden in ZnUTF8Encoder with (super hacked) versions that assume most of the input will be ASCII (a reasonable assumption).
>> I am still chasing a bug, but right now:
>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>> next: 65536; close ] bench.
>> "135 per second." BEFORE
>> "3,310 per second." AFTER
>> But of course the input file is ASCII, so YMMV.
>> I’ll let you know when I commit this code.
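[The ASCII fast path described above can be sketched outside Smalltalk. This is a minimal Python illustration, not the actual Zn code: the function name and structure are made up, but the idea is the same one #readInto:startingAt:count:fromStream: exploits, namely that a pure-ASCII prefix can be bulk-copied one byte per character, with a full UTF-8 decode only as the fallback.]

```python
import codecs

def read_utf8(data: bytes, count: int) -> str:
    """Read up to `count` characters from UTF-8 `data`.

    Fast path: if the requested prefix is pure ASCII, one byte is one
    character, so a plain slice-and-decode suffices (no decoder state).
    Slow path: fall back to a full incremental UTF-8 decode.
    """
    prefix = data[:count]
    if all(b < 0x80 for b in prefix):          # pure ASCII input
        return prefix.decode("ascii")
    # Mixed input: decode the whole buffer properly, then take the
    # first `count` characters.
    decoder = codecs.getincrementaldecoder("utf-8")()
    return decoder.decode(data, final=True)[:count]
```

[As the benchmark above shows, the win comes entirely from skipping per-character decoding when the input happens to be ASCII; wide input still pays full price.]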
> Yeah… sooo, I loaded the updated version, great improvement for streams on Latin1 content :D
> Maybe it’s just me, but I tested with actual wide source as well (it was as slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* which uses the normal Byte -> Wide become: conversion machinery.
> Writing a ZnByteStringBecameWideString handler for every next: (and cousins) call where source may be non-latin1 is a real chore/nasty surprise for those used to dealing with legacy streams/converters...
> You can sort of make up for the performance hit (at least at these sizes) with a faster replace:from:to:with:startingAt: used after the fact, by calling basicAt:put: on the string, thus avoiding the conversion of the replacement value to a Character.
> PS: Why is ZnByteStringBecameWideString a notification and not a resumable exception? I would assume those who run into it without a handler would rather have an actual error, than a result where their string has been read up to the first wide char…
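[The Byte -> Wide promotion Henrik refers to can be sketched as follows. This is a hedged Python illustration with invented names, not Zn's implementation: it keeps a compact one-byte-per-character buffer until the first character outside Latin-1 appears, then promotes everything decoded so far to a wide representation, the analogue of Pharo's ByteString -> WideString become: conversion.]

```python
def decode_utf8_promoting(data: bytes) -> str:
    """Decode UTF-8 optimistically into a narrow (Latin-1) buffer,
    promoting to a wide buffer only when a wide character appears."""
    narrow = bytearray()   # compact buffer, one byte per character
    wide = None            # promoted buffer, one code point per character
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if wide is not None:
            wide.append(cp)
        elif cp <= 0xFF:
            narrow.append(cp)
        else:
            # First non-Latin-1 character: the byte string "becomes wide",
            # carrying over everything decoded so far.
            wide = list(narrow) + [cp]
    if wide is None:
        return narrow.decode("latin-1")
    return "".join(map(chr, wide))
```

[The design trade-off under discussion: doing the promotion inside the decoder (as here) costs a copy on wide input, while signalling it outward forces every caller to install a handler.]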
Henrik, thanks again for the feedback, it is really useful.
You are right, the ZnByteStringBecameWideString hack was introduced specifically to avoid the become:, because benchmarking showed that it took too long. The original user is ZnStringEntity>>#readFrom:
Now, if you know up front that your string will be wide anyway, then there is #decodeBytesIntoWideString:
Yes, ZnByteStringBecameWideString should probably be a resumable exception, as not handling it properly will lead to errors.
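[Henrik's PS can be illustrated with a small Python sketch (hypothetical names, not the Zn API). An unhandled notification silently resumes, so decoding just stops at the first wide character and the caller gets a truncated string; an unhandled resumable error would instead surface the problem by default, which is the behaviour argued for here.]

```python
class ByteStringBecameWideString(Exception):
    """Stand-in for the Zn notification/exception under discussion."""

def decode_narrow(data: bytes, resumable_error: bool = True) -> str:
    """Decode UTF-8 into a Latin-1 (narrow) buffer.

    resumable_error=True  -> raise on the first wide character
                             (what an unhandled resumable error does).
    resumable_error=False -> return the truncated prefix
                             (what an unhandled notification leads to).
    """
    out = bytearray()
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFF:
            if resumable_error:
                raise ByteStringBecameWideString(
                    "non-Latin-1 character %r at position %d" % (ch, len(out)))
            return out.decode("latin-1")   # silent truncation
        out.append(cp)
    return out.decode("latin-1")
```
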
I will have to think about and reconsider the generality/specificity of the optimisation hack in ZnUTF8Encoder. I have some ideas, but it will take some time.
At least the code works now, time to process the constructive criticism.
More information about the Pharo-dev mailing list