[Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
nicolas.cellier.aka.nice at gmail.com
Fri Dec 6 18:44:43 EST 2013
Hmm, switching #ascii <-> #binary only makes sense in... ASCII.
With any other encoding the switch makes no sense at all, or maybe it
would have to be #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary.
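One way to read the point above: a plain character/byte switch assumes one byte per character, which holds for ASCII and Latin-1 but not for UTF-8 or UTF-16. The sketch below is an illustration in Python rather than Pharo; the helper names byte_lengths and is_valid_utf8 are made up for this example.

```python
# Illustration only (Python, not Pharo): the same character occupies a
# different number of bytes depending on the encoding.

def byte_lengths(text):
    """Return {encoding name: number of bytes} for the given text."""
    encodings = ("latin-1", "utf-8", "utf-16-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A single character: 1 byte in Latin-1, 2 bytes in UTF-8 and UTF-16.
lengths = byte_lengths("é")

# The Latin-1 byte for "é" is not a valid UTF-8 sequence on its own, so
# reinterpreting a buffer in place only works for single-byte encodings.
lone_byte_ok = is_valid_utf8(b"\xe9")
```

With a multi-byte encoding, a position in the byte buffer no longer corresponds to a position in the character sequence, so a midstream switch would need an explicit encoder/decoder rather than a buffer swap.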
2013/12/5 Max Leske <maxleske at gmail.com>
> There are several different approaches in different places:
> - FileStream reads strings by default. #binary and #ascii switch between
> formats. File streams use an internal buffer which is either a String
> (default) or a ByteArray. It’s even possible to switch between binary and
> ascii midstream without losing information (if done right) because it only
> affects the buffer.
> - ReadStream and WriteStream cannot change their format. Their behavior is
> determined by the underlying collection. Forcing conversions (e.g. by
> #asString) can lead to loss of information
> - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
> support the #binary #ascii method of switching format. Default is #ascii
> - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
> - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is
> #binary (implicit); depends on the underlying stream
> I think the pattern to follow is clear: ReadStream and WriteStream should
> allow switching format with #ascii and #binary, default should be #ascii.
> However, I suspect there’s a reason that these classes don’t support
> switching, namely that switching makes the implementation more complicated
> and also slower because more checks need to be made.
> The easiest solution I see would be to implement something like this:
> ^ self isBinary
>     ifTrue: [ self basicNext asCharacter ]
>     ifFalse: [ self basicNext ]
> However, #next et al. are implemented in a plugin and the primitive method
> looks like this:
> <primitive: 65>
> position >= readLimit
>     ifTrue: [^nil]
>     ifFalse: [^collection at: (position := position + 1)]
> This means the collection instance variable has to hold either a binary or
> a string collection.
> I’ve found a solution that would work and whipped up a working prototype
> (there’s room for improvement…):
> "switch to binary: replace a String buffer with a ByteArray"
> collection isString ifFalse: [ ^ self ].
> collection := (ByteArray new: collection size)
>     copyReplaceFrom: 1 to: collection size with: collection
>
> "switch to ascii: replace a ByteArray buffer with a String"
> collection isString ifTrue: [ ^ self ].
> collection := (String new: collection size)
>     copyReplaceFrom: 1 to: collection size with: collection
> Contrary to what I wrote earlier, #asString does *not* destroy
> non-printable characters. Instead, every byte (from 0 to 255) is mapped to
> a character, so the string can be converted back to a ByteArray
> *without* loss of information. Sorry about that.
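The round trip described above can be illustrated outside Pharo. The sketch below uses Python's latin-1 codec as a stand-in for a mapping where byte n becomes the character with code point n; treating that as equivalent to #asString is an assumption made for illustration, not something checked against the Pharo sources.

```python
# Illustration only (Python, not Pharo): mapping each byte 0..255 to the
# character with the same code point is fully reversible.

all_bytes = bytes(range(256))              # every possible byte value, including 0
as_text = all_bytes.decode("latin-1")      # byte n -> character with code point n
round_tripped = as_text.encode("latin-1")  # character -> byte, the inverse mapping

# The NUL byte survives as a character, so a NUL terminator stays detectable.
nul_position = as_text.index("\x00")
```

Because the mapping is one-to-one over all 256 values, no byte is dropped or merged, unprintable ones included.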
> With this change in place, issue 12259 would become obsolete.
> Please let me know what you think. This is a pretty big change that might
> have a lot of consequences in the image.
> On 04.12.2013, at 13:14, Max Leske <maxleske at gmail.com> wrote:
> Let me see what I can come up with.
> On 03.12.2013, at 19:36, Damien Cassou <damien.cassou at gmail.com> wrote:
> Thanks Max for the report. Do you have an idea on how we could solve the
> problem? The previous behaviour was not acceptable either because the
> streams that came out of a memory filesystem were the only ones with binary
> On Dec 3, 2013 5:35 PM, "Max Leske" <maxleske at gmail.com> wrote:
>> Damien, Marcus
>> this change breaks a lot of things in FileSystem-Git. I don’t disagree
>> with the idea that reading characters should be default (one could argue
>> about it…) but your change makes it IMPOSSIBLE to read bytes because
>> unprintable characters are discarded! So if my ByteArray is a
>> NULL-terminated string, for instance, I cannot check for the NULL termination