[Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

Max Leske maxleske at gmail.com
Sat Dec 7 01:08:05 EST 2013


I agree, but that is a problem inherent in the current implementation, and it’s not really my goal right now to fix all of its shortcomings :) I simply want a consistent way to get through this (since I’ve heard that the streams might be replaced with Xtreams…).


On 07.12.2013, at 00:44, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:

> Hmm, switching #ascii <-> #binary only makes sense in... ASCII.
> With any other encoding it does not make sense at all; at best you could have #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary.
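To make Nicolas's point concrete, here is a small workspace sketch, assuming a Pharo image with Zinc loaded (ZnUTF8Encoder is Zinc's UTF-8 encoder; the variable names are only illustrative). In a one-byte encoding such as Latin-1, each character corresponds to exactly one byte, so toggling between characters and bytes is a simple 1:1 mapping. In UTF-8 a single character can occupy several bytes, so a bare #ascii/#binary switch has no way of knowing how to regroup them.

```smalltalk
"One non-ASCII character (U+00E9, e with acute accent) in two encodings."
| eAcute latin1Bytes utf8Bytes |
eAcute := (Character value: 233) asString.
"asByteArray copies one byte per character, i.e. Latin-1 for a ByteString: #[233]"
latin1Bytes := eAcute asByteArray.
"UTF-8 needs two bytes for the same single character: #[195 169]"
utf8Bytes := ZnUTF8Encoder new encodeString: eAcute.
```

The character count stays 1 in both cases, while the byte count differs, which is exactly what a plain mode toggle cannot account for.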
> 
> 
> 2013/12/5 Max Leske <maxleske at gmail.com>
> There are several different approaches in different places:
> 
> - FileStream reads strings by default; #binary and #ascii switch between the two formats. File streams use an internal buffer that is either a String (default) or a ByteArray. It’s even possible to switch between binary and ascii midstream without losing information (if done right), because switching only affects the buffer.
> - ReadStream and WriteStream cannot change their format. Their behavior is determined by the underlying collection. Forcing a conversion (e.g. with #asString) can lead to loss of information.
> - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support switching format with #binary and #ascii. The default is #ascii.
> - SocketStream uses the same #binary / #ascii mechanism. The default is #ascii.
> - ZnLimitedReadStream uses the same #binary / #ascii mechanism. The default is (implicitly) #binary, but it depends on the underlying stream.
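The point about ReadStream can be checked directly in a workspace. This minimal sketch (variable names are illustrative) shows that a plain ReadStream has no format of its own: the class of what #next answers is decided entirely by the collection the stream was created on.

```smalltalk
"A ReadStream answers whatever its underlying collection holds."
| charStream byteStream firstChar firstByte |
charStream := ReadStream on: 'abc'.
byteStream := ReadStream on: #[97 98 99].
firstChar := charStream next.   "$a, a Character"
firstByte := byteStream next.   "97, a SmallInteger"
```

The same logical content ($a has code point 97) comes back as different kinds of objects, and there is no message on ReadStream to change that.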
> 
> I think the pattern to follow is clear: ReadStream and WriteStream should allow switching format with #ascii and #binary, with #ascii as the default. However, I suspect there’s a reason these classes don’t support switching: it makes the implementation more complicated and also slower, because more checks need to be made.
> 
> The easiest solution I see would be to implement something like this:
> 
> ReadStream>>next
> 	"basicNext answers the raw element stored in the collection"
> 	^ self isBinary
> 		ifTrue: [ self basicNext asInteger ]
> 		ifFalse: [ self basicNext asCharacter ]
> 
> However, #next et al. are implemented as a VM primitive, and the method looks like this:
> 
> ReadStream>>next
> 	<primitive: 65> 
> 	position >= readLimit 
> 		ifTrue: [^nil] 
> 		ifFalse: [^collection at: (position := position + 1)]
> 
> This means that the collection instance variable itself has to hold either a ByteArray (binary) or a String (ascii), since the primitive answers its elements directly.
> 
> I’ve found a solution that works and whipped up an implementation (there’s room for improvement…):
> 
> ReadStream>>binary
> 	collection isString ifFalse: [ ^ self ].
> 	collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: collection size with: collection
> 
> ReadStream>>ascii
> 	collection isString ifTrue: [ ^ self ].
> 	collection := (String new: collection size) copyReplaceFrom: 1 to: collection size with: collection
> 
> @Damien
> contrary to what I wrote earlier, #asString does *not* destroy non-printable characters. Instead, every byte (0 to 255) is mapped to a character, so the string can be converted back to a ByteArray *without* loss of information. Sorry about that.
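A quick way to confirm the round trip described here, assuming ByteArray>>asString and String>>asByteArray behave as described above (one Character per byte value): convert all 256 byte values to a String and back, then compare. The variable names are illustrative.

```smalltalk
"Round-trip every byte value 0..255 through a String and back."
| allBytes text restored |
allBytes := ByteArray withAll: (0 to: 255).
text := allBytes asString.        "one Character per byte, including NUL (value 0)"
restored := text asByteArray.
restored = allBytes.              "true: nothing was lost"
```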
> 
> With this change in place, issue 12259 would become obsolete.
> 
> Please let me know what you think. This is a pretty big change that might have a lot of consequences in the image.
> 
> Cheers,
> Max
> 
> On 04.12.2013, at 13:14, Max Leske <maxleske at gmail.com> wrote:
> 
>> Let me see what I can come up with.
>> 
>> 
>> On 03.12.2013, at 19:36, Damien Cassou <damien.cassou at gmail.com> wrote:
>> 
>>> Thanks for the report, Max. Do you have an idea of how we could solve the problem? The previous behaviour was not acceptable either, because the streams that came out of a memory filesystem were the only ones with binary content.
>>> 
>>> On Dec 3, 2013 5:35 PM, "Max Leske" <maxleske at gmail.com> wrote:
>>> Damien, Marcus
>>> 
>>> this change breaks a lot of things in FileSystem-Git. I don’t disagree with the idea that reading characters should be the default (one could argue about that…), but your change makes it IMPOSSIBLE to read bytes because unprintable characters are discarded! So if my ByteArray is a NULL-terminated string, for instance, I cannot check for the NULL termination anymore.
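As an illustration of the kind of check that breaks, here is a minimal sketch with made-up bytes (not real FileSystem-Git data; variable names are illustrative): finding the NULL terminator requires that byte 0 actually reaches the caller. If unprintable characters were dropped while reading, #indexOf: would answer 0 (not found) and the payload could not be delimited.

```smalltalk
"Locate the NULL terminator in raw bytes (illustrative data)."
| raw terminatorIndex payload |
raw := #[104 105 0 120 121].   "'hi', a NULL byte, then trailing bytes"
terminatorIndex := raw indexOf: 0.   "3"
payload := (raw copyFrom: 1 to: terminatorIndex - 1) asString.   "'hi'"
```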
>>> 
>>> Cheers,
>>> Max
>> 
> 
> 
