[Pharo-users] [From StackOverflow] How to parse ndjson in Pharo with NeoJSON

MartinW wm at fastmail.fm
Fri Jan 22 10:13:34 EST 2016


Thank you, Sven! (I asked the question on StackOverflow)

Let me also thank you for NeoJSON, NeoCSV and Zinc, which I use a lot
and which are a joy to use! The documentation is also very good and helps a
lot.

Your code works well, and I save a bit of memory by avoiding intermediary
data structures, but the operation still uses a lot more memory than I had
expected (the example file I use is 80 MB). I tried parsing with
PetitParser, but the results were similar. I guess I have to learn how to find
out where all the memory goes.
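Here is a minimal sketch of one likely memory saving, assuming each record can be handled on its own. Both of Sven's snippets (quoted below) collect every parsed object into an Array, so all of the roughly 50K parsed records stay alive until the end. If each record is handled as soon as #next returns it and is not stored, only the current record needs to stay reachable. The Transcript output and the counter are placeholders for real per-record work. The file name mentioned in the comment is hypothetical.

```smalltalk
| input reader count |
"Sample ndjson held in memory. A file stream could be passed instead,
 e.g. inside 'items.ndjson' asFileReference readStreamDo: [ :in | ... ]
 (hypothetical file name)."
input := '{"smalltalk": "cool"}
{"pharo": "cooler"}' readStream.
reader := NeoJSONReader on: input.
count := 0.
[ reader atEnd ] whileFalse: [
	| item |
	item := reader next.
	"Handle each record right away (filter, aggregate, write elsewhere)
	 instead of adding it to a collection, so it can be garbage collected."
	Transcript show: item printString; cr.
	count := count + 1 ].
count.
```

For the network case, the same loop can take the place of the Array streamContents: part of Sven's second snippet, before client close.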

Best regards,
Martin.



Sven Van Caekenberghe wrote
> (I don't do StackOverflow)
> 
> Reading the 'format' is easy: just keep sending #next for each JSON
> expression (whitespace is ignored).
> 
> | data reader |
> data := '{"smalltalk": "cool"}
> {"pharo": "cooler"}'.
> reader := NeoJSONReader on: data readStream.
> Array streamContents: [ :out |
>   [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
> 
> Avoiding intermediary data structures is easy too: use streaming.
> 
> | client reader data networkStream |
> (client := ZnClient new)
>   streaming: true;
>   url: 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true';
>   get.
> networkStream := ZnCharacterReadStream on: client contents.
> reader := NeoJSONReader on: networkStream.
> data := Array streamContents: [ :out |
>   [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
> client close.
> data.
> 
> It took a couple of seconds; it is 80MB+ over the network for 50K items,
> after all.
> 
> 
> 
> HTH,
> 
> Sven 
> 
> 
>> On 21 Jan 2016, at 12:02, Esteban Lorenzano <estebanlm@> wrote:
>> 
>> Hi, 
>> 
>> there is a question I don’t know how to answer.
>> 
>> http://stackoverflow.com/questions/34904337/how-to-parse-ndjson-in-pharo-with-neojson
>> 
>> Transcript: 
>> 
>> I want to parse ndjson (newline delimited json) data with NeoJSON on
>> Pharo Smalltalk.
>> 
>> ndjson data looks like this:
>> 
>> {"smalltalk": "cool"}
>> {"pharo": "cooler"}
>> 
>> At the moment I convert my file stream to a string, split it on newlines
>> and then parse the single parts using NeoJSON. This seems to use an
>> unnecessary (and extremely large) amount of memory and time, probably
>> because of converting streams to strings and vice versa all the time.
>> What would be an efficient way to do this task?
>> 
>> 
>> Takers?
>> Esteban
> 
> 
> 
> Screen Shot 2016-01-21 at 13.33.57.png (480K)
> <http://forum.world.st/attachment/4873112/0/Screen%20Shot%202016-01-21%20at%2013.33.57.png>





--
View this message in context: http://forum.world.st/From-StackOverflow-How-to-parse-ndjson-in-Pharo-with-NeoJSON-tp4873097p4873385.html
Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.