[Pharo-users] [From StackOverflow] How to parse ndjson in Pharo with NeoJSON

MartinW wm at fastmail.fm
Fri Jan 22 10:49:41 EST 2016


Sven Van Caekenberghe-2 wrote
> Well, it is quite a bit of data (I didn't look too deeply): 50,000 records
> of structured/nested data with quite a lot of strings. If each record is
> 1 KB, that makes 50 MB.
> 
> How do you measure your memory consumption? What did you expect?

I only started thinking about memory when my first attempts to parse the file
hit the VM's memory limit, which seemed to be ~500 MB on OS X out of the
box. After that I only watched memory from the outside, using OS X's
Activity Monitor, and once I gave the VM more memory, the image grew to
1.2 GB while parsing and inspecting the 80 MB file. But I have not yet
investigated where the memory went; perhaps it is all in the Inspector that
I opened to view the result :)
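One way to separate the cost of parsing from the cost of keeping (and inspecting) all 50K results is to handle each record as it is read and keep nothing. A minimal sketch, assuming the same NeoJSONReader #atEnd/#next loop Sven suggests; the two inline records stand in for the real data, and for a local copy one would pass a file character stream instead of the string's readStream (for example inside 'pd_items_1.ndjson' asFileReference readStreamDo: [ :in | ... ], where the path is hypothetical).

```smalltalk
| input reader count |
"Two inline ndjson records stand in for the 80 MB file"
input := '{"smalltalk": "cool"}
{"pharo": "cooler"}'.
reader := NeoJSONReader on: input readStream.
count := 0.
[ reader atEnd ] whileFalse: [
	| record |
	"Each record is a Dictionary; process it here and do not keep it,
	so it can be garbage collected before the next one is read"
	record := reader next.
	count := count + 1 ].
count
```

Comparing the image size after a loop like this with the size after collecting everything into an Array (and opening an Inspector on it) should give an idea of how much of the 1.2 GB is the result itself.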


Sven Van Caekenberghe-2 wrote
> Right now, your JSON is parsed and the result is a combination of lists
> (Array) and maps (Dictionary). If you know/understand well what is inside
> it, and it is regular enough, you could try to build your own
> specialised/optimised data/domain model for it. NeoJSON can also parse
> directly to your objects, instead of the general ones (a process called
> mapping). This is some work, of course, and it might not be worth it,
> YMMV.

Yes, I have used mappings in the past. Here I was just toying with the New
York Public Library's Open Source data for a second...
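For the mapping route Sven describes, a minimal sketch combining NeoJSON mapping with the ndjson loop. Point stands in for a real domain class here (a hypothetical choice, not taken from the NYPL data): #mapInstVarsFor: maps JSON keys onto instance variables of the same name, and #nextAs: reads the next value as an instance of that class instead of a Dictionary.

```smalltalk
| input reader points |
"Each line is one JSON object whose keys match Point's instance variables x and y"
input := '{"x": 1, "y": 2}
{"x": 3, "y": 4}'.
reader := NeoJSONReader on: input readStream.
"Map JSON keys directly onto Point's instance variables"
reader mapInstVarsFor: Point.
points := Array streamContents: [ :out |
	[ reader atEnd ] whileFalse: [ out nextPut: (reader nextAs: Point) ] ].
points
```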


Sven Van Caekenberghe-2 wrote
> Sven  
> 
>> I tried to parse with
>> PetitParser but the results were similar. I guess I have to learn to
>> find out where all the memory goes.
>> 
>> Best regards,
>> Martin.
>> 
>> 
>> 
>> Sven Van Caekenberghe-2 wrote
>>> (I don't do StackOverflow)
>>> 
>>> Reading the 'format' is easy: just keep on sending #next for each JSON
>>> expression (whitespace is ignored).
>>> 
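>>> | data reader |
>>> "Two ndjson records, one JSON value per line"
>>> data := '{"smalltalk": "cool"}
>>> {"pharo": "cooler"}'.
>>> reader := NeoJSONReader on: data readStream.
>>> "Read one JSON value per #next until the input is exhausted"
>>> Array streamContents: [ :out |
>>>  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].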
>>> 
>>> Preventing intermediary data structures is easy too, use streaming.
>>> 
>>> | client reader data networkStream |
>>> "Stream the HTTP response instead of reading it into one big String"
>>> (client := ZnClient new)
>>>  streaming: true;
>>>  url:
>>> 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true';
>>>  get.
>>> "Decode the raw response bytes into characters (UTF-8 by default)"
>>> networkStream := ZnCharacterReadStream on: client contents.
>>> reader := NeoJSONReader on: networkStream.
>>> "Parse one JSON value per #next until the stream is exhausted"
>>> data := Array streamContents: [ :out |
>>>  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
>>> client close.
>>> data.
>>> 
>>> It took a couple of seconds; it is 80MB+ over the network for 50K items,
>>> after all.
>>> 
>>> 
>>> 
>>> HTH,
>>> 
>>> Sven 
>>> 
>>> 
>>>> On 21 Jan 2016, at 12:02, Esteban Lorenzano <estebanlm@> wrote:
>>>> 
>>>> Hi, 
>>>> 
>>>> there is a question I don’t know how to answer.
>>>> 
>>>> http://stackoverflow.com/questions/34904337/how-to-parse-ndjson-in-pharo-with-neojson
>>>> 
>>>> Transcript: 
>>>> 
>>>> I want to parse ndjson (newline delimited json) data with NeoJSON on
>>>> Pharo Smalltalk.
>>>> 
>>>> ndjson data looks like this:
>>>> 
>>>> {"smalltalk": "cool"}
>>>> {"pharo": "cooler"}
>>>> 
>>>> At the moment I convert my file stream to a string, split it on newlines
>>>> and then parse the individual parts using NeoJSON. This seems to use an
>>>> unnecessary (and extremely large) amount of memory and time, probably
>>>> because of converting streams to strings and vice versa all the time.
>>>> What would be an efficient way to do this task?
>>>> 
>>>> 
>>>> Takers?
>>>> Esteban
>>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://forum.world.st/From-StackOverflow-How-to-parse-ndjson-in-Pharo-with-NeoJSON-tp4873097p4873385.html
>> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.








