[Pharo-users] [From StackOverflow] How to parse ndjson in Pharo with NeoJSON

Sven Van Caekenberghe sven at stfx.eu
Thu Jan 21 07:36:00 EST 2016


(I don't do StackOverflow)

Reading the 'format' is easy: just keep sending #next, once for each JSON expression (whitespace between expressions, newlines included, is ignored).

| data reader |
data := '{"smalltalk": "cool"}
{"pharo": "cooler"}'.
reader := NeoJSONReader on: data readStream.
Array streamContents: [ :out |
  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].

Avoiding intermediary data structures is easy too: use streaming, so the parser reads directly from the network stream.

| client reader data networkStream |
"Ask for a streaming response, so the body is not loaded up front"
(client := ZnClient new)
  streaming: true;
  url: 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true';
  get.
"Decode the response stream into characters for the parser"
networkStream := ZnCharacterReadStream on: client contents.
reader := NeoJSONReader on: networkStream.
"Parse one JSON expression at a time until the stream is exhausted"
data := Array streamContents: [ :out |
  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
client close.
data.

It took a couple of seconds; after all, that is 80MB+ over the network for 50K items.
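
The same loop should also work for the file case in the original question, and each item can be handled as soon as it is parsed instead of being collected into an Array. A minimal sketch, assuming NeoJSON is loaded and that a local copy of the data exists under the hypothetical name 'pd_items_1.ndjson' in the working directory; the count variable is only there for illustration:

```smalltalk
| count |
count := 0.
"'pd_items_1.ndjson' is a hypothetical local file name"
'pd_items_1.ndjson' asFileReference readStreamDo: [ :in |
  | reader |
  reader := NeoJSONReader on: in.
  [ reader atEnd ] whileFalse: [
    | item |
    "Parse exactly one JSON expression from the stream"
    item := reader next.
    "Handle item here; this sketch only counts the items"
    count := count + 1 ] ].
count.
```

Only one parsed item is referenced at a time, so memory use should not grow with the size of the file, as long as the handling code does not hold on to the items.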



HTH,

Sven 


> On 21 Jan 2016, at 12:02, Esteban Lorenzano <estebanlm at gmail.com> wrote:
> 
> Hi, 
> 
> there is a question I don’t know how to answer.
> 
> http://stackoverflow.com/questions/34904337/how-to-parse-ndjson-in-pharo-with-neojson
> 
> Transcript: 
> 
> I want to parse ndjson (newline delimited json) data with NeoJSON on Pharo Smalltalk.
> 
> ndjson data looks like this:
> 
> {"smalltalk": "cool"}
> {"pharo": "cooler"}
> 
> At the moment I convert my file stream to a string, split it on newline and then parse the single parts using NeoJSON. This seems to use an unnecessary (and extremely huge) amount of memory and time, probably because of converting streams to strings and vice-versa all the time. What would be an efficient way to do this task?
> 
> 
> Takers?
> Esteban
