[Pharo-users] [ANN] NeoCSV

Sven Van Caekenberghe sven at beta9.be
Tue Jun 26 08:45:47 EDT 2012


Doru,

On 26 Jun 2012, at 14:21, Tudor Girba wrote:

> I was just teasing :).

Now that I have met you in real life, I really can't imagine you would do something like that, teasing people just for the fun of it ;-)

Actually, just yesterday I was further optimizing NeoCSV on a real example: reading a 3.3 MB file with 140,000 entries like

16777216,17301503,AU
17367040,17432575,MY
17435136,17435391,AU
17498112,17563647,KR
17563648,17825791,CN
17825792,18087935,KR
18153472,18219007,JP

The old code simply did this:

readFrom: filename
	"Read from a CSV with field start,stop,code as in 3651886848,3651887103,BE"
	"self readFrom: '/Users/sven/Tmp/geo-ip-country/GeoIPCountry.csv'."
	
	| instance data |
	instance := self new.
	data := OrderedCollection new: 145000.
	FileStream oldFileNamed: filename do: [ :stream |
		[ stream atEnd ] whileFalse: [ | tokens range |
			tokens := stream nextLine findTokens: ','.
			range := IPAddressRangeCountry 
				from: tokens first asNumber 
				to: tokens second asNumber 
				country: tokens third asSymbol.
			data add: range ] ].
	instance data: data asArray.
	^ instance
	
The new code using NeoCSV is this:

readFrom: filename
	"Read from a CSV with field start,stop,code as in 3651886848,3651887103,BE"
	"self readFrom: '/Users/sven/Tmp/geo-ip-country/GeoIPCountry.csv'."
	
	^ self new
		data: (FileStream oldFileNamed: filename do: [ :stream |
					(NeoCSVReader on: stream)
						recordClass: IPAddressRangeCountry;
						addIntegerField: #start: ; addIntegerField: #stop: ; addSymbolField: #country: ;
						upToEnd ]);
		yourself

The new code is simpler, does more internally, and is faster (2.5 seconds versus 3.5 seconds for the old code).
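
For the NeoCSV version to work, the reader has to be able to create a record and fill it in through the selectors given in the field declarations (#start:, #stop: and #country:). A minimal sketch of what IPAddressRangeCountry would need for that, assuming a plain class with three instance variables; the class itself is not shown in this mail, so the variable names and method bodies below are hypothetical:

```smalltalk
"Hypothetical sketch in browser-style notation (Class >> selector), not a file-in.
 Assumes IPAddressRangeCountry has instance variables start, stop and country."

IPAddressRangeCountry >> start: anInteger
	"Receives the first field, already converted by addIntegerField:"
	start := anInteger

IPAddressRangeCountry >> stop: anInteger
	"Receives the second field, already converted by addIntegerField:"
	stop := anInteger

IPAddressRangeCountry >> country: aSymbol
	"Receives the third field, already converted by addSymbolField:"
	country := aSymbol
```

With setters like these in place, each CSV line should end up as one new IPAddressRangeCountry with the three values set in field order, which is the work the hand-written loop did through from:to:country:.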

Yes, I am happy, and looking for users ;-)

Now I am going back to struggling with Metacello.

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill





