[Pharo-dev] FloatArray

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue May 21 14:59:30 EDT 2019


And my less powerful 2.7 GHz Intel Core i5 MBP with Apple's accelerated
vecLib is way faster than naive netlib BLAS (I guess it's multi-threaded):

| a b |
a := LapackDGEMatrix randNormal: #(1000 1000).
b := LapackDGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun.
 45

| a b |
a := LapackSGEMatrix randNormal: #(1000 1000).
b := LapackSGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun
 19
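
To put the timings in this thread on a common scale, here is a rough sketch that converts them to GFLOP/s. Two things are assumptions rather than statements from the thread: that the figures are milliseconds (which is what timeToRun answers in Squeak, whereas Pharo 7 answers a Duration), and that a dense n x n product costs about 2 * n^3 floating point operations. The two sets of numbers also come from different machines, so this only gives an order of magnitude.

```smalltalk
"Rough throughput estimate for the 1000x1000 products timed in this thread.
 Assumptions (not from the thread): timings are in milliseconds and a dense
 n x n product costs about 2 * n^3 floating point operations."
| n flops timings gflops |
n := 1000.
flops := 2 * (n raisedTo: 3).
timings := {
    'netlib, double (Xeon E3-1245 v3)' -> 783.
    'netlib, single (Xeon E3-1245 v3)' -> 448.
    'vecLib, double (Core i5 MBP)' -> 45.
    'vecLib, single (Core i5 MBP)' -> 19 }.
"GFLOP/s = operations / seconds / 1e9"
gflops := timings collect: [ :each |
    each key -> (flops / (each value / 1000) / 1e9) ].
gflops do: [ :each |
    Transcript
        show: each key;
        show: ': ';
        show: (each value printShowingDecimalPlaces: 1);
        show: ' GFLOP/s';
        cr ]
```

Under these assumptions the netlib runs land at roughly 2.6 and 4.5 GFLOP/s, and the vecLib runs at roughly 44 and 105 GFLOP/s.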


On Tue, May 21, 2019 at 10:05, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

> Hi Serge,
> this is good news, having TensorFlow bindings is also a must!
> I have this in Smallapack with pure CPU unaccelerated BLAS (no MKL, nor
> ATLAS, just plain and dumb netlib code):
>
> | a b |
> a := LapackDGEMatrix randNormal: #(1000 1000).
> b := LapackDGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
>  783
>
> | a b |
> a := LapackSGEMatrix randNormal: #(1000 1000).
> b := LapackSGEMatrix randNormal: #(1000 1000).
> [a * b] timeToRun
>  448
>
> Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
> So I think that we can do much better with an accelerated library!
>
> On Tue, May 21, 2019 at 05:13, Serge Stinckwich <serge.stinckwich at gmail.com>
> wrote:
>
>> There is another solution with my TensorFlow Pharo binding:
>> https://github.com/PolyMathOrg/libtensorflow-pharo-bindings
>>
>> You can do a matrix multiplication like this:
>>
>> | graph t1 t2 c1 c2 mult session result |
>> graph := TF_Graph create.
>> t1 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> t2 := TF_Tensor fromFloats: (1 to:1000000) asArray shape:#(1000 1000).
>> c1 := graph const: 'c1' value: t1.
>> c2 := graph const: 'c2' value: t2.
>> mult := c1 * c2.
>> session := TF_Session on: graph.
>> result := session runOutput: (mult output: 0).
>> result asNumbers
>>
>> Here I'm multiplying two 1000x1000 matrices in
>> 537 ms on my computer.
>>
>> All operations can be done in a graph of operations that is run outside
>> Pharo, so it could be very fast.
>> Operations can be done on CPU or GPU. 32-bit or 64-bit float operations
>> are possible.
>>
>> This is a work in progress but can already be used.
>> Regards,
>>
>>
>>
>> On Tue, May 21, 2019 at 6:54 AM Jimmie Houchin <jlhouchin at gmail.com>
>> wrote:
>>
>>> I wasn't worried about how to do sliding windows. My problem is that
>>> using LapackDGEMatrix in my example was 18x slower than FloatArray, which
>>> is slower than NumPy. It isn't what I was expecting.
>>>
>>> What I didn't know is if I was doing something wrong to cause such a
>>> tremendous slow down.
>>>
>>> Python with NumPy is not my favorite, but it isn't uncomfortable.
>>>
>>> So I gave up and went back to NumPy.
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On 5/20/19 5:17 PM, Nicolas Cellier wrote:
>>>
>>> Hi Jimmie,
>>> indeed, I am not subscribed...
>>> Having efficient methods for sliding window average is possible, here is
>>> how I would do it:
>>>
>>> "Create a vector with 100,000 rows filled with random values (uniform
>>> distribution in [0,1])"
>>> v := LapackDGEMatrix randUniform: #(100000 1).
>>>
>>> "extract values from rank 10001 to 20000"
>>> w1 := v atIntervalFrom: 10001 to: 20000 by: 1.
>>>
>>> "create a left multiplier matrix for performing average of w1"
>>> a := LapackDGEMatrix nrow: 1 ncol: w1 nrow withAll: 1.0 / w1 size.
>>>
>>> "get the average (this is a 1x1 matrix from which we take first element)"
>>> avg1 := (a * w1) at: 1.
>>>
>>> [ "select another slice of same size"
>>> w2 := v atIntervalFrom: 15001 to: 25000 by: 1.
>>>
>>> "get the average (we can recycle a)"
>>> avg2 := (a * w2) at: 1 ] bench.
>>>
>>> This gives:
>>>  '16,500 per second. 60.7 microseconds per run.'
>>> versus:
>>> [w2 sum / w2 size] bench.
>>>  '1,100 per second. 908 microseconds per run.'
>>>
>>> For max and min, it's harder. LAPACK/BLAS only provide the maximum
>>> absolute value as a primitive:
>>> [w2 absMax] bench.
>>>  '19,400 per second. 51.5 microseconds per run.'
>>>
>>> Everything else will be slower, unless we write new primitives in C and
>>> connect them...
>>> [w2 maxOf: [:each | each]] bench.
>>>  '984 per second. 1.02 milliseconds per run.'
>>>
>>> On Sun, May 19, 2019 at 14:58, Jimmie <jlhouchin at gmail.com> wrote:
>>>
>>>> On 5/16/19 1:26 PM, Nicolas Cellier wrote:
>>>>  > Any feedback on this?
>>>>  > Did someone try to use Smallapack in Pharo?
>>>>  > Jimmie?
>>>>  >
>>>>
>>>> I am going to guess that you are not on pharo-users. My bad.
>>>> I posted this in pharo-users as it wasn't a Pharo development question.
>>>>
>>>> I probably should have posted here or emailed you directly.
>>>>
>>>> All I really need is good performance with a simple array of floats. No
>>>> matrix math. Nothing complicated. Moving Averages over a slice of the
>>>> array. A variety of different averages, weighted, etc. Max/min of the
>>>> array. But just a single simple array.
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 4/28/19 8:32 PM, Jimmie Houchin wrote:
>>>> Hello,
>>>>
>>>> I have installed Smallapack into Pharo 7.0.3. Thanks, Nicolas.
>>>>
>>>> I am very unsure of my use of Smallapack. I am not a mathematician or
>>>> scientist. However, the only part of Smallapack I am trying to use at
>>>> the moment is something that would be 64-bit and comparable to
>>>> FloatArray, so that I can do some simple accessing, slicing, sum, and
>>>> average on the array.
>>>>
>>>> Here is some sample code I wrote just to play in a playground.
>>>>
>>>> I have an ExternalDoubleArray, LapackDGEMatrix, and a FloatArray
>>>> samples. The ones not in use are commented out for any run.
>>>>
>>>> fp is a download from
>>>> http://ratedata.gaincapital.com/2018/12%20December/EUR_USD_Week1.zip
>>>> and unzipped to a directory.
>>>>
>>>> fp := '/home/jimmie/data/EUR_USD_Week1.csv'.
>>>> index := 0.
>>>> pricesSum := 0.
>>>> asum := 0.
>>>> ttr := [
>>>>      lines := fp asFileReference contents lines allButFirst.
>>>>      a := ExternalDoubleArray new: lines size.
>>>>      "la := LapackDGEMatrix allocateNrow: lines size ncol: 1.
>>>>      a := la columnAt: 1."
>>>>      "a := FloatArray new: lines size."
>>>>      lines do: [ :line || parts price |
>>>>          parts := ',' split: line.
>>>>          index := index + 1.
>>>>          price := Float readFrom: (parts last).
>>>>          a at: index put: price.
>>>>          pricesSum := pricesSum + price.
>>>>          (index rem: 100) = 0 ifTrue: [
>>>>              asum := a sum.
>>>>       ]]] timeToRun.
>>>> { index. pricesSum. asum. ttr }.
>>>>   "ExternalDoubleArray an Array(337588 383662.5627699992
>>>> 383562.2956199993 0:00:01:59.885)"
>>>>   "FloatArray  an Array(337588 383662.5627699992 383562.2954441309
>>>> 0:00:00:06.555)"
>>>>
>>>> FloatArray is not the precision I need. But it is over 18x faster.
>>>>
>>>> I am afraid I must be doing something badly wrong. Python/NumPy is over
>>>> 4x faster than FloatArray for the above.
>>>>
>>>> If I am using Smallapack incorrectly please help.
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>
>> --
>> Serge Stinckwich
>>
>> Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
>> Sorbonne University (SU)
>> French National Research Institute for Sustainable Development (IRD)
>> University of Yaoundé I, Cameroon
>> "Programs must be written for people to read, and only incidentally for
>> machines to execute."
>> https://twitter.com/SergeStinckwich
>>
>
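
For the simple moving averages over a slice that are asked for in the quoted thread above, a running sum avoids re-summing the whole window at every step. The following is only a minimal sketch in plain Pharo (no Smallapack, no FFI); the names values, windowSize and averages and the sample data are illustrative and do not come from the thread.

```smalltalk
"Moving average over a fixed-size window, using a running sum.
 Plain Pharo collections only; sample data is made up for illustration."
| values windowSize runningSum averages |
values := #(1.0 2.0 3.0 4.0 5.0 6.0).
windowSize := 3.
runningSum := 0.0.
averages := OrderedCollection new.
1 to: values size do: [ :i |
    "add the element entering the window"
    runningSum := runningSum + (values at: i).
    "subtract the element that just left the window"
    i > windowSize ifTrue: [
        runningSum := runningSum - (values at: i - windowSize) ].
    "once the window is full, record its average"
    i >= windowSize ifTrue: [
        averages add: runningSum / windowSize ] ].
averages asArray
```

Each step costs one addition and one subtraction regardless of the window size. Over very long series the running sum can accumulate rounding error, so recomputing it from scratch every so often may be worthwhile if that matters.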