Array sum is very slow

JH
Jimmie Houchin
Thu, Jan 6, 2022 8:37 PM

I have written a micro benchmark which stresses a language in areas
which are crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python,
PicoLisp, C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds,
amazing magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes.
Pharo takes over 2 hours. :(

In my benchmarks, if I comment out the sum and average of the array, it
completes in 3.5 seconds.
And when I do sum the array it gives the correct results, so I can
verify its validity.

To illustrate, below is some sample code of what I am doing. I iterate
over the array, do calculations on each value, update the array, and
take the sum and average at each value, simply to stress array access,
sum, and average.

28800 is simply derived from a time series of one-minute values: 5 days
a week, 4 weeks.
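(As a quick arithmetic check of that figure, sketched in Python: 24 hours of one-minute values, 5 days a week, 4 weeks.)

```python
# 28800 one-minute values: 24 hours of minute data, 5 days a week, 4 weeks
minutes_per_day = 24 * 60        # 1440
bars = minutes_per_day * 5 * 4
print(bars)                      # 28800
```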

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	randarray sum. randarray average ] ] timeToRun.

randarrayttr. "0:00:00:36.135"

I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"
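(A quick check of that extrapolation, in Python for illustration: 36.135 s per timed pass times 200 passes matches the quoted 0:02:00:27.)

```python
total_seconds = round(36.135 * 200)   # 200 passes of the timed block, ~7227 s
h, rem = divmod(total_seconds, 3600)
m, s = divmod(rem, 60)
print(f"{h}:{m:02d}:{s:02d}")         # 2:00:27
```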

I learned early on in this adventure when dealing with compiled
languages that if you don’t do a lot, the test may not last long enough
to give any times.

Pharo is my preference. But this is an awfully big gap in performance.
When doing backtesting this is huge: does my backtest take minutes,
hours, or days?

I am not a computer scientist nor an expert in Pharo or Smalltalk, so I
do not know if there is anything which can improve this.

However, I have played around with several experimental versions of a #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
	| sum |
	sum := 0.
	1 to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	ltsa sum: randarray. ltsa sum: randarray ] ] timeToRun.
randarrayttr2. "0:00:00:18.563"
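(A plausible reason for the speedup, offered as an assumption: a generic reduction pays a block/function activation per element, while the hand-written loop is straight accumulation. A rough Python analogy, illustrative only:)

```python
from functools import reduce

data = [0.5] * 28800

# generic reduction: a Python-level function call per element
total_generic = reduce(lambda acc, x: acc + x, data, 0.0)

# explicit loop: straight accumulation, no per-element call
total_loop = 0.0
for x in data:
    total_loop += x

print(total_generic == total_loop)   # same result, different dispatch cost
```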

And this one reduces it a little more.

sum10: col
	| sum |
	sum := 0.
	1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
		sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
			+ (col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5))
			+ (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8))
			+ (col at: (i + 9)) ].
	((col size quo: 10) * 10 + 1) to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	ltsa sum10: randarray. ltsa sum10: randarray ] ] timeToRun.
randarrayttr3. "0:00:00:14.592"
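(The same 10-way unrolling with a remainder loop, sketched in Python for comparison; this `sum10` mirrors the Pharo method above and agrees with a plain sum.)

```python
def sum10(col):
    """10-way unrolled sum with a cleanup loop for the remainder."""
    s = 0.0
    main = (len(col) // 10) * 10
    for i in range(0, main, 10):
        s += (col[i] + col[i + 1] + col[i + 2] + col[i + 3] + col[i + 4]
              + col[i + 5] + col[i + 6] + col[i + 7] + col[i + 8] + col[i + 9])
    for i in range(main, len(col)):   # elements past the last full group of 10
        s += col[i]
    return s

data = [0.25] * 28805                 # size deliberately not a multiple of 10
print(sum10(data) == sum(data))
```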

It closes the gap with plain Python 3, no numpy. But that is a pretty
low bar.

Any ideas, thoughts, wisdom, directions to pursue?

Thanks

Jimmie

GP
Guillermo Polito
Thu, Jan 6, 2022 9:07 PM

Hi Jimmie,

Is it possible that your program is computing a lot of very large integers?

I’m just trying the following with small numbers, and I don’t see the issue. #sum executes on a 28k-element collection around 20 million times per second on my old 2015 i5.

a := (1 to: 28000).
[ a sum ] bench. "'20256552.490 per second'"
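(Worth noting, as an aside: `(1 to: 28000)` is an Interval of consecutive integers, not an Array of floats, and the sum of such a range has the closed form n(n+1)/2, so this benchmark may not exercise the same code path as summing an Array. The identity, checked in Python:)

```python
n = 28000
closed_form = n * (n + 1) // 2              # Gauss formula for 1 + 2 + ... + n
print(closed_form == sum(range(1, n + 1)))  # both are 392014000
```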

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G

On 6 Jan 2022, at 21:37, Jimmie Houchin <jlhouchin@gmail.com> wrote:

> [...]

JH
Jimmie Houchin
Thu, Jan 6, 2022 10:35 PM

No, it is an array of floats. The only integers in the test are in the
indexes of the loops.

Number random. "generates a float, e.g. 0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python 3 was so much faster. I don't
care if Nim, Crystal, or Julia are faster. But...

I am new to Iceberg and have never shared anything on GitHub, so this is
all new to me. I uploaded my language test so you can see what it does.
It is a micro-benchmark. It does things that are not realistic in an
app, but it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Thanks for your help.

Jimmie

On 1/6/22 15:07, Guillermo Polito wrote:

> [...]

JB
John Brant
Fri, Jan 7, 2022 12:24 AM

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

> [...]

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds.

If this is really how the code should be, then I would change it to not call sum twice (once for the sum and once inside average); that alone gives almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.
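(That second suggestion, sketched in Python; `loop1calc` here is a hypothetical stand-in for the repository's real per-element calculation. The running sum is adjusted by each element's delta instead of resumming the whole array:)

```python
import random

def loop1calc(j, n):
    """Hypothetical stand-in for the real per-element calculation."""
    return n * 0.999 + j * 1e-6

narray = [random.random() for _ in range(288)]  # smaller than 28800, same idea
nsum = sum(narray)

for j in range(len(narray)):
    old = narray[j]
    new = loop1calc(j, old)
    narray[j] = new
    nsum += new - old                 # O(1) update instead of an O(n) resum
    navg = nsum / len(narray)

print(abs(nsum - sum(narray)) < 1e-9)  # running sum tracks the full sum
```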

John Brant

SD
stephane ducasse
Fri, Jan 7, 2022 9:52 AM

Thanks John

This was an important remark :)

Another remark is that you can also call BLAS for heavy mathematical operations (this is what numpy is doing: just calling a large Fortran library; I do not know about Julia, but it should be the same). And this is easy to do in Pharo.

https://thepharo.dev/2021/10/17/binding-an-external-library-into-pharo/

And now you can just define a lot more easily a new binding.
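(The underlying principle, shown with plain Python as an analogy: push the per-element loop out of the interpreted language into compiled code. Python's built-in `sum` runs its loop inside the C runtime; a BLAS routine called over FFI plays the same role for Pharo.)

```python
data = [0.1] * 1000

# interpreted loop: one bytecode dispatch per element
total = 0.0
for x in data:
    total += x

# built-in sum: the same reduction, executed in C
fast_total = sum(data)
print(abs(fast_total - total) < 1e-9)  # equal up to float rounding
```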

S

On 7 Jan 2022, at 01:24, John Brant brant@refactoryworkers.com wrote:

> [...]

GP
Guillermo Polito
Fri, Jan 7, 2022 10:00 AM

Yes, I just saw also that I used an interval instead of an array… I need to sleep more ^^

Anyways, even with a 28k-element array, whether they are small integers or floats, I have "reasonable results" (where reasonable = not taking hours, nor minutes, but a couple of milliseconds :P)

randarray := Array new: 28800 withAll: 0.
[ randarray sum ] bench. "'2059.176 per second'"

randarray2 := Array new: 28800 withAll: 0.1234567.
[ randarray2 sum ] bench. "'1771.737 per second'"

I join John’s request to see the Python code…
Is that possible?
G

El 6 ene 2022, a las 23:35, Jimmie Houchin jlhouchin@gmail.com escribió:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Thanks for your help.

Jimmie

On 1/6/22 15:07, Guillermo Polito wrote:

Hi Jummie,

Is it possible that your program is computing a lot of very large integers?

I’m just trying the following with small numbers, and I don’t see the issue. #sum executes on a 28k large collection around 20 million times per second on my old 2015 i5.

a := (1 to: 28000).
[a sum] bench "'20256552.490 per second’"

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G

El 6 ene 2022, a las 21:37, Jimmie Houchin jlhouchin@gmail.com escribió:

I have written a micro benchmark which stresses a language in areas which are crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python, PicoLisp, C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds, amazing magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo takes over 2 hours. :(

In my benchmark, if I comment out the sum and average of the array, it completes in 3.5 seconds.
And when I sum the array it gives the correct result, so I can verify its validity.

To illustrate, below is some sample code of what I am doing. I iterate over the array, do calculations on each value, update the array, and sum and average at each value, simply to stress array access, summing, and averaging.

28800 is simply derived from one-minute time-series values: 60 minutes × 24 hours × 5 days × 4 weeks.

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations here." randarray sum. randarray average ]] timeToRun.

randarrayttr. "0:00:00:36.135"

I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"

I learned early on in this adventure when dealing with compiled languages that if you don’t do a lot, the test may not last long enough to give any times.

Pharo is my preference. But this is an awful big gap in performance. When doing backtesting this is huge. Does my backtest take minutes, hours or days?

I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not know if there is anything which can improve this.

However I have played around with several experiments of my #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
	| sum |
	sum := 0.
	1 to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations here."
ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"

And this one reduces it a little more.

sum10: col
	| sum |
	sum := 0.
	1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
		sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + (col at: (i + 3)) + (col at: (i + 4))
			+ (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8)) + (col at: (i + 9)) ].
	((col size quo: 10) * 10 + 1) to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations here."
ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
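For what it's worth, the same 10-way unrolling translates directly to plain Python; this is a sketch of the #sum10: idea above (not the full benchmark), with the remainder loop handling sizes that aren't a multiple of 10:

```python
def sum10(col):
    """10-way unrolled sum, mirroring the Pharo #sum10: above."""
    total = 0.0
    limit = (len(col) // 10) * 10   # largest multiple of 10 <= len(col)
    for i in range(0, limit, 10):
        total += (col[i] + col[i + 1] + col[i + 2] + col[i + 3] + col[i + 4]
                  + col[i + 5] + col[i + 6] + col[i + 7] + col[i + 8] + col[i + 9])
    for i in range(limit, len(col)):  # leftover elements past the last full group
        total += col[i]
    return total
```

Whether unrolling pays off depends on the runtime: it cuts per-iteration loop overhead (which matches the timing reduction reported above), but in CPython the per-bytecode interpreter cost tends to dominate.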

It closes the gap with plain Python3 no numpy. But that is a pretty low standard.

Any ideas, thoughts, wisdom, directions to pursue.

Thanks

Jimmie

JH
Jimmie Houchin
Fri, Jan 7, 2022 12:19 PM

As I stated, this is a micro benchmark, very much not anything
resembling a real app. Your comments are true if you are writing your
app. But if you want to stress the language, you are going to do things
which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the
faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this
is what we get.

Julia      0.2256 seconds
Python   5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide
any valuable insight into summing and averaging, which are done a lot,
in lots of places, in every iteration.

If you notice, the inner loop changes the array every iteration. So
every call to #sum and #average is getting different data.

Full Test

Julia     1.13  minutes
Python   24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing
something unequal to the various languages.

And just remember, anything you do which significantly changes the test
has to be done in all the languages to give a fair test. This isn't a
let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But
that is not a test of Pharo. The Python is plain Python3, no NumPy, just
using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their
domain and they optimize, optimize, optimize all the math. In fact they
have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo,
you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant
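John's running-sum suggestion can be sketched in plain Python. The `update` helper below is a hypothetical stand-in for the real per-element calculation (loop1calc:); both variants produce the same nsum, but the second does O(1) work per element instead of re-summing all 28,800:

```python
def update(n, i, j):
    # Hypothetical stand-in for the real per-element calculation (loop1calc:).
    return n * 0.999 + (i + j) * 1e-6

def resum_version(narray, i):
    # What the benchmark does now: re-sum the whole array after every update.
    for j in range(len(narray)):
        narray[j] = update(narray[j], i, j)
        nsum = sum(narray)           # O(size) work per element ...
        navg = nsum / len(narray)    # ... and #average re-sums again
    return nsum, navg

def running_version(narray, i):
    # John's suggestion: keep a running sum, adjust it by each element's delta.
    nsum = sum(narray)
    for j in range(len(narray)):
        old = narray[j]
        narray[j] = update(old, i, j)
        nsum += narray[j] - old      # O(1) work per element
        navg = nsum / len(narray)
    return nsum, navg
```

The results agree to within floating-point rounding, since only the order of additions differs.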

SV
Sven Van Caekenberghe
Fri, Jan 7, 2022 1:40 PM

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin jlhouchin@gmail.com wrote:

As I stated, this is a micro benchmark, very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which significantly changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no NumPy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant

JH
Jimmie Houchin
Fri, Jan 7, 2022 3:05 PM

Hello Sven,

I went and removed the Stdouts that you mention and other timing code
from the loops.

I am running the test now, to see if that makes much difference. I do
not think it will.

The reason I put that in there is because it takes so long to run. It
can be frustrating to wait and wait and not know if your test is doing
anything or not. So I put the code in to let me know.

One of your parameters is incorrect. It is 100 iterations not 10.

I learned early on in this experiment that I have to do a large number
of iterations, or C, C++, Java, etc. finish too fast to give meaningful
results.

I can tell if any of the implementations is incorrect by the final nsum.
All implementations must produce the same result.

Thanks for the comments.

Jimmie

On 1/7/22 07:40, Sven Van Caekenberghe wrote:

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin jlhouchin@gmail.com wrote:

As I stated, this is a micro benchmark, very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which significantly changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no NumPy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant

SV
Sven Van Caekenberghe
Fri, Jan 7, 2022 3:30 PM

On 7 Jan 2022, at 16:05, Jimmie Houchin <jlhouchin@gmail.com> wrote:

Hello Sven,

I went and removed the Stdouts that you mention and other timing code from the loops.

I am running the test now, to see if that makes much difference. I do not think it will.

The reason I put that in there is because it takes so long to run. It can be frustrating to wait and wait and not know if your test is doing anything or not. So I put the code in to let me know.

One of your parameters is incorrect. It is 100 iterations not 10.

Ah, I misread the Python code: at the top it says reps = 10, while at the bottom it does indeed say doit(100).

So the time should be multiplied by 10.

The logging, especially the #flush, will slow things down. But removing the MessageTally spy is important too.

The general implementation of #sum is not optimal in the case of a fixed array. Consider:

data := Array new: 1e5 withAll: 0.5.

[ data sum ] bench. "'494.503 per second'"

[ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. "'680.128 per second'"

[ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum ] bench. "'1033.180 per second'"

As others have remarked: doing #average right after #sum is doing the same thing twice. But maybe that is not the point.
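For comparison, the same two observations can be sketched in plain Python (a sketch only; absolute timings vary by machine, the relative ranking is the point). Note the inversion: in CPython the generic builtin sum beats an explicit loop, while in the Pharo snippets above the explicit indexed loop was fastest. Either way, the average should be derived from an already-computed total rather than re-summing.

```python
import random
import timeit

data = [random.random() for _ in range(100_000)]

def builtin_sum():
    return sum(data)  # runs as a C-level loop in CPython

def manual_loop():
    s = 0.0
    for x in data:    # interpreted bytecode loop, much slower in CPython
        s += x
    return s

t_builtin = timeit.timeit(builtin_sum, number=100)
t_manual = timeit.timeit(manual_loop, number=100)

# Avoid the #sum-then-#average double traversal: reuse the total
total = builtin_sum()
avg = total / len(data)
```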

I learned early on in this experiment that I have to do a large number of iterations or C, C++, Java, etc are too fast to have comprehensible results.

I can tell if any of the implementations is incorrect by the final nsum. All implementations must produce the same result.

Thanks for the comments.

Jimmie

On 1/7/22 07:40, Sven Van Caekenberghe wrote:

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin <jlhouchin@gmail.com> wrote:

As I stated, this is a micro benchmark and very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be sufficient to stress faster languages or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which sufficiently changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no Numpy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <jlhouchin@gmail.com> wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This would result in almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.

John Brant
