Array sum is very slow

JH
Jimmie Houchin
Thu, Jan 6, 2022 8:37 PM

I have written a micro benchmark which stresses a language in areas
which are crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python,
PicoLisp, C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds,
amazing magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes.
Pharo takes over 2 hours. :(

In my benchmarks, if I comment out the sum and average of the array, it
completes in 3.5 seconds.
And when I do sum the array it gives the correct results, so I can
verify its validity.

To illustrate, below is some sample code of what I am doing. I iterate
over the array, do calculations on each value, update the array, and
take the sum and average at each value, simply to stress array access,
sum, and average.

28800 is simply derived from a time series of one-minute values: 5 days
a week, 4 weeks.
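(As a quick arithmetic check of that figure, sketched in Python: 24 hours of one-minute values, 5 days a week, 4 weeks.)

```python
# 28800 one-minute values: 24 hours of minute data, 5 days a week, 4 weeks
minutes_per_day = 24 * 60        # 1440
bars = minutes_per_day * 5 * 4
print(bars)                      # 28800
```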

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	randarray sum. randarray average ] ] timeToRun.

randarrayttr. "0:00:00:36.135"

I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"
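(A quick check of that extrapolation, in Python for illustration: 36.135 s per timed pass times 200 passes matches the quoted 0:02:00:27.)

```python
total_seconds = round(36.135 * 200)   # 200 passes of the timed block, ~7227 s
h, rem = divmod(total_seconds, 3600)
m, s = divmod(rem, 60)
print(f"{h}:{m:02d}:{s:02d}")         # 2:00:27
```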

I learned early on in this adventure when dealing with compiled
languages that if you don’t do a lot, the test may not last long enough
to give any times.

Pharo is my preference. But this is an awfully big gap in performance.
When doing backtesting this is huge: does my backtest take minutes,
hours, or days?

I am not a computer scientist nor an expert in Pharo or Smalltalk, so I
do not know if there is anything which can improve this.

However, I have played around with several experimental versions of a #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
	| sum |
	sum := 0.
	1 to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	ltsa sum: randarray. ltsa sum: randarray ] ] timeToRun.
randarrayttr2. "0:00:00:18.563"
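(A plausible reason for the speedup, offered as an assumption: a generic reduction pays a block/function activation per element, while the hand-written loop is straight accumulation. A rough Python analogy, illustrative only:)

```python
from functools import reduce

data = [0.5] * 28800

# generic reduction: a Python-level function call per element
total_generic = reduce(lambda acc, x: acc + x, data, 0.0)

# explicit loop: straight accumulation, no per-element call
total_loop = 0.0
for x in data:
    total_loop += x

print(total_generic == total_loop)   # same result, different dispatch cost
```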

And this one reduces it a little more.

sum10: col
	| sum |
	sum := 0.
	1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
		sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
			+ (col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5))
			+ (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8))
			+ (col at: (i + 9)) ].
	((col size quo: 10) * 10 + 1) to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i |
	"other calculations here."
	ltsa sum10: randarray. ltsa sum10: randarray ] ] timeToRun.
randarrayttr3. "0:00:00:14.592"
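(The same 10-way unrolling with a remainder loop, sketched in Python for comparison; this `sum10` mirrors the Pharo method above and agrees with a plain sum.)

```python
def sum10(col):
    """10-way unrolled sum with a cleanup loop for the remainder."""
    s = 0.0
    main = (len(col) // 10) * 10
    for i in range(0, main, 10):
        s += (col[i] + col[i + 1] + col[i + 2] + col[i + 3] + col[i + 4]
              + col[i + 5] + col[i + 6] + col[i + 7] + col[i + 8] + col[i + 9])
    for i in range(main, len(col)):   # elements past the last full group of 10
        s += col[i]
    return s

data = [0.25] * 28805                 # size deliberately not a multiple of 10
print(sum10(data) == sum(data))
```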

It closes the gap with plain Python 3, no numpy. But that is a pretty
low bar.

Any ideas, thoughts, wisdom, directions to pursue?

Thanks

Jimmie

GP
Guillermo Polito
Thu, Jan 6, 2022 9:07 PM

Hi Jimmie,

Is it possible that your program is computing a lot of very large integers?

I’m just trying the following with small numbers, and I don’t see the issue. #sum executes on a 28k-element collection around 20 million times per second on my old 2015 i5.

a := (1 to: 28000).
[ a sum ] bench. "'20256552.490 per second'"
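(Worth noting, as an aside: `(1 to: 28000)` is an Interval of consecutive integers, not an Array of floats, and the sum of such a range has the closed form n(n+1)/2, so this benchmark may not exercise the same code path as summing an Array. The identity, checked in Python:)

```python
n = 28000
closed_form = n * (n + 1) // 2              # Gauss formula for 1 + 2 + ... + n
print(closed_form == sum(range(1, n + 1)))  # both are 392014000
```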

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G

On 6 Jan 2022, at 21:37, Jimmie Houchin <jlhouchin@gmail.com> wrote:

> [...]

JH
Jimmie Houchin
Thu, Jan 6, 2022 10:35 PM

No, it is an array of floats. The only integers in the test are in the
indexes of the loops.

Number random. "generates a float, e.g. 0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python 3 was so much faster. I don't
care if Nim, Crystal, or Julia are faster. But...

I am new to Iceberg and have never shared anything on GitHub, so this is
all new to me. I uploaded my language test so you can see what it does.
It is a micro-benchmark. It does things that are not realistic in an
app, but it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Thanks for your help.

Jimmie

On 1/6/22 15:07, Guillermo Polito wrote:

> [...]

JB
John Brant
Fri, Jan 7, 2022 12:24 AM

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

> [...]

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds.

If this is really how the code should be, then I would change it to not call sum twice (once for the sum and once inside average); that alone gives almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.
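(That second suggestion, sketched in Python; `loop1calc` here is a hypothetical stand-in for the repository's real per-element calculation. The running sum is adjusted by each element's delta instead of resumming the whole array:)

```python
import random

def loop1calc(j, n):
    """Hypothetical stand-in for the real per-element calculation."""
    return n * 0.999 + j * 1e-6

narray = [random.random() for _ in range(288)]  # smaller than 28800, same idea
nsum = sum(narray)

for j in range(len(narray)):
    old = narray[j]
    new = loop1calc(j, old)
    narray[j] = new
    nsum += new - old                 # O(1) update instead of an O(n) resum
    navg = nsum / len(narray)

print(abs(nsum - sum(narray)) < 1e-9)  # running sum tracks the full sum
```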

John Brant

SD
stephane ducasse
Fri, Jan 7, 2022 9:52 AM

Thanks John

This was an important remark :)

Another remark is that you can also call BLAS for heavy mathematical operations (this is what numpy is doing: just calling a large Fortran library; I do not know about Julia, but it should be the same). And this is easy to do in Pharo.

https://thepharo.dev/2021/10/17/binding-an-external-library-into-pharo/

And now you can just define a lot more easily a new binding.
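(The underlying principle, shown with plain Python as an analogy: push the per-element loop out of the interpreted language into compiled code. Python's built-in `sum` runs its loop inside the C runtime; a BLAS routine called over FFI plays the same role for Pharo.)

```python
data = [0.1] * 1000

# interpreted loop: one bytecode dispatch per element
total = 0.0
for x in data:
    total += x

# built-in sum: the same reduction, executed in C
fast_total = sum(data)
print(abs(fast_total - total) < 1e-9)  # equal up to float rounding
```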

S

On 7 Jan 2022, at 01:24, John Brant brant@refactoryworkers.com wrote:

> [...]

GP
Guillermo Polito
Fri, Jan 7, 2022 10:00 AM

Yes, I just saw also that I used an interval instead of an array… I need to sleep more ^^

Anyways, even with a 28k-element array, whether they are small integers or floats, I have "reasonable results" (where reasonable = not taking hours, nor minutes, but a couple of milliseconds :P)

randarray := Array new: 28800 withAll: 0.
[ randarray sum ] bench. "'2059.176 per second'"

randarray2 := Array new: 28800 withAll: 0.1234567.
[ randarray2 sum ] bench. "'1771.737 per second'"

I join John’s request to see the Python code…
Is that possible?
G

El 6 ene 2022, a las 23:35, Jimmie Houchin jlhouchin@gmail.com escribió:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Thanks for your help.

Jimmie

On 1/6/22 15:07, Guillermo Polito wrote:

Hi Jummie,

Is it possible that your program is computing a lot of very large integers?

I’m just trying the following with small numbers, and I don’t see the issue. #sum executes on a 28k large collection around 20 million times per second on my old 2015 i5.

a := (1 to: 28000).
[a sum] bench "'20256552.490 per second’"

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G

El 6 ene 2022, a las 21:37, Jimmie Houchin jlhouchin@gmail.com escribió:

I have written a micro benchmark which stresses a language in areas which are crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python, PicoLisp, C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds, amazing magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo takes over 2 hours. :(

In my benchmark, if I comment out the sum and average of the array, it completes in 3.5 seconds.
And when I sum the array it gives the correct result, so I can verify its validity.

To illustrate, below is some sample code of what I am doing. I iterate over the array, do calculations on each value, update the array, and sum and average at each value, simply to stress array access, summing, and averaging.

28800 is simply derived from one-minute time-series values: 60 minutes × 24 hours × 5 days × 4 weeks.

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations here." randarray sum. randarray average ]] timeToRun.

randarrayttr. "0:00:00:36.135"

I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"

I learned early on in this adventure when dealing with compiled languages that if you don’t do a lot, the test may not last long enough to give any times.

Pharo is my preference. But this is an awful big gap in performance. When doing backtesting this is huge. Does my backtest take minutes, hours or days?

I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not know if there is anything which can improve this.

However I have played around with several experiments of my #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
	| sum |
	sum := 0.
	1 to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations here."
ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"

And this one reduces it a little more.

sum10: col
	| sum |
	sum := 0.
	1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
		sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + (col at: (i + 3)) + (col at: (i + 4))
			+ (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8)) + (col at: (i + 9)) ].
	((col size quo: 10) * 10 + 1) to: col size do: [ :i |
		sum := sum + (col at: i) ].
	^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations here."
ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
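For what it's worth, the same 10-way unrolling translates directly to plain Python; this is a sketch of the #sum10: idea above (not the full benchmark), with the remainder loop handling sizes that aren't a multiple of 10:

```python
def sum10(col):
    """10-way unrolled sum, mirroring the Pharo #sum10: above."""
    total = 0.0
    limit = (len(col) // 10) * 10   # largest multiple of 10 <= len(col)
    for i in range(0, limit, 10):
        total += (col[i] + col[i + 1] + col[i + 2] + col[i + 3] + col[i + 4]
                  + col[i + 5] + col[i + 6] + col[i + 7] + col[i + 8] + col[i + 9])
    for i in range(limit, len(col)):  # leftover elements past the last full group
        total += col[i]
    return total
```

Whether unrolling pays off depends on the runtime: it cuts per-iteration loop overhead (which matches the timing reduction reported above), but in CPython the per-bytecode interpreter cost tends to dominate.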

It closes the gap with plain Python3 no numpy. But that is a pretty low standard.

Any ideas, thoughts, wisdom, directions to pursue.

Thanks

Jimmie

JH
Jimmie Houchin
Fri, Jan 7, 2022 12:19 PM

As I stated, this is a micro benchmark, very much not anything
resembling a real app. Your comments are true if you are writing your
app. But if you want to stress the language, you are going to do things
which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the
faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this
is what we get.

Julia      0.2256 seconds
Python   5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide
any valuable insight into summing and averaging, which are done a lot,
in lots of places, in every iteration.

If you notice, the inner loop changes the array every iteration. So
every call to #sum and #average is getting different data.

Full Test

Julia     1.13  minutes
Python   24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing
something unequal to the various languages.

And just remember, anything you do which significantly changes the test
has to be done in all the languages to give a fair test. This isn't a
let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But
that is not a test of Pharo. The Python is plain Python3, no NumPy, just
using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their
domain and they optimize, optimize, optimize all the math. In fact they
have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo,
you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant
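John's running-sum suggestion can be sketched in plain Python. The `update` helper below is a hypothetical stand-in for the real per-element calculation (loop1calc:); both variants produce the same nsum, but the second does O(1) work per element instead of re-summing all 28,800:

```python
def update(n, i, j):
    # Hypothetical stand-in for the real per-element calculation (loop1calc:).
    return n * 0.999 + (i + j) * 1e-6

def resum_version(narray, i):
    # What the benchmark does now: re-sum the whole array after every update.
    for j in range(len(narray)):
        narray[j] = update(narray[j], i, j)
        nsum = sum(narray)           # O(size) work per element ...
        navg = nsum / len(narray)    # ... and #average re-sums again
    return nsum, navg

def running_version(narray, i):
    # John's suggestion: keep a running sum, adjust it by each element's delta.
    nsum = sum(narray)
    for j in range(len(narray)):
        old = narray[j]
        narray[j] = update(old, i, j)
        nsum += narray[j] - old      # O(1) work per element
        navg = nsum / len(narray)
    return nsum, navg
```

The results agree to within floating-point rounding, since only the order of additions differs.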

SV
Sven Van Caekenberghe
Fri, Jan 7, 2022 1:40 PM

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin jlhouchin@gmail.com wrote:

As I stated, this is a micro benchmark, very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which significantly changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no NumPy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant

JH
Jimmie Houchin
Fri, Jan 7, 2022 3:05 PM

Hello Sven,

I went and removed the Stdouts that you mention and other timing code
from the loops.

I am running the test now, to see if that makes much difference. I do
not think it will.

The reason I put that in there is because it takes so long to run. It
can be frustrating to wait and wait and not know if your test is doing
anything or not. So I put the code in to let me know.

One of your parameters is incorrect. It is 100 iterations not 10.

I learned early on in this experiment that I have to do a large number
of iterations, or C, C++, Java, etc. finish too fast to give meaningful
results.

I can tell if any of the implementations is incorrect by the final nsum.
All implementations must produce the same result.

Thanks for the comments.

Jimmie

On 1/7/22 07:40, Sven Van Caekenberghe wrote:

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin jlhouchin@gmail.com wrote:

As I stated, this is a micro benchmark, very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be demanding enough to stress the faster languages, or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which significantly changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no NumPy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin jlhouchin@gmail.com wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.

John Brant

SV
Sven Van Caekenberghe
Fri, Jan 7, 2022 3:30 PM

On 7 Jan 2022, at 16:05, Jimmie Houchin <jlhouchin@gmail.com> wrote:

Hello Sven,

I went and removed the Stdouts that you mention and other timing code from the loops.

I am running the test now, to see if that makes much difference. I do not think it will.

The reason I put that in there is because it takes so long to run. It can be frustrating to wait and wait and not know if your test is doing anything or not. So I put the code in to let me know.

One of your parameters is incorrect. It is 100 iterations not 10.

Ah, I misread the Python code: at the top it says reps = 10, while at the bottom it does indeed say doit(100).

So the time should be multiplied by 10.

The logging, especially the #flush, will slow things down. But removing the MessageTally spy is important too.

The general implementation of #sum is not optimal in the case of a fixed array. Consider:

data := Array new: 1e5 withAll: 0.5.

[ data sum ] bench. "'494.503 per second'"

[ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. "'680.128 per second'"

[ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum ] bench. "'1033.180 per second'"

As others have remarked: doing #average right after #sum is doing the same thing twice. But maybe that is not the point.
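For comparison, the same two observations can be sketched in plain Python (a sketch only; absolute timings vary by machine, the relative ranking is the point). Note the inversion: in CPython the generic builtin sum beats an explicit loop, while in the Pharo snippets above the explicit indexed loop was fastest. Either way, the average should be derived from an already-computed total rather than re-summing.

```python
import random
import timeit

data = [random.random() for _ in range(100_000)]

def builtin_sum():
    return sum(data)  # runs as a C-level loop in CPython

def manual_loop():
    s = 0.0
    for x in data:    # interpreted bytecode loop, much slower in CPython
        s += x
    return s

t_builtin = timeit.timeit(builtin_sum, number=100)
t_manual = timeit.timeit(manual_loop, number=100)

# Avoid the #sum-then-#average double traversal: reuse the total
total = builtin_sum()
avg = total / len(data)
```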

I learned early on in this experiment that I have to do a large number of iterations or C, C++, Java, etc are too fast to have comprehensible results.

I can tell if any of the implementations is incorrect by the final nsum. All implementations must produce the same result.

Thanks for the comments.

Jimmie

On 1/7/22 07:40, Sven Van Caekenberghe wrote:

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800  timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338

Which would be twice as fast as Python, if I got the parameters correct.

Sven

On 7 Jan 2022, at 13:19, Jimmie Houchin <jlhouchin@gmail.com> wrote:

As I stated, this is a micro benchmark and very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.

Also, as I stated, the test has to be sufficient to stress faster languages or it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what we get.

Julia      0.2256 seconds
Python  5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.

If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.

Full Test

Julia    1.13  minutes
Python  24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing something unequal to the various languages.

And just remember, anything you do which sufficiently changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python3, no Numpy, just using the default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you wouldn't even look. :(

Thanks for exploring this with me.

Jimmie

On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <jlhouchin@gmail.com> wrote:

No, it is an array of floats. The only integers in the test are in the indexes of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...

I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.

https://github.com/jlhouchin/LanguageTestPharo

Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.

Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
	n := narray at: j.
	narray at: j put: (self loop1calc: i j: j n: n).
	nsum := narray sum.
	navg := narray average ]

As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This would result in almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.

John Brant
