BS
Benoit St-Jean
Fri, Jan 7, 2022 3:36 PM
Can you come up with a simple "base case" so we can find the bottleneck/problem?
I'm not sure about what you're trying to do.
What do you get if you try this in a workspace (adjust the value of n to what you want, I tested it with 10 million items).
Let's get this one step at a time!
| floatArray n rng t1 t2 t3 r1 r2 r3 |
n := 10000000.
rng := Random new.
floatArray := Array new: n.
floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng next].

t1 := Time millisecondsToRun: [r1 := floatArray sum].
t2 := Time millisecondsToRun: [| total |
    total := 0.
    floatArray do: [:each | total := total + each ].
    r2 := total].
t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into: [:total :each | total + each ]].

Transcript cr.
Transcript cr; show: 'Test with ', n printString, ' elements'.
Transcript cr; show: 'Original #sum -> Time: ', t1 printString, ' milliseconds, Total: ', r1 printString.
Transcript cr; show: 'Naive #sum -> Time: ', t2 printString, ' milliseconds, Total: ', r2 printString.
Transcript cr; show: 'Inject #sum -> Time: ', t3 printString, ' milliseconds, Total: ', r3 printString.
Here are the results I get on Squeak 5.3
Test with 10000000 elements
Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
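As an aside, one-shot millisecondsToRun: timings like the ones above can jitter from run to run; Pharo's BlockClosure>>#bench, which repeatedly runs the block for a few seconds and reports a rate, tends to give more stable numbers. A minimal sketch, assuming Pharo:

| rng floatArray |
rng := Random new.
floatArray := (1 to: 1000000) collect: [ :i | rng next ].
"runs the block repeatedly and answers a rate string such as '... per second'"
[ floatArray sum ] bench.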
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
GitHub: bstjean
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero". (A. Einstein)
On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin <jlhouchin@gmail.com> wrote:
I have written a micro benchmark which stresses a language in areas
which are crucial to my application.
I have written this micro benchmark in Pharo, Crystal, Nim, Python,
PicoLisp, C, C++, Java and Julia.
On my i7 laptop Julia completes it in about 1 minute and 15 seconds,
amazing magic they have done.
Crystal and Nim do it in about 5 minutes. Python in about 25 minutes.
Pharo takes over 2 hours. :(
In my benchmarks, if I comment out the sum and average of the array, it
completes in 3.5 seconds.
And when I sum the array it gives the correct results, so I can verify
its validity.
To illustrate, below is some sample code of what I am doing. I iterate
over the array, do calculations on each value, update the array, and
take the sum and average at each step, simply to stress array access
and summing/averaging.
28800 is simply derived from one-minute time series values for 5 days
a week over 4 weeks (60 * 24 * 5 * 4 = 28800).
randarray := Array new: 28800.
1 to: randarray size do: [ :i | randarray at: i put: Number random ].
randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations
here." randarray sum. randarray average ]] timeToRun.
randarrayttr. "0:00:00:36.135"
I do 2 loops with 100 iterations each.
randarrayttr * 200. "0:02:00:27"
I learned early on in this adventure when dealing with compiled
languages that if you don’t do a lot, the test may not last long enough
to give any times.
Pharo is my preference. But this is an awfully big gap in performance.
When doing backtesting this is huge. Does my backtest take minutes,
hours or days?
I am not a computer scientist nor expert in Pharo or Smalltalk. So I do
not know if there is anything which can improve this.
However I have played around with several experiments of my #sum: method.
This implementation reduces the time on the above randarray in half.
sum: col
    | sum |
    sum := 0.
    1 to: col size do: [ :i |
        sum := sum + (col at: i) ].
    ^ sum
randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations
here."
ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"
And this one reduces it a little more.
sum10: col
    | sum |
    sum := 0.
    1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
        sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
            + (col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5))
            + (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8))
            + (col at: (i + 9)) ].
    ((col size quo: 10) * 10 + 1) to: col size do: [ :i |
        sum := sum + (col at: i) ].
    ^ sum
randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations
here."
ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
It closes the gap with plain Python 3 (no numpy). But that is a pretty
low standard.
Any ideas, thoughts, wisdom, directions to pursue.
Thanks
Jimmie
SV
Sven Van Caekenberghe
Fri, Jan 7, 2022 6:52 PM
Hi Jimmie,
I made a couple more changes:
- I added
SequenceableCollection>>#sum
    | sum |
    sum := 0.
    1 to: self size do: [ :each |
        sum := sum + (self at: each) ].
    ^ sum
as an extension method. It is not 100% semantically the same as the original, but it works for our case here. This also optimises #average, BTW. This is the main one.
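For reference, a hedged sketch of the same trick applied directly to #average; Sven's remark suggests the stock #average is built on #sum (roughly self sum / self size), so it benefits automatically, and this variant, which is not from the thread, just makes the single indexed pass explicit:

SequenceableCollection>>#average
    | sum |
    sum := 0.
    "single indexed pass, then one division; signals ZeroDivide on an empty collection"
    1 to: self size do: [ :i |
        sum := sum + (self at: i) ].
    ^ sum / self size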
- I tried to avoid a couple of integer -> float conversions in
normalize: n
    | nn |
    nn := n = 0
        ifTrue: [ 0.000123456789 ]
        ifFalse: [ n asFloat ].
    [ nn <= 0.0001 ] whileTrue: [ nn := nn * 10.0 ].
    [ nn >= 1.0 ] whileTrue: [ nn := nn * 0.1 ].
    ^ nn
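A rough illustration, not from the thread, of why the float literals matter: mixed Float/SmallInteger arithmetic coerces the integer operand on every operation, while float-only arithmetic does not. Numbers vary per machine, and #bench is assumed to be Pharo's BlockClosure>>#bench:

"mixed Float/SmallInteger arithmetic: the 10 is coerced to a Float on every multiply"
[ | x | x := 0.5. 1 to: 1000 do: [ :i | x := x * 10 * 0.1 ] ] bench.
"float-only arithmetic: no coercion needed"
[ | x | x := 0.5. 1 to: 1000 do: [ :i | x := x * 10.0 * 0.1 ] ] bench.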
- Avoided one assignment in
loop1calc: i j: j n: n
    | v |
    v := n * (i + n) * (j - n) * 0.1234567.
    ^ self normalize: (v * v * v)
The time for 10 iterations is now halved:
===
Starting test for array size: 28800 iterations: 10
Creating array of size: 28800 timeToRun: 0:00:00:00.002
Starting loop 1 at: 2022-01-07T19:28:52.109011+01:00
Loop 1 time: nil
nsum: 11234.235001659386
navg: 0.3900776042242842
Starting loop 2 at: 2022-01-07T19:31:21.821784+01:00
Loop 2 time: 0:00:02:28.017
nsum: 11245.697629561537
navg: 0.3904756121375534
End of test. TotalTime: 0:00:04:57.733
Sven
On 7 Jan 2022, at 16:30, Sven Van Caekenberghe sven@stfx.eu wrote:
On 7 Jan 2022, at 16:05, Jimmie Houchin jlhouchin@gmail.com wrote:
Hello Sven,
I went and removed the Stdouts that you mention and other timing code from the loops.
I am running the test now, to see if that makes much difference. I do not think it will.
The reason I put that in there is because it takes so long to run. It can be frustrating to wait and wait and not know if your test is doing anything or not. So I put the code in to let me know.
One of your parameters is incorrect. It is 100 iterations not 10.
Ah, I misread the Python code, on top it says, reps = 10, while at the bottom it does indeed say, doit(100).
So the time should be multiplied by 10.
The logging, esp. the #flush, will slow things down. But removing the MessageTally spy is important too.
The general implementation of #sum is not optimal in the case of a fixed array. Consider:
data := Array new: 1e5 withAll: 0.5.
[ data sum ] bench. "'494.503 per second'"
[ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. "'680.128 per second'"
[ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum ] bench. "'1033.180 per second'"
As others have remarked: doing #average right after #sum is doing the same thing twice. But maybe that is not the point.
I learned early on in this experiment that I have to do a large number of iterations or C, C++, Java, etc are too fast to have comprehensible results.
I can tell if any of the implementations is incorrect by the final nsum. All implementations must produce the same result.
Thanks for the comments.
Jimmie
On 1/7/22 07:40, Sven Van Caekenberghe wrote:
Hi Jimmie,
I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1
I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either) as well as the MessageTally spyOn: from #run (slows things down).
Then I ran your code with:
[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
which gave me "0:00:09:31.338"
The console output was:
===
Starting test for array size: 28800 iterations: 10
Creating array of size: 28800 timeToRun: 0:00:00:00.031
Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434
Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534
End of test. TotalTime: 0:00:09:31.338
Which would be twice as fast as Python, if I got the parameters correct.
Sven
On 7 Jan 2022, at 13:19, Jimmie Houchin jlhouchin@gmail.com wrote:
As I stated, this is a micro benchmark and very much not anything resembling a real app. Your comments are true if you are writing your app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.
Also, as I stated, the test has to be sufficient to stress faster languages or it is meaningless.
If I remove the #sum and the #average calls from the inner loops, this is what we get.
Julia 0.2256 seconds
Python 5.318 seconds
Pharo 3.5 seconds
This test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging which is done a lot, in lots of places in every iteration.
If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.
Full Test
Julia 1.13 minutes
Python 24.02 minutes
Pharo 2:09:04
Code for the above is now published. You can let me know if I am doing something unequal to the various languages.
And just remember, anything you do which sufficiently changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.
Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python 3, no NumPy, just using the default list [] for the array.
Julia is a whole other world. It is faster than Numpy. This is their domain and they optimize, optimize, optimize all the math. In fact they have reached the point that some pure Julia code beats pure Fortran.
In all of this I just want Pharo to do the best it can.
With the above results unless you already had an investment in Pharo, you wouldn't even look. :(
Thanks for exploring this with me.
Jimmie
On 1/6/22 18:24, John Brant wrote:
On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <jlhouchin@gmail.com> wrote:
No, it is an array of floats. The only integers in the test are in the indexes of the loops.
Number random. "generates a float 0.8188008774329387"
So in the randarray below it is an array of 28800 floats.
It just felt so wrong to me that Python3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...
I am new to Iceberg and have never shared anything on Github so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app. But it does stress a language in areas important to my app.
https://github.com/jlhouchin/LanguageTestPharo
Let me know if there is anything else I can do to help solve this problem.
I am a lone developer in my spare time. So my apologies for any ugly code.
Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:
1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]
As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will almost result in a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs the >1.6 billion that you are performing.
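For what it is worth, a sketch of John's incremental-update idea against the loop shown above, assuming nsum is seeded with the full sum once before the loop and that loop1calc:j:n: is the selector from the published code:

nsum := narray sum.
1 to: nsize do: [ :j | | old new |
    old := narray at: j.
    new := self loop1calc: i j: j n: old.
    narray at: j put: new.
    "adjust the running sum instead of re-summing all 28,800 elements"
    nsum := nsum + new - old.
    navg := nsum / nsize ].

Note that floating-point rounding means the running nsum can drift slightly from a full re-sum, which matters if the final nsum is used as the cross-language correctness check.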
John Brant
SD
stephane ducasse
Sat, Jan 8, 2022 8:47 PM
Thanks Benoit for the snippet.
I ran it in Pharo 10 and I got:
Test with 10000000 elements
Original #sum -> Time: 195 milliseconds, Total: 4.999452880735064e6
Naive #sum -> Time: 153 milliseconds, Total: 4.999452880735063e6
Inject #sum -> Time: 198 milliseconds, Total: 4.999452880735063e6
in Pharo 9
Test with 10000000 elements
Original #sum -> Time: 182 milliseconds, Total: 4.999339450212771e6
Naive #sum -> Time: 148 milliseconds, Total: 4.999339450212771e6
Inject #sum -> Time: 203 milliseconds, Total: 4.999339450212771e6
I’m interested to understand why Pharo is slower. Maybe this is the impact
of the new full blocks.
We started to play with the idea of regression benchmarks.
S
SD
Stéphane Ducasse
Sun, Jan 9, 2022 9:14 AM
On my machine as well, so this is the same machine.
SQ5.3
Test with 10000000 elements
Original #sum -> Time: 196 milliseconds, Total: 5.001448710680429e6
Naive #sum -> Time: 152 milliseconds, Total: 5.001448710680429e6
Inject #sum -> Time: 143 milliseconds, Total: 5.001448710680429e6
GP
Guillermo Polito
Sun, Jan 9, 2022 9:53 AM
Yet, be careful: that way of benchmarking will have a lot of variation and
noise. Remember there is an OS, other apps are open, and even the CPU getting
hot/cold can introduce performance differences...
At least, that snippet should be run many times (I do 100 iterations in
general), and the averages should be compared taking the standard
deviation into account.
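For example, a minimal sketch of that kind of measurement; the 0.5-filled array and the 100-run count are just placeholders:

| data timings mean stdev |
data := Array new: 1000000 withAll: 0.5.
"collect 100 independent timings of the same operation"
timings := (1 to: 100) collect: [ :i | Time millisecondsToRun: [ data sum ] ].
mean := timings sum / timings size.
"sample standard deviation of the timings"
stdev := ((timings inject: 0 into: [ :acc :t | acc + (t - mean) squared ]) / (timings size - 1)) sqrt.
Transcript cr; show: 'mean: ', mean asFloat printString, ' ms, stdev: ', stdev printString.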
NA
Nicolas Anquetil
Sun, Jan 9, 2022 11:05 AM
Definitely not easy to do benchmarking
I got these strange results:
n := 10000000.
floatArray := Array new: n.
Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: Random new ] ].
"-> 2871"
Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: i ] ].
"-> 86"
Time millisecondsToRun: [1 to: n do: [:i | Random new ]].
"-> 829"
so
- assigning 'Random new' to 10M array elements takes 2.8 seconds.
- assigning a value to 10M array elements takes 0.08 seconds.
- computing 'Random new' 10M times takes 0.8 seconds
I wonder where the extra 2 seconds come from?
some optimization in the background?
I did the 3 of them several times in different order and the results
are similar.
nicolas
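One way to narrow that down, as a hypothetical follow-up not run in the thread,
is to separate creating the Random from storing it; keeping ten million freshly
allocated objects alive in the array plausibly adds garbage-collection work that
the throw-away 'Random new' loop never pays:
| floatArray n shared |
n := 10000000.
floatArray := Array new: n.
shared := Random new.
"time storing a pre-existing object: roughly the cost of at:put: alone"
Time millisecondsToRun: [ 1 to: n do: [ :i | floatArray at: i put: shared ] ].
"time allocating a fresh Random per slot while keeping all of them reachable"
Time millisecondsToRun: [ 1 to: n do: [ :i | floatArray at: i put: Random new ] ].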
On Fri, 2022-01-07 at 15:36 +0000, Benoit St-Jean via Pharo-dev wrote:
Can you come up with a simple "base case" so we can find the
bottleneck/problem?
I'm not sure about what you're trying to do.
What do you get if you try this in a workspace (adjust the value of n
to what you want, I tested it with 10 million items).
Let's get this one step at a time!
| floatArray n rng t1 t2 t3 r1 r2 r3 |
n := 10000000.
rng := Random new.
floatArray := Array new: n.
floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng
next].
t1 := Time millisecondsToRun: [r1 := floatArray sum].
t2 := Time millisecondsToRun: [| total |
total := 0.
floatArray do: [:each | total := total + each ].
r2 := total].
t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into: [:
total :each | total + each ]].
Transcript cr.
Transcript cr; show: 'Test with ', n printString, ' elements'.
Transcript cr;show: 'Original #sum -> Time: ', t1 printString, '
milliseconds, Total: ', r1 printString.
Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, '
milliseconds, Total: ', r2 printString.
Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, '
milliseconds, Total: ', r3 printString.
Here are the results I get on Squeak 5.3
Test with 10000000 elements
Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
GitHub: bstjean
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero". (A.
Einstein)
On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin
jlhouchin@gmail.com wrote:
I have written a micro benchmark which stresses a language in areas
which are crucial to my application.
I have written this micro benchmark in Pharo, Crystal, Nim, Python,
PicoLisp, C, C++, Java and Julia.
On my i7 laptop Julia completes it in about 1 minute and 15 seconds,
amazing magic they have done.
Crystal and Nim do it in about 5 minutes. Python in about 25 minutes.
Pharo takes over 2 hours. :(
In my benchmarks if I comment out the sum and average of the array. It
completes in 3.5 seconds.
And when I sum the array it gives the correct results. So I can verify
its validity.
To illustrate below is some sample code of what I am doing. I iterate
over the array and do calculations on each value of the array and
update
the array and sum and average at each value simple to stress array
access and sum and average.
28800 is simply derived from time series one minute values for 5 days,
4
weeks.
randarray := Array new: 28800.
1 to: randarray size do: [ :i | randarray at: i put: Number random ].
here." randarray sum. randarray average ]] timeToRun.
randarrayttr. "0:00:00:36.135"
I do 2 loops with 100 iterations each.
randarrayttr * 200. "0:02:00:27"
I learned early on in this adventure when dealing with compiled
languages that if you don’t do a lot, the test may not last long enough
to give any times.
Pharo is my preference. But this is an awful big gap in performance.
When doing backtesting this is huge. Does my backtest take minutes,
hours or days?
I am not a computer scientist nor expert in Pharo or Smalltalk. So I do
not know if there is anything which can improve this.
However I have played around with several experiments of my #sum:
method.
This implementation reduces the time on the above randarray in half.
sum: col
| sum |
sum := 0.
1 to: col size do: [ :i |
sum := sum + (col at: i) ].
^ sum
randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations
here."
ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"
And this one reduces it a little more.
sum10: col
| sum |
sum := 0.
1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
+
(col at: (i + 3)) + (col at: (i + 4))
+ (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) +
(col at: (i + 8)) + (col at: (i + 9))].
((col size quo: 10) * 10 + 1) to: col size do: [ :i |
sum := sum + (col at: i)].
^ sum
randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations
here."
ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
It closes the gap with plain Python3 no numpy. But that is a pretty low
standard.
Any ideas, thoughts, wisdom, directions to pursue.
Thanks
Jimmie
NA
Nicolas Anquetil
Sun, Jan 9, 2022 11:08 AM
On Sun, 2022-01-09 at 12:05 +0100, Nicolas Anquetil wrote:
Definitely not easy to do benchmarking
I got these strange results:
n := 10000000.
floatArray := Array new: n.
Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: Random new ] ].
"-> 2871"
Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: i ] ].
ooops, that was 'floatArray at: idx put: idx'
(-> similar time)
"-> 86"
Time millisecondsToRun: [1 to: n do: [:i | Random new ]].
"-> 829"
so
- assigning 'Random new' to 10M array elements takes 2.8 seconds.
- assigning a value to 10M array elements takes 0.08 seconds.
- computing 'Random new' 10M times takes 0.8 seconds
I wonder where the extra 2 seconds come from?
some optimization in the background?
I did the 3 of them several times in different order and the results
are similar.
nicolas
On Fri, 2022-01-07 at 15:36 +0000, Benoit St-Jean via Pharo-dev
wrote:
Can you come up with a simple "base case" so we can find the
bottleneck/problem?
I'm not sure about what you're trying to do.
What do you get if you try this in a workspace (adjust the value of
n
to what you want, I tested it with 10 million items).
Let's get this one step at a time!
| floatArray n rng t1 t2 t3 r1 r2 r3 |
n := 10000000.
rng := Random new.
floatArray := Array new: n.
floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng
next].
t1 := Time millisecondsToRun: [r1 := floatArray sum].
t2 := Time millisecondsToRun: [| total |
total := 0.
floatArray do: [:each | total := total + each ].
r2 := total].
t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into: [:
total :each | total + each ]].
Transcript cr.
Transcript cr; show: 'Test with ', n printString, ' elements'.
Transcript cr;show: 'Original #sum -> Time: ', t1 printString, '
milliseconds, Total: ', r1 printString.
Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, '
milliseconds, Total: ', r2 printString.
Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, '
milliseconds, Total: ', r3 printString.
Here are the results I get on Squeak 5.3
Test with 10000000 elements
Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
GitHub: bstjean
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero". (A.
Einstein)
On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin
jlhouchin@gmail.com wrote:
I have written a micro benchmark which stresses a language in areas
which are crucial to my application.
I have written this micro benchmark in Pharo, Crystal, Nim, Python,
PicoLisp, C, C++, Java and Julia.
On my i7 laptop Julia completes it in about 1 minute and 15
seconds,
amazing magic they have done.
Crystal and Nim do it in about 5 minutes. Python in about 25 minutes.
Pharo takes over 2 hours. :(
In my benchmarks if I comment out the sum and average of the array.
It
completes in 3.5 seconds.
And when I sum the array it gives the correct results. So I can
verify
its validity.
To illustrate below is some sample code of what I am doing. I iterate
over the array and do calculations on each value of the array and
update
the array and sum and average at each value simple to stress array
access and sum and average.
28800 is simply derived from time series one minute values for 5
days,
4
weeks.
randarray := Array new: 28800.
1 to: randarray size do: [ :i | randarray at: i put: Number random
].
here." randarray sum. randarray average ]] timeToRun.
randarrayttr. "0:00:00:36.135"
I do 2 loops with 100 iterations each.
randarrayttr * 200. "0:02:00:27"
I learned early on in this adventure when dealing with compiled
languages that if you don’t do a lot, the test may not last long
enough
to give any times.
Pharo is my preference. But this is an awful big gap in
performance.
When doing backtesting this is huge. Does my backtest take minutes,
hours or days?
I am not a computer scientist nor expert in Pharo or Smalltalk. So
I do
not know if there is anything which can improve this.
However I have played around with several experiments of my #sum:
method.
This implementation reduces the time on the above randarray in
half.
sum: col
| sum |
sum := 0.
1 to: col size do: [ :i |
sum := sum + (col at: i) ].
^ sum
randarrayttr2 := [ 1 to: randarray size do: [ :i | "other
calculations
here."
ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"
And this one reduces it a little more.
sum10: col
| sum |
sum := 0.
1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) +
(col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5)) +
(col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8)) + (col at: (i + 9))].
((col size quo: 10) * 10 + 1) to: col size do: [ :i |
sum := sum + (col at: i)].
^ sum
randarrayttr3 := [ 1 to: randarray size do: [ :i | "other
calculations
here."
ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
It closes the gap with plain Python3 no numpy. But that is a pretty
low
standard.
Any ideas, thoughts, wisdom, directions to pursue.
Thanks
Jimmie
JH
Jimmie Houchin
Mon, Jan 10, 2022 8:05 PM
Some experiments and discoveries.
I am running my full language test every time. It is the only way I can
compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a
couple of things. Is the language sufficiently performant on basic
maths? I am not doing any high PolyMath level math. Simple things like
moving averages over portions of arrays.
The other is efficiency of array iteration and access. This is why #sum is
the best test of this attribute. #sum iterates and accesses every
element of the array. It will reveal if there are any problems.
The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.
When I comment out the #sum and #average calls, Pharo completes the test
in 3.5 seconds. So almost all the time is spent in those two calls.
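Assuming #average is implemented on top of #sum (roughly self sum / self size,
as in the stock collection protocol), each inner iteration walks the array at
least twice; a small, hypothetical restructuring that computes the sum once and
derives the average from it would remove one of those passes:
| sum avg |
"compute the total once and derive the average from it instead of a second full pass"
sum := randarray sum.
avg := sum / randarray size.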
So most of this conversation has focused on why #sum is as slow as it is
or how to improve the performance of #sum with other implementations.
So I decided to breakdown the #sum and try some things.
Starting with the initial implementation and SequenceableCollection's
default #sum time of 02:04:03
/"This implementation does no work. Only iterates through the array.//
//It completed in 00:10:08"/
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum
/"This implementation does no work, but adds to iteration, accessing the
value of the array.//
//It completed in 00:32:32.//
//Quite a bit of time for simply iterating and accessing."/
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum
/"This implementation I had in my initial email as an experiment and
also several other did the same in theirs.//
//A naive simple implementation.//
//It completed in 01:00:53. Half the time of the original."/
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum
/"This implementation I also had in my initial email as an experiment I
had done.//
//It completed in 00:50:18.//
//It reduces the iterations and increases the accesses per iteration.//
//It is the fastest implementation so far."/
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) +
(self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5)) +
(self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 9))].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
sum := sum + (self at: i)].
^ sum
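Because the unrolled version adds the elements in a different grouping, a quick
sanity check (hypothetical setup, assuming the unrolled code above is installed
as the array's #sum) confirms it agrees with a plain accumulation loop up to
floating-point rounding:
| a naive |
"hypothetical setup: a fresh random array like the one used in the benchmark"
a := (1 to: 28800) collect: [ :i | Number random ].
naive := 0.
a do: [ :each | naive := naive + each ].
"a sum is expected to dispatch to the unrolled implementation above, if installed"
(a sum - naive) abs < 1.0e-6
    ifFalse: [ self error: 'unrolled sum disagrees with the naive sum' ].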
Summary
For whatever reason iterating and accessing on an Array is expensive.
That alone took longer than Python to complete the entire test.
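A possible next step, not tried in the thread, is the stock Pharo time profiler
(the exact class name can vary between versions), which should show where the
time inside #sum actually goes:
"profile one call; assumes the standard Pharo profiling tool is present"
TimeProfiler spyOn: [ randarray sum ].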
I had allowed this knowledge of how much slower Pharo was to stop me
from using Pharo. It encouraged me to explore other options.
I have the option to use any language I want. I like Pharo. I do not
like Python at all. Julia is unexciting to me. I don't like their
anti-OO approach.
At one point I had a fairly complete Pharo implementation, which is
where I got frustrated with backtesting taking days.
That implementation is gone. I had not switched to Iceberg. I had a
problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone with
the skills to discover and optimize arrays. So I will end my tilting at
windmills here.
I value all the other things that Pharo brings, that I miss when I am
using Julia or Python or Crystal, etc. Those languages do not have the
vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.
That said, I have made the decision to go all in with Pharo. Set aside
all else.
In that regard I went ahead and put my money in with my decision and
joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.
Jimmie Houchin
AC
Andrei Chis
Tue, Jan 11, 2022 9:07 AM
Hi Jimmie,
I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.
Cheers,
Andrei
[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin jlhouchin@gmail.com wrote:
Some experiments and discoveries.
I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths. I am not doing any high PolyMath level math. Simple things like moving averages over portions of arrays.
The other is efficiency of array iteration and access. This why #sum is the best test of this attribute. #sum iterates and accesses every element of the array. It will reveal if there are any problems.
The default test Julia 1m15s, Python 24.5 minutes, Pharo 2hour 4minutes.
When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.
So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.
So I decided to breakdown the #sum and try some things.
Starting with the initial implementation and SequenceableCollection's default #sum time of 02:04:03
"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum
"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum
"This implementation I had in my initial email as an experiment and also several other did the same in theirs.
A naive simple implementation.
It completed in 01:00:53. Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum
"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5)) + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 9))].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
sum := sum + (self at: i)].
^ sum
Summary
For whatever reason iterating and accessing on an Array is expensive. That alone took longer than Python to complete the entire test.
I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.
I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.
At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.
That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.
I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.
That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.
Jimmie Houchin
SV
Sven Van Caekenberghe
Tue, Jan 11, 2022 10:17 AM
Hi Andrei,
That is a good catch, indeed, that makes all the difference and is an unfair comparison.
If I take Jimmie's code and add
def sum2(l):
    sum = 0
    for i in range(0, len(l)):
        sum = sum + l[i]
    return sum

def average(l):
    return sum2(l)/len(l)
and replace the other calls of sum to sum2 in loop1 and loop2, I get the following for 1 iteration:
>>> doit(1)
Tue Jan 11 10:34:24 2022
Creating list
createList(n), na[-1]: 0.28800000000000003
reps: 1
inside at top loop1: start: Tue Jan 11 10:34:24 2022
Loop1 time: 1.5645889163017273
nsum: 11242.949400371168
navg: 0.3903801875128878
loop2: start: Tue Jan 11 10:35:58 2022
Loop2 time: -27364895.977849767
nsum: 10816.16871440453
navg: 0.3755614136946017
finished: Tue Jan 11 10:37:33 2022
start time: 1641893664.795651
end time: 1641893853.597397
total time: 1614528959.1841362
nsum: 10816.16871440453
navg: 0.3755614136946017
The total time is calculated wrongly, but doing the calculation in Pharo:
(1641893853.597397 - 1641893664.795651) seconds. "0:00:03:08.80174613"
so 3 minutes.
Jimmie's unmodified Pharo code gives for 1 iteration:
[ (LanguageTest newSize: 60*24*5*4 iterations: 1) run ] timeToRun. "0:00:01:00.438"
Starting test for array size: 28800 iterations: 1
Creating array of size: 28800 timeToRun: 0:00:00:00.035
Starting loop 1 at: 2022-01-11T10:53:53.423313+01:00
1: 2022-01-11T10:53:53 innerttr: 0:00:00:30.073 averageTime: 0:00:00:30.073
Loop 1 time: nil
nsum: 11242.949400371168
navg: 0.3903801875128878
Starting loop 2 at: 2022-01-11T10:54:23.497281+01:00
1: 2022-01-11T10:54:23 innerttr: 0:00:00:30.306 averageTime: 0:00:00:30.306
Loop 2 time: 0:00:00:30.306
nsum: 10816.168714404532
navg: 0.3755614136946018
End of test. TotalTime: 0:00:01:00.416
which would seem to be 3 times faster !
Benchmarking is a black art.
Sven
On 11 Jan 2022, at 10:07, Andrei Chis chisvasileandrei@gmail.com wrote:
Hi Jimmie,
I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.
Cheers,
Andrei
[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin jlhouchin@gmail.com wrote:
Some experiments and discoveries.
I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths. I am not doing any high PolyMath level math. Simple things like moving averages over portions of arrays.
The other is efficiency of array iteration and access. This why #sum is the best test of this attribute. #sum iterates and accesses every element of the array. It will reveal if there are any problems.
The default test Julia 1m15s, Python 24.5 minutes, Pharo 2hour 4minutes.
When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.
So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.
So I decided to breakdown the #sum and try some things.
Starting with the initial implementation and SequenceableCollection's default #sum time of 02:04:03
"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum
"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum
"This implementation I had in my initial email as an experiment and also several other did the same in theirs.
A naive simple implementation.
It completed in 01:00:53. Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum
"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5)) + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 9))].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
sum := sum + (self at: i)].
^ sum
Summary
For whatever reason iterating and accessing on an Array is expensive. That alone took longer than Python to complete the entire test.
I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.
I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.
At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.
That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.
I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.
That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.
Jimmie Houchin