Array sum. is very slow

Sven Van Caekenberghe
Tue, Jan 11, 2022 10:37 AM

On 11 Jan 2022, at 11:17, Sven Van Caekenberghe <sven@stfx.eu> wrote:

which would seem to be 3 times faster !

And with my changes (faster #sum, message spy removed):

[ (LanguageTest newSize: 60*24*5*4 iterations: 1) run ] timeToRun. "0:00:00:26.612"

6 times faster.

Nicolas Anquetil
Tue, Jan 11, 2022 10:49 AM

Hi,

don't forget to weigh in your time too.

The ease to develop AND evolve a program is an important aspect that
the benchmarks don't show.

Nowadays, developer time often counts more than processing time, because
you may easily spend days on a nasty bug or an unplanned evolution.

have a nice day

nicolas

On Mon, 2022-01-10 at 14:05 -0600, Jimmie Houchin wrote:

Some experiments and discoveries.
I am running my full language test every time. It is the only way I
can compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a
couple of things. Is the language sufficiently performant on basic
maths? I am not doing any high PolyMath-level math, just simple things
like moving averages over portions of arrays.
The other is the efficiency of array iteration and access. This is why
#sum is the best test of this attribute. #sum iterates over and accesses
every element of the array. It will reveal if there are any problems.
The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours
4 minutes.
When I comment out the #sum and #average calls, Pharo completes the
test in 3.5 seconds. So almost all the time is spent in those two
calls.
So most of this conversation has focused on why #sum is as slow as it
is or how to improve the performance of #sum with other
implementations.

 
So I decided to break down #sum and try some things.
Starting with the initial implementation, SequenceableCollection's
default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
 sum
     | sum |
     sum := 1.
     1 to: self size do: [ :each | ].
     ^ sum
 
 
 "This implementation does no work, but adds to iteration, accessing
the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
 sum
     | sum |
     sum := 1.
     1 to: self size do: [ :each | self at: each ].
     ^ sum
 
 
 "This implementation I had in my initial email as an experiment, and
several others did the same in theirs.
A naive, simple implementation.
It completed in 01:00:53, half the time of the original."
 sum
     | sum |
     sum := 0.
     1 to: self size do: [ :each |
         sum := sum + (self at: each) ].
     ^ sum
 
 
 
 "This implementation I also had in my initial email as an experiment
I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
 sum
     | sum |
     sum := 0.
     1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
         sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
             + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
             + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
             + (self at: (i + 9)) ].
     ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
         sum := sum + (self at: i) ].
     ^ sum
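For readers who want to try the same unrolling idea outside Pharo, here is a minimal Python sketch of the technique (my own illustration for comparison, not code from this thread; note Python uses 0-based indexing):

```python
def sum_unrolled(arr):
    """Sum with a 10-way unrolled loop: fewer loop iterations,
    more element accesses per iteration."""
    total = 0
    limit = (len(arr) // 10) * 10  # largest multiple of 10 <= len(arr)
    # Main loop: process 10 elements per iteration.
    for i in range(0, limit, 10):
        total += (arr[i] + arr[i + 1] + arr[i + 2] + arr[i + 3] + arr[i + 4]
                  + arr[i + 5] + arr[i + 6] + arr[i + 7] + arr[i + 8] + arr[i + 9])
    # Remainder loop: pick up the last len(arr) % 10 elements.
    for i in range(limit, len(arr)):
        total += arr[i]
    return total

print(sum_unrolled(list(range(1, 26))))  # prints 325
```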
 
Summary
For whatever reason, iterating over and accessing an Array is expensive.
That alone took longer than Python needed to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me
from using Pharo, and it encouraged me to explore other options.
 
 I have the option to use any language I want. I like Pharo. I do not
like Python at all. Julia is unexciting to me. I don't like their
anti-OO approach.
 
 At one point I had a fairly complete Pharo implementation, which is
where I got frustrated with backtesting taking days.
 
 That implementation is gone. I had not switched to Iceberg. I had a
problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone
with the skills to discover and optimize arrays. So I will end my
tilting at windmills here.
I value all the other things that Pharo brings, that I miss when I am
using Julia or Python or Crystal, etc. Those languages do not have
the vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo
optimized me.
That said, I have made the decision to go all in with Pharo. Set
aside all else.
 In that regard I went ahead and put my money in with my decision and
joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.

Jimmie Houchin
 

Jimmie Houchin
Tue, Jan 11, 2022 2:36 PM

Personally I am okay with Python implementing in C. That is their
implementation detail. It does not impose anything on the user other
than knowing normal Python. It isn't cheating or unfair. They are under
no obligation to handicap themselves so that we can be more comparable.

Are we going to put such requirements on C, C++, Julia, Crystal, Nim?

I expect every language to put forth its best. I would like the same for
Pharo. And let the numbers fall where they may.

Jimmie

On 1/11/22 03:07, Andrei Chis wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
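The comparison Andrei describes can be sketched in Python directly; this is a hypothetical micro-benchmark (the function names are mine, and no timings are claimed here since they vary by machine and interpreter):

```python
import timeit

def manual_sum(arr):
    # Explicit element-by-element loop, analogous to the Pharo implementations.
    total = 0
    for x in arr:
        total += x
    return total

data = list(range(1_000_000))

t_builtin = timeit.timeit(lambda: sum(data), number=10)        # C-implemented built-in
t_manual = timeit.timeit(lambda: manual_sum(data), number=10)  # interpreted loop
print(f"built-in sum: {t_builtin:.3f}s  manual loop: {t_manual:.3f}s")
```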

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin <jlhouchin@gmail.com> wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths? I am not doing any high PolyMath-level math, just simple things like moving averages over portions of arrays.

The other is the efficiency of array iteration and access. This is why #sum is the best test of this attribute. #sum iterates over and accesses every element of the array. It will reveal if there are any problems.

The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.

So I decided to break down #sum and try some things.

Starting with the initial implementation, SequenceableCollection's default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
    | sum |
    sum := 1.
    1 to: self size do: [ :each | ].
    ^ sum

"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
    | sum |
    sum := 1.
    1 to: self size do: [ :each | self at: each ].
    ^ sum

"This implementation I had in my initial email as an experiment, and several others did the same in theirs.
A naive, simple implementation.
It completed in 01:00:53, half the time of the original."
sum
    | sum |
    sum := 0.
    1 to: self size do: [ :each |
        sum := sum + (self at: each) ].
    ^ sum

"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
    | sum |
    sum := 0.
    1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
        sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
            + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
            + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
            + (self at: (i + 9)) ].

    ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
        sum := sum + (self at: i) ].
    ^ sum

Summary

For whatever reason, iterating over and accessing an Array is expensive. That alone took longer than Python needed to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo, and it encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.

I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.

Thanks for all of your help in exploring the problem.

Jimmie Houchin

Jimmie Houchin
Tue, Jan 11, 2022 7:07 PM

Thanks for the comments.

They are very true.

Jimmie

On 1/11/22 04:49, Nicolas Anquetil wrote:

Hi,

don't forget to weigh in your time too.

The ease to develop AND evolve a program is an important aspect that
the benchmarks don't show.

Nowadays, developer time often counts more than processing time, because
you may easily spend days on a nasty bug or an unplanned evolution.

have a nice day

nicolas

On Mon, 2022-01-10 at 14:05 -0600, Jimmie Houchin wrote:

Some experiments and discoveries.
I am running my full language test every time. It is the only way I
can compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a
couple of things. Is the language sufficiently performant on basic
maths? I am not doing any high PolyMath-level math, just simple things
like moving averages over portions of arrays.
The other is the efficiency of array iteration and access. This is why
#sum is the best test of this attribute. #sum iterates over and accesses
every element of the array. It will reveal if there are any problems.
The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours
4 minutes.
When I comment out the #sum and #average calls, Pharo completes the
test in 3.5 seconds. So almost all the time is spent in those two
calls.
So most of this conversation has focused on why #sum is as slow as it
is or how to improve the performance of #sum with other
implementations.

So I decided to break down #sum and try some things.
Starting with the initial implementation, SequenceableCollection's
default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
 sum
     | sum |
     sum := 1.
     1 to: self size do: [ :each | ].
     ^ sum

 "This implementation does no work, but adds to iteration, accessing
the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
 sum
     | sum |
     sum := 1.
     1 to: self size do: [ :each | self at: each ].
     ^ sum

 "This implementation I had in my initial email as an experiment, and
several others did the same in theirs.
A naive, simple implementation.
It completed in 01:00:53, half the time of the original."
 sum
     | sum |
     sum := 0.
     1 to: self size do: [ :each |
         sum := sum + (self at: each) ].
     ^ sum

 "This implementation I also had in my initial email as an experiment
I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
 sum
     | sum |
     sum := 0.
     1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
         sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
             + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
             + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
             + (self at: (i + 9)) ].

     ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
         sum := sum + (self at: i) ].
     ^ sum

Summary
For whatever reason, iterating over and accessing an Array is expensive.
That alone took longer than Python needed to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me
from using Pharo, and it encouraged me to explore other options.

 I have the option to use any language I want. I like Pharo. I do not
like Python at all. Julia is unexciting to me. I don't like their
anti-OO approach.

 At one point I had a fairly complete Pharo implementation, which is
where I got frustrated with backtesting taking days.

 That implementation is gone. I had not switched to Iceberg. I had a
problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone
with the skills to discover and optimize arrays. So I will end my
tilting at windmills here.
I value all the other things that Pharo brings, that I miss when I am
using Julia or Python or Crystal, etc. Those languages do not have
the vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo
optimized me.
That said, I have made the decision to go all in with Pharo. Set
aside all else.
 In that regard I went ahead and put my money in with my decision and
joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.

Jimmie Houchin

Miloslav.Raus@cuzk.cz
Tue, Jan 11, 2022 7:53 PM

Hi, ppl.

I kinda agree with both sides of the argument.
Whilst taken one way it is comparing apples to oranges, it's deeply beneficial to optimize the obvious/"idiomatic" cases, especially if you can [without introducing friction / special cases].
- ifTrue: and/or ifFalse: anyone?

Aaaand the language/runtime/environment should be evaluated on the grounds of how it handles the "idiomatic cases" -- unless you wanna diverge into the territory of "how much assembly [or its hi-level equiv.] is too much optimization".

No minus points for Python here. But no way they can do sane reloading while keeping current semantics ...

It's all a trade-off, and the only clean winners overall are Smalltalk & [Common] Lisp, IMNSHO.
- biased, but happy; in denial, also (?) - mostly paid for working with other languages/runtimes :-/

Cheers,

M.R.

-----Original Message-----
From: Jimmie Houchin <jlhouchin@gmail.com>
Sent: Tuesday, January 11, 2022 3:37 PM
To: pharo-dev@lists.pharo.org
Subject: [Pharo-dev] Re: Array sum. is very slow

Personally I am okay with Python implementing in C. That is their implementation detail. It does not impose anything on the user other than knowing normal Python. It isn't cheating or unfair. They are under no obligation to handicap themselves so that we can be more comparable.

Are we going to put such requirements on C, C++, Julia, Crystal, Nim?

I expect every language to put forth its best. I would like the same for Pharo. And let the numbers fall where they may.

Jimmie

On 1/11/22 03:07, Andrei Chis wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin <jlhouchin@gmail.com> wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths? I am not doing any high PolyMath-level math, just simple things like moving averages over portions of arrays.

The other is the efficiency of array iteration and access. This is why #sum is the best test of this attribute. #sum iterates over and accesses every element of the array. It will reveal if there are any problems.

The default test  Julia 1m15s, Python 24.5 minutes, Pharo 2hour 4minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.

So I decided to break down the #sum and try some things.

Starting with the initial implementation and SequenceableCollection's default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum

"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum

"This implementation I had in my initial email as an experiment, and several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53. Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum

"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
    sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
        + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
        + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
        + (self at: (i + 9)) ].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
    sum := sum + (self at: i) ].
^ sum
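For comparison, the same 10-way unrolling can be sketched in Python (a hypothetical rendition, not part of the original benchmark): a main loop handles ten elements per iteration, then a scalar loop picks up the remainder.

```python
def unrolled_sum(xs):
    """Sum xs ten elements at a time, then handle the leftover tail."""
    total = 0.0
    n = len(xs)
    limit = (n // 10) * 10      # last index covered by the unrolled loop
    i = 0
    while i < limit:
        total += (xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3] + xs[i + 4]
                  + xs[i + 5] + xs[i + 6] + xs[i + 7] + xs[i + 8] + xs[i + 9])
        i += 10
    for j in range(limit, n):   # remainder: fewer than 10 elements
        total += xs[j]
    return total

# Sanity check against the straightforward sum.
assert unrolled_sum(list(range(25))) == sum(range(25))
```

The win comes from amortizing loop bookkeeping (bounds check, increment, branch) over ten element accesses instead of one, which mirrors why this variant was the fastest of the four Smalltalk versions.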

Summary

For whatever reason iterating and accessing on an Array is expensive. That alone took longer than Python to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.

I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.

Thanks for all of your help in exploring the problem.

Jimmie Houchin

Hi, ppl.

I kinda agree with both sides of the argument. Whilst taken one way it _is_ comparing apples to oranges, it's deeply beneficial to optimize the obvious/"idiomatic" cases - especially if you can [without introducing friction / special cases]. - ifTrue: and/or ifFalse: anyone?

Aaaand the language/runtime/environment should be evaluated on the grounds of how it handles the "idiomatic cases" -- unless you wanna diverge into the territory of "how much assembly [or its hi-level equiv.] is too much optimization". No minus points for Python here. But no way they can do sane reloading while keeping current semantics ...

It's all a trade-off, and the only clean winners overall are Smalltalk & [Common] Lisp, IMNSHO.
- biased, but happy; in denial, also (?)
- mostly paid for working with other languages/runtimes :-/

Cheers,
M.R.

-----Original Message-----
From: Jimmie Houchin <jlhouchin@gmail.com>
Sent: Tuesday, January 11, 2022 3:37 PM
To: pharo-dev@lists.pharo.org
Subject: [Pharo-dev] Re: Array sum. is very slow
HS
Henrik Sperre Johansen
Wed, Jan 12, 2022 3:31 PM

We could also try modifying Pharo to use C by reintroducing the FloatArray plugin ;)

| fa r |
fa := FloatArray new: 28800.
r := Random new.
1 to: fa size do: [ :i | fa at: i put: r next ].
[ 1 to: fa size do: [ :i | fa sum ] ] timeToRun

Pharo 9, no plugin:
0:00:01:14.777
Pharo 5, with plugin:
0:00:00:00.526

Cheers,
Henry
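The FloatArray point generalizes: a typed, contiguous container stores raw machine floats rather than boxed objects, which is what a C-level primitive can exploit. Python's array module is a rough analogue (a sketch with illustrative values; the 28800 size just mirrors the snippet above):

```python
from array import array
import random

rng = random.Random(42)

# A typed array of 64-bit doubles: elements are stored as raw machine
# floats in one contiguous buffer, not as boxed Python objects.
fa = array('d', (rng.random() for _ in range(28800)))

assert fa.itemsize == 8          # 8 bytes per element, packed

total = sum(fa)
assert 0.0 < total < len(fa)     # each element is in [0, 1)
```

The packed representation is the precondition for a fast native sum; a C plugin (or NumPy, in Python's case) can then walk the buffer directly instead of dispatching per element.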

On 11 Jan 2022, at 10:08, Andrei Chis chisvasileandrei@gmail.com wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin jlhouchin@gmail.com wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths. I am not doing any high PolyMath level math. Simple things like moving averages over portions of arrays.

The other is efficiency of array iteration and access. This is why #sum is the best test of this attribute. #sum iterates and accesses every element of the array. It will reveal if there are any problems.

The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.

So I decided to break down the #sum and try some things.

Starting with the initial implementation and SequenceableCollection's default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum

"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum

"This implementation I had in my initial email as an experiment, and several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53. Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum

"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
    sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
        + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
        + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
        + (self at: (i + 9)) ].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
    sum := sum + (self at: i) ].
^ sum

Summary

For whatever reason iterating and accessing on an Array is expensive. That alone took longer than Python to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.

I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.

Thanks for all of your help in exploring the problem.

Jimmie Houchin

SV
Sven Van Caekenberghe
Wed, Jan 12, 2022 3:50 PM

Yes that would certainly be useful.

But, AFAIU, FloatArray consists of 32-bit Float numbers; I think we also need a DoubleFloatArray since 64-bit Floats are the default nowadays.
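The 32-bit vs. 64-bit distinction matters for correctness, not just naming: storing a Pharo-style 64-bit Float into a 32-bit slot silently loses precision. A quick Python illustration of the round-trip loss:

```python
import struct

x = 0.1                                        # a 64-bit double
# Pack into a C 'float' (32 bits) and read it back out.
as32 = struct.unpack('f', struct.pack('f', x))[0]

print(x == as32)     # False: the nearest 32-bit value differs
print(abs(x - as32)) # on the order of 1e-9
```

So a FloatArray of 32-bit elements would quietly change results of a summation benchmark, which is why a 64-bit DoubleFloatArray is the right counterpart.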

On 12 Jan 2022, at 16:31, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:

We could also try modifying Pharo to use C by reintroducing the FloatArray plugin ;)

| fa r |
fa := FloatArray new: 28800.
r := Random new.
1 to: fa size do: [ :i | fa at: i put: r next ].
[ 1 to: fa size do: [ :i | fa sum ] ] timeToRun

Pharo 9, no plugin:
0:00:01:14.777
Pharo 5, with plugin:
0:00:00:00.526

Cheers,
Henry

On 11 Jan 2022, at 10:08, Andrei Chis chisvasileandrei@gmail.com wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin jlhouchin@gmail.com wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths. I am not doing any high PolyMath level math. Simple things like moving averages over portions of arrays.

The other is efficiency of array iteration and access. This is why #sum is the best test of this attribute. #sum iterates and accesses every element of the array. It will reveal if there are any problems.

The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.

So I decided to break down the #sum and try some things.

Starting with the initial implementation and SequenceableCollection's default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum

"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum

"This implementation I had in my initial email as an experiment, and several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53. Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum

"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
    sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
        + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
        + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
        + (self at: (i + 9)) ].
((self size quo: 10) * 10 + 1) to: self size do: [ :i |
    sum := sum + (self at: i) ].
^ sum

Summary

For whatever reason iterating and accessing on an Array is expensive. That alone took longer than Python to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.

I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.

Thanks for all of your help in exploring the problem.

Jimmie Houchin

HS
Henrik Sperre Johansen
Wed, Jan 12, 2022 4:38 PM

True!
It’s a little bit of a naming conundrum, since the «Float» in Pharo is already 64-bit, but since we’re speaking «native» arrays, DoubleArray would be the best, I guess.

Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray classes have incorrect definitions in Pharo 9 AFAICT- variableByte/WordSubclasses, instead of variableDoubleByte/variableDoubleWordSubclasses…

| dwa |
dwa := DoubleWordArray new: 1.
dwa at: 1 put: 1 << 32.

and

| dba |
dba := DoubleByteArray new: 1.
dba at: 1 put: 256.

should work…

Cheers,
Henry
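Henrik's point, restated: the element width of the container fixes the range of storable values, so a DoubleByteArray mis-defined with byte-sized slots cannot hold 256. A Python analogy using array typecodes (illustrative only, not Pharo semantics):

```python
from array import array

# 16-bit unsigned slots ('H'): the double-byte case. 256 fits.
dba = array('H', [0])
dba[0] = 256               # max is 65535
assert dba[0] == 256

# 8-bit unsigned slots ('B'): the mis-defined byte case. 256 does not fit.
ba = array('B', [0])
try:
    ba[0] = 256            # max is 255
except OverflowError:
    print("256 does not fit in a byte-sized slot")
```

This is exactly the failure mode of declaring a DoubleByteArray as a variableByteSubclass: the `at:put:` that should work raises an out-of-range error instead.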

On 12 Jan 2022, at 16:51, Sven Van Caekenberghe sven@stfx.eu wrote:

Yes that would certainly be useful.

But, AFAIU, FloatArray consists of 32-bit Float numbers, I think we also need a DoubleFloatArray since 64-bit Floats are the default nowadays.

On 12 Jan 2022, at 16:31, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:

We could also try modifying Pharo to use C by reintroducing the FloatArray plugin ;)

| fa r |
fa := FloatArray new: 28800.
r := Random new.
1 to: fa size do: [ :i | fa at: i put: r next ].
[ 1 to: fa size do: [ :i | fa sum ] ] timeToRun

Pharo 9, no plugin:
0:00:01:14.777
Pharo 5, with plugin:
0:00:00:00.526

Cheers,
Henry

On 11 Jan 2022, at 10:08, Andrei Chis chisvasileandrei@gmail.com wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
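A minimal sketch of that comparison in plain CPython (illustrative only; absolute timings depend on the machine):

```python
import random
import time

data = [random.random() for _ in range(1_000_000)]

# Built-in sum: the summation loop runs in C.
t0 = time.perf_counter()
builtin_total = sum(data)
builtin_time = time.perf_counter() - t0

# Explicit loop, closer to what the Pharo code does: every
# iteration and element access goes through the interpreter.
t0 = time.perf_counter()
loop_total = 0.0
for x in data:
    loop_total += x
loop_time = time.perf_counter() - t0

print(builtin_time, loop_time)
```

On a typical CPython build the explicit loop is several times slower than the built-in, so a fair cross-language benchmark should compare loop against loop, not loop against C.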

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin jlhouchin@gmail.com wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of things. Is the language sufficiently performant on basic maths? I am not doing any high PolyMath-level math. Simple things like moving averages over portions of arrays.

The other is efficiency of array iteration and access. This is why #sum is the best test of this attribute. #sum iterates and accesses every element of the array. It will reveal if there are any problems.

The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how to improve the performance of #sum with other implementations.

So I decided to breakdown the #sum and try some things.

Starting with the initial implementation and SequenceableCollection's default #sum: time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
| sum |
sum := 1.
1 to: self size do: [ :each | ].
^ sum

"This implementation does no work, but adds to iteration, accessing the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
| sum |
sum := 1.
1 to: self size do: [ :each | self at: each ].
^ sum

"This implementation I had in my initial email as an experiment, and several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53.  Half the time of the original."
sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum

"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
| sum |
sum := 0.
1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5)) + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 9))].

((self size quo: 10) * 10 + 1) to: self size do: [ :i |
sum := sum + (self at: i)].
^ sum
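For reference, the same unroll-by-ten structure, with its remainder loop, can be expressed outside Smalltalk to check the index arithmetic (a Python sketch, 0-based indexing instead of Smalltalk's 1-based; illustrative only):

```python
def sum_unrolled(xs):
    # Mirror the Smalltalk method: process floor(size/10)*10
    # elements ten at a time, then sum the tail one by one.
    total = 0
    limit = (len(xs) // 10) * 10
    i = 0
    while i < limit:
        total += (xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3] + xs[i + 4]
                  + xs[i + 5] + xs[i + 6] + xs[i + 7] + xs[i + 8] + xs[i + 9])
        i += 10
    for j in range(limit, len(xs)):
        total += xs[j]
    return total
```

The unrolling wins because it trades ten loop-bookkeeping steps (bounds check, increment, block activation) for one, at the cost of the explicit remainder loop for sizes that are not multiples of ten.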

Summary

For whatever reason, iterating over and accessing an Array is expensive. That alone took longer than Python needed to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using Pharo. Encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the skills to discover and optimize arrays. So I will end my tilting at windmills here.

I value all the other things that Pharo brings, that I miss when I am using Julia or Python or Crystal, etc. Those languages do not have the vision to do what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside all else.
In that regard I went ahead and put my money in with my decision and joined the Pharo Association last week.

Thanks for all of your help in exploring the problem.

Jimmie Houchin

MD
Marcus Denker
Wed, Jan 12, 2022 5:06 PM

On 12 Jan 2022, at 17:38, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:

True!
It’s a little bit of a naming conundrum, since the «Float» in Pharo is already 64-bit, but since we’re speaking «native» arrays,
DoubleArray
would be the best, I guess.

Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray classes have incorrect definitions in Pharo 9 AFAICT- variableByte/WordSubclasses, instead of variableDoubleByte/variableDoubleWordSubclasses…

We finally fixed this in Pharo 10 two days ago:
https://github.com/pharo-project/pharo/pull/9792

Marcus
T
tesonep@gmail.com
Wed, Jan 12, 2022 5:24 PM

I have activated the plugin in the build for Pharo 9; it will be available
in the next release.

It will be interesting to have a Float64 extension to it.

On Wed, Jan 12, 2022, 18:06 Marcus Denker marcus.denker@inria.fr wrote:

On 12 Jan 2022, at 17:38, Henrik Sperre Johansen <
henrik.s.johansen@veloxit.no> wrote:

True!
It’s a little bit of a naming conundrum, since the «Float» in Pharo is
already 64-bit, but since we’re speaking «native» arrays,
DoubleArray
would be the best, I guess.

Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray
classes have incorrect definitions in Pharo 9 AFAICT-
variableByte/WordSubclasses, instead of
variableDoubleByte/variableDoubleWordSubclasses…

We finally fixed this in Pharo10 2 days ago:
https://github.com/pharo-project/pharo/pull/9792

Marcus
