P3Converter and DateAndTime>>#readFrom: performance

EM
Esteban Maringolo
Mon, Oct 12, 2020 1:04 PM

Hi,

I recently stumbled into some performance issue I had, so I profiled it and got it fixed, but by doing that I also noticed a huge time spent when reading DateAndTimes and Dates (which also uses the former) and the culprit seems to be Date class>>#readFrom:.

Digging into this, I noticed that this was called from DateAndTime>>#readFrom:, but in particular 90% of the time spent was when instantiating the Date part of it, because it does unnecessary lookup of the index of Month when input stream only have a month index.

What I did then was to change the default behavior of DateAndTime class>>#readFrom:

readFrom: aStream defaultOffset: defaultOffset "Parse and return a new DateAndTime instance from stream, as a Date, an optional Time and an optional TimeZone offset. The time defaults to midnight, the timezone to defaultOffset" "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"

"From this" date := Date readFrom: aStream.

"To this" date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'.

And also in P3Converter to specify the pattern directly.

P3Converter>>#convertDateFrom: bytes length: length description: description ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: 'yyyy-mm-dd'

With these two changes I got a 8x-10x speedup when reading Date/DateAndTime from the database.

Then I dig a little further and in P3Converter added a little check to avoid translating the offset of a DateAndTime when it is not needed.

P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: length description: description "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in Postgres the way it was inserted, its representation remains constant, with no offset added. We use the timezone of the connection to do the necessary shifting. This assumes that the timezones used during insert/update and query are the same."

| timestamp offset | timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes length: length) defaultOffset: Duration zero. offset := self timezone offsetForTimestamp: timestamp. ^offset isZero "<-- added this check to avoid translation" ifTrue: [ timestamp ] ifFalse: [ timestamp translateTo: offset ]

And the speedup went even higher. But I don't know whether this last part is correct.

Of course this seems as a quick and dirty fix, but I don't know whether DateAndTime>>readFrom: should read any format or just ISO 8601 formats nor whether the Date/DateAndTime coming from PostgreSQL are always formatted that way. I ran my app's small test suite and haven't had any issues, neither during development with these changes applied.

I think there is a significant slowdown involved in the instantiation of Dates that should be fixed, whether this fix is definitive or not is why I'm asking here.

Regards!

Esteban A. Maringolo

Hi, I recently stumbled into some performance issue I had, so I profiled it and got it fixed, but by doing that I also noticed a huge time spent when reading DateAndTimes and Dates (which also uses the former) and the culprit seems to be `Date class>>#readFrom:`. Digging into this, I noticed that this was called from `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was when instantiating the Date part of it, because it does unnecessary lookup of the index of Month when input stream only have a month index. What I did then was to change the default behavior of DateAndTime class>>#readFrom: readFrom: aStream defaultOffset: defaultOffset "Parse and return a new DateAndTime instance from stream, as a Date, an optional Time and an optional TimeZone offset. The time defaults to midnight, the timezone to defaultOffset" "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream" "From this" date := Date readFrom: aStream. "To this" date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'. And also in P3Converter to specify the pattern directly. P3Converter>>#convertDateFrom: bytes length: length description: description ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: 'yyyy-mm-dd' With these two changes I got a 8x-10x speedup when reading Date/DateAndTime from the database. Then I dig a little further and in P3Converter added a little check to avoid translating the offset of a DateAndTime when it is not needed. P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: length description: description "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in Postgres the way it was inserted, its representation remains constant, with no offset added. We use the timezone of the connection to do the necessary shifting. This assumes that the timezones used during insert/update and query are the same." | timestamp offset | timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes length: length) defaultOffset: Duration zero. offset := self timezone offsetForTimestamp: timestamp. ^offset isZero "<-- added this check to avoid translation" ifTrue: [ timestamp ] ifFalse: [ timestamp translateTo: offset ] And the speedup went even higher. But I don't know whether this last part is correct. Of course this seems as a quick and dirty fix, but I don't know whether DateAndTime>>readFrom: should read any format or just ISO 8601 formats nor whether the Date/DateAndTime coming from PostgreSQL are always formatted that way. I ran my app's small test suite and haven't had any issues, neither during development with these changes applied. I think there is a significant slowdown involved in the instantiation of Dates that should be fixed, whether this fix is definitive or not is why I'm asking here. Regards! Esteban A. Maringolo
SV
Sven Van Caekenberghe
Mon, Oct 12, 2020 1:57 PM

Hi Esteban,

Thanks for this detailed feedback, this is most interesting.

I don't have enough time to look at this in detail, I will do so later, but here is some initial feedback.

Note that PostreSQL supports many different date/time formats and that the user can set preferences per connection, hence some flexibility in parsing is probably needed, and that is one reason why the parsing is implemented the way it is.

Check section 8.5.2. from https://www.postgresql.org/docs/13/datatype-datetime.html

Note that all of the following give the same result:

Date readFrom: 'March 2 2020' readStream. Date readFrom: '3/2/2020' readStream. Date readFrom: '2020-03-02' readStream.

Date parsing is probably slow because it has to deal with all these formats.

Of course, parsing a fixed well-known format is always faster.

The timezones are tricky too, as preferences can be set per connection, so we have to be careful.

I will come back to this.

Sven

On 12 Oct 2020, at 15:04, Esteban Maringolo emaringolo@gmail.com wrote:

Hi,

I recently stumbled into some performance issue I had, so I profiled it and got it fixed, but by doing that I also noticed a huge time spent when reading DateAndTimes and Dates (which also uses the former) and the culprit seems to be Date class>>#readFrom:.

Digging into this, I noticed that this was called from DateAndTime>>#readFrom:, but in particular 90% of the time spent was when instantiating the Date part of it, because it does unnecessary lookup of the index of Month when input stream only have a month index.

What I did then was to change the default behavior of DateAndTime class>>#readFrom:

readFrom: aStream defaultOffset: defaultOffset "Parse and return a new DateAndTime instance from stream, as a Date, an optional Time and an optional TimeZone offset. The time defaults to midnight, the timezone to defaultOffset" "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"

"From this" date := Date readFrom: aStream.

"To this" date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'.

And also in P3Converter to specify the pattern directly.

P3Converter>>#convertDateFrom: bytes length: length description: description ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: 'yyyy-mm-dd'

With these two changes I got a 8x-10x speedup when reading Date/DateAndTime from the database.

Then I dig a little further and in P3Converter added a little check to avoid translating the offset of a DateAndTime when it is not needed.

P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: length description: description "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in Postgres the way it was inserted, its representation remains constant, with no offset added. We use the timezone of the connection to do the necessary shifting. This assumes that the timezones used during insert/update and query are the same."

| timestamp offset | timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes length: length) defaultOffset: Duration zero. offset := self timezone offsetForTimestamp: timestamp. ^offset isZero "<-- added this check to avoid translation" ifTrue: [ timestamp ] ifFalse: [ timestamp translateTo: offset ]

And the speedup went even higher. But I don't know whether this last part is correct.

Of course this seems as a quick and dirty fix, but I don't know whether DateAndTime>>readFrom: should read any format or just ISO 8601 formats nor whether the Date/DateAndTime coming from PostgreSQL are always formatted that way. I ran my app's small test suite and haven't had any issues, neither during development with these changes applied.

I think there is a significant slowdown involved in the instantiation of Dates that should be fixed, whether this fix is definitive or not is why I'm asking here.

Regards!

Esteban A. Maringolo

Hi Esteban, Thanks for this detailed feedback, this is most interesting. I don't have enough time to look at this in detail, I will do so later, but here is some initial feedback. Note that PostreSQL supports many different date/time formats and that the user can set preferences per connection, hence some flexibility in parsing is probably needed, and that is one reason why the parsing is implemented the way it is. Check section 8.5.2. from https://www.postgresql.org/docs/13/datatype-datetime.html Note that all of the following give the same result: Date readFrom: 'March 2 2020' readStream. Date readFrom: '3/2/2020' readStream. Date readFrom: '2020-03-02' readStream. Date parsing is probably slow because it has to deal with all these formats. Of course, parsing a fixed well-known format is always faster. The timezones are tricky too, as preferences can be set per connection, so we have to be careful. I will come back to this. Sven > On 12 Oct 2020, at 15:04, Esteban Maringolo <emaringolo@gmail.com> wrote: > > Hi, > > I recently stumbled into some performance issue I had, so I profiled > it and got it fixed, but by doing that I also noticed a huge time > spent when reading DateAndTimes and Dates (which also uses the former) > and the culprit seems to be `Date class>>#readFrom:`. > > Digging into this, I noticed that this was called from > `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was > when instantiating the Date part of it, because it does unnecessary > lookup of the index of Month when input stream only have a month > index. > > What I did then was to change the default behavior of DateAndTime > class>>#readFrom: > > readFrom: aStream defaultOffset: defaultOffset > "Parse and return a new DateAndTime instance from stream, > as a Date, an optional Time and an optional TimeZone offset. > The time defaults to midnight, the timezone to defaultOffset" > "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream" > > "From this" > date := Date readFrom: aStream. > > "To this" > date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'. > > And also in P3Converter to specify the pattern directly. > > P3Converter>>#convertDateFrom: bytes length: length description: description > ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: > 'yyyy-mm-dd' > > With these two changes I got a 8x-10x speedup when reading > Date/DateAndTime from the database. > > Then I dig a little further and in P3Converter added a little check to > avoid translating the offset of a DateAndTime when it is not needed. > > P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: > length description: description > "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in > Postgres the way it was inserted, its representation remains constant, > with no offset added. We use the timezone of the connection to do the > necessary shifting. This assumes that the timezones used during > insert/update and query are the same." > > | timestamp offset | > timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes > length: length) defaultOffset: Duration zero. > offset := self timezone offsetForTimestamp: timestamp. > ^offset isZero "<-- added this check to avoid translation" > ifTrue: [ timestamp ] > ifFalse: [ timestamp translateTo: offset ] > > And the speedup went even higher. But I don't know whether this last > part is correct. > > Of course this seems as a quick and dirty fix, but I don't know > whether DateAndTime>>readFrom: should read any format or just ISO 8601 > formats nor whether the Date/DateAndTime coming from PostgreSQL are > always formatted that way. I ran my app's small test suite and haven't > had any issues, neither during development with these changes applied. > > I think there is a significant slowdown involved in the instantiation > of Dates that should be fixed, whether this fix is definitive or not > is why I'm asking here. > > Regards! > > Esteban A. Maringolo
SV
Sven Van Caekenberghe
Tue, Oct 13, 2020 1:43 PM

Hi Estaban,

I took some time to properly look at your remarks/suggestions.

You are right, the chronology conversions could be improved and made faster.

I committed the following https://github.com/svenvc/P3/commit/dd06eb33d0147060e0b6f44c704e2e298f55c0d4

"As noted/analysed by Esteban A. Maringolo, incoming conversions for Timestamp/DateAndTime and Date especially were slow. He suggested to take advantage of the known ISO format.

Introduction of P3ISO8601Parser that implements a fast, non-checking parser specifically for the PSQL 8601 format.

Use P3ISO8601Parser for the chronology conversions in P3Converter, for a 5x to 10x speedup.

Note that for this to work, the connection output format must be ISO, the default."

I did the following small test to confirm the improvements:

p3client := P3Client url: 'psql://sven@localhost:5432/'.

p3client execute: 'DROP TABLE IF EXISTS table1'.

p3client execute: 'CREATE TABLE table1 (id SERIAL PRIMARY KEY, name TEXT, t1 TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP, t2 TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP, t0 TIME DEFAULT CURRENT_TIME, d1 DATE DEFAULT CURRENT_DATE)'.

1 to: 100 do: [ :each | p3client execute: 'INSERT INTO table1 (name) VALUES (''n-' , (each + 999 atRandom) asString , ''')' ].

[ p3client query: 'SELECT * FROM table1' ] bench.

p3client close.

This query (6 columns, 100 rows) went from 68.332 per second to 446.711 per second on my machine.

Thanks again for your feedback, it was most useful.

Do these changes work for you ?

Regards,

Sven

On 12 Oct 2020, at 15:57, Sven Van Caekenberghe sven@stfx.eu wrote:

Hi Esteban,

Thanks for this detailed feedback, this is most interesting.

I don't have enough time to look at this in detail, I will do so later, but here is some initial feedback.

Note that PostreSQL supports many different date/time formats and that the user can set preferences per connection, hence some flexibility in parsing is probably needed, and that is one reason why the parsing is implemented the way it is.

Check section 8.5.2. from https://www.postgresql.org/docs/13/datatype-datetime.html

Note that all of the following give the same result:

Date readFrom: 'March 2 2020' readStream. Date readFrom: '3/2/2020' readStream. Date readFrom: '2020-03-02' readStream.

Date parsing is probably slow because it has to deal with all these formats.

Of course, parsing a fixed well-known format is always faster.

The timezones are tricky too, as preferences can be set per connection, so we have to be careful.

I will come back to this.

Sven

On 12 Oct 2020, at 15:04, Esteban Maringolo emaringolo@gmail.com wrote:

Hi,

I recently stumbled into some performance issue I had, so I profiled it and got it fixed, but by doing that I also noticed a huge time spent when reading DateAndTimes and Dates (which also uses the former) and the culprit seems to be Date class>>#readFrom:.

Digging into this, I noticed that this was called from DateAndTime>>#readFrom:, but in particular 90% of the time spent was when instantiating the Date part of it, because it does unnecessary lookup of the index of Month when input stream only have a month index.

What I did then was to change the default behavior of DateAndTime class>>#readFrom:

readFrom: aStream defaultOffset: defaultOffset "Parse and return a new DateAndTime instance from stream, as a Date, an optional Time and an optional TimeZone offset. The time defaults to midnight, the timezone to defaultOffset" "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"

"From this" date := Date readFrom: aStream.

"To this" date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'.

And also in P3Converter to specify the pattern directly.

P3Converter>>#convertDateFrom: bytes length: length description: description ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: 'yyyy-mm-dd'

With these two changes I got a 8x-10x speedup when reading Date/DateAndTime from the database.

Then I dig a little further and in P3Converter added a little check to avoid translating the offset of a DateAndTime when it is not needed.

P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: length description: description "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in Postgres the way it was inserted, its representation remains constant, with no offset added. We use the timezone of the connection to do the necessary shifting. This assumes that the timezones used during insert/update and query are the same."

| timestamp offset | timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes length: length) defaultOffset: Duration zero. offset := self timezone offsetForTimestamp: timestamp. ^offset isZero "<-- added this check to avoid translation" ifTrue: [ timestamp ] ifFalse: [ timestamp translateTo: offset ]

And the speedup went even higher. But I don't know whether this last part is correct.

Of course this seems as a quick and dirty fix, but I don't know whether DateAndTime>>readFrom: should read any format or just ISO 8601 formats nor whether the Date/DateAndTime coming from PostgreSQL are always formatted that way. I ran my app's small test suite and haven't had any issues, neither during development with these changes applied.

I think there is a significant slowdown involved in the instantiation of Dates that should be fixed, whether this fix is definitive or not is why I'm asking here.

Regards!

Esteban A. Maringolo

Hi Estaban, I took some time to properly look at your remarks/suggestions. You are right, the chronology conversions could be improved and made faster. I committed the following https://github.com/svenvc/P3/commit/dd06eb33d0147060e0b6f44c704e2e298f55c0d4 "As noted/analysed by Esteban A. Maringolo, incoming conversions for Timestamp/DateAndTime and Date especially were slow. He suggested to take advantage of the known ISO format. Introduction of P3ISO8601Parser that implements a fast, non-checking parser specifically for the PSQL 8601 format. Use P3ISO8601Parser for the chronology conversions in P3Converter, for a 5x to 10x speedup. Note that for this to work, the connection output format must be ISO, the default." I did the following small test to confirm the improvements: p3client := P3Client url: 'psql://sven@localhost:5432/'. p3client execute: 'DROP TABLE IF EXISTS table1'. p3client execute: 'CREATE TABLE table1 (id SERIAL PRIMARY KEY, name TEXT, t1 TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP, t2 TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP, t0 TIME DEFAULT CURRENT_TIME, d1 DATE DEFAULT CURRENT_DATE)'. 1 to: 100 do: [ :each | p3client execute: 'INSERT INTO table1 (name) VALUES (''n-' , (each + 999 atRandom) asString , ''')' ]. [ p3client query: 'SELECT * FROM table1' ] bench. p3client close. This query (6 columns, 100 rows) went from 68.332 per second to 446.711 per second on my machine. Thanks again for your feedback, it was most useful. Do these changes work for you ? Regards, Sven > On 12 Oct 2020, at 15:57, Sven Van Caekenberghe <sven@stfx.eu> wrote: > > Hi Esteban, > > Thanks for this detailed feedback, this is most interesting. > > I don't have enough time to look at this in detail, I will do so later, but here is some initial feedback. > > Note that PostreSQL supports many different date/time formats and that the user can set preferences per connection, hence some flexibility in parsing is probably needed, and that is one reason why the parsing is implemented the way it is. > > Check section 8.5.2. from https://www.postgresql.org/docs/13/datatype-datetime.html > > Note that all of the following give the same result: > > Date readFrom: 'March 2 2020' readStream. > Date readFrom: '3/2/2020' readStream. > Date readFrom: '2020-03-02' readStream. > > Date parsing is probably slow because it has to deal with all these formats. > > Of course, parsing a fixed well-known format is always faster. > > The timezones are tricky too, as preferences can be set per connection, so we have to be careful. > > I will come back to this. > > Sven > >> On 12 Oct 2020, at 15:04, Esteban Maringolo <emaringolo@gmail.com> wrote: >> >> Hi, >> >> I recently stumbled into some performance issue I had, so I profiled >> it and got it fixed, but by doing that I also noticed a huge time >> spent when reading DateAndTimes and Dates (which also uses the former) >> and the culprit seems to be `Date class>>#readFrom:`. >> >> Digging into this, I noticed that this was called from >> `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was >> when instantiating the Date part of it, because it does unnecessary >> lookup of the index of Month when input stream only have a month >> index. >> >> What I did then was to change the default behavior of DateAndTime >> class>>#readFrom: >> >> readFrom: aStream defaultOffset: defaultOffset >> "Parse and return a new DateAndTime instance from stream, >> as a Date, an optional Time and an optional TimeZone offset. >> The time defaults to midnight, the timezone to defaultOffset" >> "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream" >> >> "From this" >> date := Date readFrom: aStream. >> >> "To this" >> date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'. >> >> And also in P3Converter to specify the pattern directly. >> >> P3Converter>>#convertDateFrom: bytes length: length description: description >> ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern: >> 'yyyy-mm-dd' >> >> With these two changes I got a 8x-10x speedup when reading >> Date/DateAndTime from the database. >> >> Then I dig a little further and in P3Converter added a little check to >> avoid translating the offset of a DateAndTime when it is not needed. >> >> P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length: >> length description: description >> "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in >> Postgres the way it was inserted, its representation remains constant, >> with no offset added. We use the timezone of the connection to do the >> necessary shifting. This assumes that the timezones used during >> insert/update and query are the same." >> >> | timestamp offset | >> timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes >> length: length) defaultOffset: Duration zero. >> offset := self timezone offsetForTimestamp: timestamp. >> ^offset isZero "<-- added this check to avoid translation" >> ifTrue: [ timestamp ] >> ifFalse: [ timestamp translateTo: offset ] >> >> And the speedup went even higher. But I don't know whether this last >> part is correct. >> >> Of course this seems as a quick and dirty fix, but I don't know >> whether DateAndTime>>readFrom: should read any format or just ISO 8601 >> formats nor whether the Date/DateAndTime coming from PostgreSQL are >> always formatted that way. I ran my app's small test suite and haven't >> had any issues, neither during development with these changes applied. >> >> I think there is a significant slowdown involved in the instantiation >> of Dates that should be fixed, whether this fix is definitive or not >> is why I'm asking here. >> >> Regards! >> >> Esteban A. Maringolo >