pharo-users@lists.pharo.org

Any question about Pharo is welcome

Splitting a single HTTP Request into multiple concurrent requests

Sven Van Caekenberghe
Mon, Oct 18, 2021 3:04 PM

Hi,

Somebody asked how you would split a single HTTP request into multiple concurrent requests. This is one way to do it.

Upfront I should state that

  • I do not think this is worth the trouble
  • It is only applicable to large downloads (even larger than in the example)
  • The other side (server) must honour Range requests correctly (and be fast)

This one is based on the data used in the ZnHTTPSTest(s)>>#testTransfers unit test, more specifically the files available under https://s3-eu-west-1.amazonaws.com/public-stfx-eu/ such as https://s3-eu-west-1.amazonaws.com/public-stfx-eu/test-2050.txt for the smallest one.
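
As an aside, the total size does not have to be hardcoded: a HEAD request can tell you both the content length and whether the server honours ranges. A minimal sketch along these lines, using the smallest test file:

| client totalSize acceptsRanges |
(client := ZnClient new)
    https;
    host: 's3-eu-west-1.amazonaws.com';
    addPath: 'public-stfx-eu';
    addPath: 'test-2050.txt';
    head.
totalSize := (client response headers at: 'Content-Length') asInteger.
"S3 advertises range support with an Accept-Ranges: bytes header."
acceptsRanges := (client response headers at: 'Accept-Ranges' ifAbsent: [ '' ]) = 'bytes'.
client close.
{ totalSize. acceptsRanges } inspect.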

"The sizes of the available test files, test-<size>.txt; the largest is 99425."
sizes := (Integer primesUpTo: 100) collect: [ :each | 1024 * each + each ].

size := sizes last.
concurrency := 11.
"Round up so that the split yields exactly 'concurrency' ranges."
step := (size / concurrency) ceiling.

"HTTP Range byte positions are inclusive at both ends (RFC 7233), so each
range runs from its start to start + step - 1, clamped to the last byte."
ranges := (0 to: size - 1 by: step) collect: [ :each |
    { each. ((each + step - 1) min: (size - 1)) } ].

chunks := Array new: ranges size.
done := Semaphore new.
ms := 0.

[
ms := Time millisecondClockValue.
ranges withIndexDo: [ :range :index |
    [ | client |
    (client := ZnClient new)
        https;
        host: 's3-eu-west-1.amazonaws.com';
        addPath: 'public-stfx-eu'.
    client addPath: ('test-{1}.txt' format: { size }).
    client headerAt: #Range put: ('bytes={1}-{2}' format: range).
    client get.
    client close.
    chunks at: index put: client contents.
    done signal ] forkAt: Processor lowIOPriority ].
"Wait for every worker to signal completion."
ranges size timesRepeat: [ done wait ].
ms := Time millisecondsSince: ms. "total elapsed time in milliseconds"
(String empty join: chunks) inspect.
] fork.

This takes about 2 seconds total for me.
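
As a quick sanity check, the ranges should tile the file exactly: every byte covered once, with no overlap at the boundaries.

self assert: ranges first first = 0.
self assert: ranges last second = (size - 1).
"Each range must start right after the previous one ends."
2 to: ranges size do: [ :i |
    self assert: (ranges at: i) first = ((ranges at: i - 1) second + 1) ].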

[
ZnClient new
    https;
    host: 's3-eu-west-1.amazonaws.com';
    addPath: 'public-stfx-eu';
    addPath: 'test-99425.txt';
    get.
] timeToRun.

This single, unsplit request takes roughly the same time (again, for me).

Two things to note: connection setup time dominates, and in the parallel case 11 independent requests were executed, so concurrency is definitely happening.

The largest test file is just under 100K and is split into only 11 parts, which is most probably not enough to see much benefit from doing things in parallel.
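
For a really large download you would also not want the final join in memory; once all the chunks are in, they could just as well be written out sequentially, something like:

"Assumes 'chunks' from the script above has been filled in."
'test-99425.txt' asFileReference writeStreamDo: [ :out |
    chunks do: [ :each | out nextPutAll: each ] ].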

HTH,

Sven

Yanni Chiu
Mon, Oct 18, 2021 10:17 PM

A good use case is when one of the downloads fails. When it’s just one big one, then you have to start over from the beginning.

Sven Van Caekenberghe
Tue, Oct 19, 2021 1:15 PM

On 19 Oct 2021, at 00:17, Yanni Chiu <yannix7db@gmail.com> wrote:

A good use case is when one of the downloads fails. When it’s just one big one, then you have to start over from the beginning.

Yes, that is a good use case for the Range feature.

It is possible to configure ZnClient to retry when a request fails, for example:

client numberOfRetries: 3; retryDelay: 2 "seconds".

could be added to the example.
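
In the per-chunk worker of the earlier example, the client configuration would then look something like this (the 3 retries and the 2 second delay are arbitrary values):

(client := ZnClient new)
    https;
    host: 's3-eu-west-1.amazonaws.com';
    addPath: 'public-stfx-eu';
    numberOfRetries: 3; "retry a failing chunk request up to 3 times"
    retryDelay: 2. "seconds to wait between attempts"

Since each chunk is fetched independently, a transient failure then only costs re-downloading that one chunk instead of the whole file.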
