Hi,
Somebody asked how you would split a single HTTP request into multiple concurrent requests. This is one way to do it.
Upfront I should state that
This one is based on the data used in the ZnHTTPSTest(s)>>#testTransfers unit test, more specifically the files available under https://s3-eu-west-1.amazonaws.com/public-stfx-eu/, such as https://s3-eu-west-1.amazonaws.com/public-stfx-eu/test-2050.txt, the smallest one.
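The test file sizes follow a simple formula, 1024 * p + p for each prime p below 100, which is the same one the code uses. A minimal sketch that derives the file names from it; the variable names are just for illustration, and appending a name to https://s3-eu-west-1.amazonaws.com/public-stfx-eu/ should give the URL of that file.

```smalltalk
| sizes names |
"One size per prime below 100: 1024 * p + p bytes"
sizes := (Integer primesUpTo: 100) collect: [ :each | 1024 * each + each ].
"Turn each size into a file name such as test-2050.txt"
names := sizes collect: [ :each | 'test-{1}.txt' format: { each } ].
names first. "'test-2050.txt', the smallest file"
names last. "'test-99425.txt', the largest file"
```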
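sizes := (Integer primesUpTo: 100) collect: [ :each | 1024 * each + each ].
size := sizes last. "99425 bytes, the largest test file"
concurrency := 11.
step := size // concurrency.
"HTTP byte ranges are inclusive at both ends, so each range must end one byte
before the next one starts, and the last one ends at size - 1.
With these numbers the 7-byte remainder becomes a 12th range."
ranges := (0 to: size - 1 by: step) collect: [ :each |
	{ each. (each + step - 1) min: size - 1 } ].
chunks := Array new: ranges size.
done := Semaphore new.
ms := 0.
[
	ms := Time millisecondClockValue.
	ranges withIndexDo: [ :range :index |
		"One process, and one connection, per range"
		[ | client |
			(client := ZnClient new)
				https;
				host: 's3-eu-west-1.amazonaws.com';
				addPath: 'public-stfx-eu'.
			client addPath: ('test-{1}.txt' format: { size }).
			client headerAt: #Range put: ('bytes={1}-{2}' format: range).
			client get.
			client close.
			"Store by index so the chunks stay in order, whatever order they arrive in"
			chunks at: index put: client contents.
			done signal ] forkAt: Processor lowIOPriority ].
	"Wait until every range has been downloaded"
	ranges size timesRepeat: [ done wait ].
	ms := Time millisecondsSince: ms.
	(String empty join: chunks) inspect.
] fork.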
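Because HTTP byte ranges include both end points, chunk boundaries are easy to get off by one. A minimal offline sketch (no network needed) that checks that ranges computed this way tile the 99425-byte file exactly, with no gaps and no overlaps:

```smalltalk
| size concurrency step ranges total |
size := 1024 * 97 + 97. "99425, the largest test file"
concurrency := 11.
step := size // concurrency.
"Inclusive ranges: each one ends just before the next one starts"
ranges := (0 to: size - 1 by: step) collect: [ :each |
	{ each. (each + step - 1) min: size - 1 } ].
"A range a-b covers b - a + 1 bytes"
total := ranges inject: 0 into: [ :sum :range |
	sum + (range second - range first + 1) ].
"Every range must start exactly one byte after the previous one ends"
(1 to: ranges size - 1) do: [ :i |
	self assert: (ranges at: i + 1) first = ((ranges at: i) second + 1) ].
total. "99425, the same as size"
ranges size. "12: eleven chunks of 9038 bytes plus a 7-byte remainder"
```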
This takes about 2 seconds in total for me. Compare that with a single request for the whole file:
[
	ZnClient new
		https;
		host: 's3-eu-west-1.amazonaws.com';
		addPath: 'public-stfx-eu';
		addPath: 'test-99425.txt';
		get.
] timeToRun.
The single request takes roughly the same time (again, for me).
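Two things to note. First, connection time dominates: in the parallel case, 12 independent requests were executed (the 11-way split leaves a 7-byte remainder, which becomes a 12th range), each opening its own HTTPS connection, yet the total time stays close to that of one request, so concurrency is definitely happening.
Second, the largest file is just under 100 KB (99425 bytes), and splitting it into about ten parts is most probably not enough to see much benefit from doing things in parallel.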
HTH,
Sven
A good use case is when one of the downloads fails: when it’s just one big
download, you have to start over from the beginning.
On Mon, Oct 18, 2021 at 11:05 AM Sven Van Caekenberghe sven@stfx.eu wrote:
On 19 Oct 2021, at 00:17, Yanni Chiu yannix7db@gmail.com wrote:
A good use case is when one of the downloads fails: when it’s just one big download, you have to start over from the beginning.
Yes, that is a good use case for the Range feature.
It is possible to configure ZnClient to retry when a request fails. For example, adding
	client numberOfRetries: 3; retryDelay: 2 "seconds".
to the client setup in the example would retry a failed chunk request up to 3 times, waiting 2 seconds between attempts.
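A sketch of how the example could combine the retry settings with the Range use case, so that a failure costs one chunk rather than the whole file. Several things here are assumptions, not part of the original code: the hypothetical `fetchRange` block wraps one request, any status other than 206 (Partial Content) is treated as a failure, a failed chunk is left as nil instead of stopping the download, and only the missing ranges are requested again afterwards. It also assumes the file is plain ASCII, so each chunk decodes on its own, and it waits in the calling process, so it blocks while downloading.

```smalltalk
| size step ranges chunks done fetchRange missing result |
size := 1024 * 97 + 97. "99425, i.e. test-99425.txt"
step := size // 11.
ranges := (0 to: size - 1 by: step) collect: [ :each |
	{ each. (each + step - 1) min: size - 1 } ].
chunks := Array new: ranges size.
done := Semaphore new.
"Hypothetical helper: fetch one inclusive byte range, with retries"
fetchRange := [ :range | | client |
	(client := ZnClient new)
		https;
		host: 's3-eu-west-1.amazonaws.com';
		addPath: 'public-stfx-eu';
		addPath: ('test-{1}.txt' format: { size });
		numberOfRetries: 3;
		retryDelay: 2 "seconds".
	client headerAt: #Range put: ('bytes={1}-{2}' format: range).
	[ client get ] ensure: [ client close ].
	"206 Partial Content means the server honoured the Range header"
	client response code = 206
		ifFalse: [ Error signal: 'Range request not honoured' ].
	client contents ].
ranges withIndexDo: [ :range :index |
	[ [ chunks at: index put: (fetchRange value: range) ]
		on: Error
		do: [ :error | "leave this slot nil so it can be fetched again" ].
	done signal ] forkAt: Processor lowIOPriority ].
"Signalled once per range, whether the request succeeded or not"
ranges size timesRepeat: [ done wait ].
"Second pass: request only the ranges that failed, not the whole file"
missing := (1 to: ranges size) select: [ :index | (chunks at: index) isNil ].
missing do: [ :index |
	chunks at: index put: (fetchRange value: (ranges at: index)) ].
result := String empty join: chunks.
```

Catching the error inside each forked process, rather than relying on `ensure:` alone, matters here: an unhandled error in a forked process suspends it in the debugger, so the semaphore would never be signalled and the waiting loop would hang. Evaluate `result inspect` afterwards to look at the reassembled file.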