While streaming S3 objects through the AWS SDK in a Bun and Hono app, I noticed that requests were considerably faster than fetching the same assets directly via their S3 URLs.

Comparison of fetch times:

S3 Raw Object URL:
curl -H "Content-Type: application/octet-stream" -w "Total time: %{time_total}s\n" -o /dev/null -s https://s3-url-some-region.com/foo.tgz

Total time: 0.365313s

AWS SDK GetObjectCommand (served through the app):
curl -H "Content-Type: application/octet-stream" -w "Total time: %{time_total}s\n" -o /dev/null -s https://app.local/foo.tgz

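Total time: 0.135964s

AWS SDK Code:
import { GetObjectCommand } from '@aws-sdk/client-s3';
import { stream } from 'hono/streaming';

// Runs inside a Hono route handler; `client` is assumed to be an S3Client
// created once and shared across requests.
const command = new GetObjectCommand({
  Bucket: 'bucket',
  Key: 'foo.tgz',
});

const obj = await client.send(command);
const readableStream = obj.Body?.transformToWebStream();
if (!readableStream) {
  return c.notFound();
}

// Set the response header before the stream starts.
c.header('Content-Type', obj.ContentType ?? 'application/octet-stream');

return stream(c, async (_stream) => {
  await _stream.pipe(readableStream);
});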

As shown, the difference is substantial: the request served through the AWS SDK (GetObjectCommand) took about 0.14s, while fetching the object directly via the raw S3 URL took about 0.37s.

Does the AWS S3 SDK utilize internal caching, or does it rely on persistent HTTP connections to minimize TCP handshake overhead?

Asked Nov 21, 2024 at 21:55 by thelovekesh

1 Answer

On caching: the AWS SDK does not cache object data between GetObjectCommand requests.

The better performance comes from several factors, persistent HTTP connections among them. We can cite:

1 - Persistent HTTP Connections (Connection Reuse): The AWS SDK for JavaScript v3 enables HTTP keep-alive by default, allowing the reuse of TCP connections for multiple requests. This reduces the overhead associated with establishing new connections for each request, thereby improving performance.

https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-reusing-connections.html
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration.html
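
The effect of connection reuse can be observed without S3 at all. The following is a minimal sketch, assuming Node.js and its built-in node:http module; the local server and the names fetchOnce and countConnections are hypothetical and only stand in for S3 and the SDK's HTTP handler. It sends a few sequential requests through an agent and counts how many TCP connections the server had to accept.

```typescript
import { Agent, createServer, get } from 'node:http';
import type { AddressInfo } from 'node:net';

// Issue one GET through the given agent and resolve once the body is fully read.
function fetchOnce(port: number, agent: Agent): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    get({ host: '127.0.0.1', port, path: '/', agent }, (res) => {
      res.resume(); // drain the body so the socket can be released
      res.on('end', () => resolve());
    }).on('error', reject);
  });
}

// Send `requests` sequential GETs to a throwaway local server and report
// how many TCP connections the server accepted.
export async function countConnections(keepAlive: boolean, requests: number): Promise<number> {
  let connections = 0;
  const server = createServer((_req, res) => {
    res.end('ok');
  });
  server.on('connection', () => {
    connections += 1;
  });
  await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', () => resolve()));
  const { port } = server.address() as AddressInfo;

  const agent = new Agent({ keepAlive });
  try {
    for (let i = 0; i < requests; i++) {
      await fetchOnce(port, agent);
    }
  } finally {
    agent.destroy(); // drop any pooled sockets
    await new Promise<void>((resolve) => server.close(() => resolve()));
  }
  return connections;
}
```

Calling countConnections(true, 5) and countConnections(false, 5) and comparing the results shows how many handshakes the keep-alive agent avoids. Against a remote HTTPS endpoint such as S3, each avoided connection also saves a TLS handshake, which is where most of the time goes.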

2 - Efficient Data Streaming: The transformToWebStream() method in the AWS SDK streams data from S3 to your application without buffering the entire object in memory, so the response can start before the whole object has arrived. This can make retrieval faster than with clients that buffer the full body first.
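
To illustrate the difference between streaming and buffering, here is a small sketch, assuming Node.js 18+ where node:stream/web is available. The names fakeBody, streamThrough and bufferThenSend are hypothetical; fakeBody merely stands in for the web stream returned by transformToWebStream(). Each function returns the largest number of bytes it held at once.

```typescript
import { ReadableStream } from 'node:stream/web';

// Stand-in for an S3 object body: `chunks` chunks of `chunkSize` bytes each.
export function fakeBody(chunks: number, chunkSize: number): ReadableStream<Uint8Array> {
  let sent = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (sent < chunks) {
        sent += 1;
        controller.enqueue(new Uint8Array(chunkSize));
      } else {
        controller.close();
      }
    },
  });
}

// Streaming: forward each chunk as soon as it arrives; only one chunk is held.
export async function streamThrough(
  body: ReadableStream<Uint8Array>,
  sink: (chunk: Uint8Array) => void,
): Promise<number> {
  const reader = body.getReader();
  let peakBytes = 0;
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    peakBytes = Math.max(peakBytes, result.value.byteLength);
    sink(result.value);
  }
  return peakBytes;
}

// Buffering: collect the whole body first, then forward it.
export async function bufferThenSend(
  body: ReadableStream<Uint8Array>,
  sink: (chunk: Uint8Array) => void,
): Promise<number> {
  const reader = body.getReader();
  const parts: Uint8Array[] = [];
  let heldBytes = 0;
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    parts.push(result.value);
    heldBytes += result.value.byteLength;
  }
  for (const part of parts) sink(part);
  return heldBytes;
}
```

With streamThrough the sink sees the first chunk while later chunks are still in flight, so the downstream client starts receiving data earlier; with bufferThenSend nothing is forwarded until the last chunk has arrived.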

Network latency also needs to be considered: accessing S3 objects via their public URLs may add latency from factors such as DNS resolution and routing through public networks. In contrast, the AWS SDK communicates directly with the S3 service endpoints, potentially reducing such latencies.
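
One way to see where the time goes is to ask curl for more than time_total. Its --write-out option also exposes time_namelookup, time_connect, time_appconnect and time_starttransfer, all cumulative from the start of the request. The sketch below turns those cumulative values into per-phase durations; the interface and function names are hypothetical, and the sample numbers are made up for illustration, not measurements from the question.

```typescript
// Cumulative timestamps in seconds, named after the curl --write-out variables.
export interface CurlTimings {
  time_namelookup: number; // DNS lookup finished
  time_connect: number; // TCP connection established
  time_appconnect: number; // TLS handshake finished (0 for plain HTTP)
  time_starttransfer: number; // first response byte received
  time_total: number; // transfer complete
}

export interface PhaseDurations {
  dns: number;
  tcp: number;
  tls: number;
  wait: number; // request sent and server working until the first byte
  transfer: number;
}

export function phaseDurations(t: CurlTimings): PhaseDurations {
  // curl reports 0 for time_appconnect when no TLS handshake took place.
  const handshakeDone = t.time_appconnect > 0 ? t.time_appconnect : t.time_connect;
  return {
    dns: t.time_namelookup,
    tcp: t.time_connect - t.time_namelookup,
    tls: handshakeDone - t.time_connect,
    wait: t.time_starttransfer - handshakeDone,
    transfer: t.time_total - t.time_starttransfer,
  };
}

// Hypothetical sample values, not measurements from the question.
const sample: CurlTimings = {
  time_namelookup: 0.02,
  time_connect: 0.05,
  time_appconnect: 0.15,
  time_starttransfer: 0.3,
  time_total: 0.35,
};
console.log(phaseDurations(sample));
```

Running both curl commands from the question with these variables and comparing the dns, tcp and tls phases would show how much of the gap is connection setup rather than data transfer.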

I believe these are some of the reasons why the SDK is much faster; caching is not one of them.

Tags: amazon web services. Source: "Retrieving objects using AWS SDK faster than fetching via URL over HTTP", Stack Overflow