Question: Why is my S3 URL incorrect when using read_csv in DuckDB?

Problem Statement

I am trying to create a temporary table from an S3 file using read_csv, but the generated S3 URL is incorrect, resulting in a 404 error.

S3 Path I'm Using:

s3://dev1-uswest2-cdp001-connector-staging-area/a360-falcondev-b1f7ab43600c4d5a942c02a4064a911b/connector_type/GoogleDrive/connector/20c0e4a7-7b3d-4f21-9d06-02afa8b4fcc0/45432970-faa3-4e00-b650-d3072bf5791c/2cf47e86-8cc6-4a99-a674-8fcc3bc56b11/0fb1209a-cdce-4376-9404-0960dfcb665d/data/records_1.csv  

Code I'm Using:

public static void createTemporaryTables(
        Connection connection,
        String currentDatasetS3Path) {
    try (Statement stmt = connection.createStatement()) {
        stmt.execute("SET s3_region = 'us-west-2';");
        stmt.execute("SET s3_url_style = 'virtual-host';");
        stmt.execute("SET s3_endpoint = 's3.us-west-2.amazonaws.com';");

        stmt.execute("SET s3_access_key_id='<>'");
        stmt.execute("SET s3_secret_access_key='<>'");
        stmt.execute("SET s3_session_token='<>'");

        System.out.println(currentDatasetS3Path);
        try {
            stmt.execute(String.format(
                    "CREATE TEMPORARY TABLE abc AS SELECT * FROM read_csv('%s');",
                    currentDatasetS3Path
            ));
            System.out.println("done");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    } catch (SQLException e) {
        throw new RuntimeException("Error creating temporary tables", e);
    }
}

Environment:

  • DuckDB JDBC version: 1.1.0
  • Region: us-west-2
  • S3 Bucket: dev1-uswest2-cdp001-connector-staging-area

Error Message:

Caused by: java.sql.SQLException: HTTP Error: Unable to connect to URL ".csv": 404 (Not Found)

Expected Behavior:

The generated HTTPS URL should be:

.csv

Actual Behavior:

The method is generating this incorrect URL instead:

.csv

The difference is that the bucket name is missing before s3.us-west-2.amazonaws.com.
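For reference, here is a minimal sketch of how the two standard S3 addressing styles place the bucket name. Virtual-hosted style puts it in front of the endpoint host. Path style puts it as the first path segment. The class name, helper names, and the `example-bucket` path are hypothetical and only for illustration; this is not DuckDB's actual implementation.

```java
public class S3UrlStyles {
    // Splits "s3://bucket/key..." into {bucket, key}.
    static String[] splitS3Uri(String s3Uri) {
        String prefix = "s3://";
        if (!s3Uri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an s3:// URI: " + s3Uri);
        }
        String rest = s3Uri.substring(prefix.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Missing object key: " + s3Uri);
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash + 1) };
    }

    // Virtual-hosted style: the bucket becomes a subdomain of the endpoint.
    static String virtualHostedUrl(String endpoint, String s3Uri) {
        String[] parts = splitS3Uri(s3Uri);
        return "https://" + parts[0] + "." + endpoint + "/" + parts[1];
    }

    // Path style: the bucket becomes the first path segment.
    static String pathStyleUrl(String endpoint, String s3Uri) {
        String[] parts = splitS3Uri(s3Uri);
        return "https://" + endpoint + "/" + parts[0] + "/" + parts[1];
    }

    public static void main(String[] args) {
        String endpoint = "s3.us-west-2.amazonaws.com";
        String uri = "s3://example-bucket/data/records_1.csv"; // hypothetical path
        System.out.println(virtualHostedUrl(endpoint, uri));
        System.out.println(pathStyleUrl(endpoint, uri));
    }
}
```

A URL that has the bucket in neither position, as described above, matches neither style, which suggests the URL style setting is not being interpreted as intended.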

Question:

Why is my S3 URL not correctly formatted when using read_csv, despite setting s3_url_style = 'virtual-host'? How do I ensure the bucket name is included in the generated URL?
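One thing that may be worth checking: the DuckDB documentation lists 'vhost' (the default) and 'path' as the values for s3_url_style, and 'virtual-host' is not among them. Whether that unrecognized value is what drops the bucket name is an assumption, not something confirmed here. A minimal sketch of the same settings using 'vhost' might look like the following; the class and method names are hypothetical, and credentials are left out.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class DuckDbS3Settings {
    // Builds the SET statements; 'vhost' is the documented virtual-hosted value.
    static List<String> s3Settings(String region, String endpoint) {
        return List.of(
                "SET s3_region = '" + region + "';",
                "SET s3_url_style = 'vhost';",
                "SET s3_endpoint = '" + endpoint + "';");
    }

    // Applies the settings on an already-open DuckDB JDBC connection.
    static void apply(Connection connection, String region, String endpoint)
            throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            for (String sql : s3Settings(region, endpoint)) {
                stmt.execute(sql);
            }
        }
    }
}
```

Calling `DuckDbS3Settings.apply(connection, "us-west-2", "s3.us-west-2.amazonaws.com")` before the read_csv statement would replace the three SET calls for region, URL style, and endpoint. Since 'vhost' is the default, omitting the s3_url_style setting entirely may also be an option.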