admin管理员组

文章数量:1352863

I am trying to get the number of files and sum of their sizes in the external stage in snowflake so that I can assign appropriate warehouse while doing copy operation. I could get number of files using

count(distinct METADATA$FILENAME)

select count(distinct METADATA$FILENAME)  from @STAGE_NAME
where metadata$file_last_modified > (select max(end_time) from <table_name>);

But I am not able to get the size of the files by querying the stage files. I see that LIST command gives me the size of the files, but I am not sure how to use it in conjunction with the above query. Can someone help me?

I am trying to get the number of files and sum of their sizes in the external stage in snowflake so that I can assign appropriate warehouse while doing copy operation. I could get number of files using

count(distinct METADATA$FILENAME)

select count(distinct METADATA$FILENAME)  from @STAGE_NAME
where metadata$file_last_modified > (select max(end_time) from <table_name>);

But I am not able to get the size of the files by querying the stage files. I see that LIST command gives me the size of the files, but I am not sure how to use it in conjunction with the above query. Can someone help me?

Share Improve this question edited Apr 1 at 20:26 samhita 4,0802 gold badges11 silver badges18 bronze badges asked Apr 1 at 19:28 Tester_CaryTester_Cary 513 silver badges10 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

Snowflake external stage metadata does not directly provide the size information.

Alternative approach:

You can use LIST to get the size information like you mentioned.

LIST @mystage

returns

  • size
  • last_modified

Note: last_modified is in Varchar, but you want to do date comparison. so here is a UDF to convert the last_modified to a timestamp which can be used to compare date or timestamp.Sample SO answer for the UDF.

create or replace function LAST_MODIFIED_TO_TIMESTAMP(LAST_MODIFIED string) 
returns timestamp_tz
as
$$
    to_timestamp_tz(left(LAST_MODIFIED, len(LAST_MODIFIED) - 4) || ' ' || '00:00', 'DY, DD MON YYYY HH:MI:SS TZH:TZM')
$$;

Sample query:

LIST @my_internal_stage;

SELECT $1 AS file_name,count(DISTINCT $1) AS count_files, SUM(DISTINCT $2 ) AS size
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE LAST_MODIFIED_TO_TIMESTAMP("last_modified") > '2025-01-01'
GROUP  BY file_name ;

Make sure to run SELECT just after the LIST, since LAST_QUERY_ID uses the last executed query and result_scan returns its resultset.

And you can use distinct on count and sum for the files

Above returns

本文标签: sqlGet size of staged files from an external stage in snowflakeStack Overflow