
I’m designing a RAW ==> BRONZE ingestion pattern in BigQuery:

  • RAW layer: source CSV/Parquet files in GCS
  • BRONZE layer: native BigQuery table for improved performance vs an external table (columnar storage, partitioning, clustering)

I want each record in my BRONZE table to include metadata about its origin, for example:

  • Source file URI or filename
  • Ingestion timestamp

I'm considering:

  • bq load the new files (either once a day or with a cloud function when a new file arrives)
  • BQ Transfer Service with Incremental mode

My questions:

  • Is it possible to load data into BigQuery using those approaches while also adding the metadata columns I want?

  • What’s the simplest, most performant pattern to load GCS files into a native BigQuery table while adding metadata columns in one workflow?

  • If using bq load, should I load into a staging table then INSERT...SELECT into the final table, using literal columns to add the metadata?

  • Is there any BigQuery-native feature (e.g., external table pseudo‑columns, ingestion-time partition pseudo‑columns) that can eliminate extra steps?


asked Mar 27 at 16:26 by Etienne Neveu

1 Answer

My thoughts on this:

bq load is the simplest and fastest way to get the files in, and the metadata can be added after the load via a staging table. The BigQuery Data Transfer Service is meant for scheduled batch transfers and does not populate custom metadata columns either, so the same post-load step applies in both cases.

  • Load files from GCS into BigQuery using bq load (a minimal sketch follows this list).

  • Add ingestion_time in a post-load SQL step.

  • bq load has no flag to capture the source filename; the only native way to read it is the _FILE_NAME pseudo-column, which exists only on external tables (see the last sketch below).

  • bq load is like copying data from a folder (GCS) into a table (BigQuery): it moves the rows and nothing else.

  • You then add the filename and timestamp with a simple follow-up query.
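
A minimal sketch of the load step; the bucket, dataset, and table names below are placeholders:

    # Load one day's Parquet files from GCS into a staging table.
    # --replace truncates the staging table so each run starts clean.
    bq load \
      --source_format=PARQUET \
      --replace \
      my_dataset.bronze_staging \
      "gs://my-bucket/raw/orders/2024-01-01/*.parquet"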

I agree with using bq load into a staging table and then INSERT ... SELECT into the final table, because BigQuery does not automatically capture the filename during a load. The staging table is like a sorting area where you can manually attach labels such as the file name or an ingestion timestamp before the rows land in the final table, as sketched below.
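
A sketch of that post-load step, assuming placeholder table names and an orchestrator (a Cloud Function, say) that knows the URI of the file it just loaded:

    # Copy staging rows into the final BRONZE table, attaching the
    # metadata as literal columns supplied by the orchestrator.
    bq query --use_legacy_sql=false '
    INSERT INTO my_dataset.bronze_orders (order_id, amount, source_file, ingestion_time)
    SELECT
      order_id,
      amount,
      "gs://my-bucket/raw/orders/2024-01-01/part-000.parquet" AS source_file,
      CURRENT_TIMESTAMP() AS ingestion_time
    FROM my_dataset.bronze_staging'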

BigQuery's ingestion-time partitioned tables have a built-in _PARTITIONTIME pseudo-column that records when each row was loaded, which can eliminate the extra step for the timestamp. BigQuery does not store the source filename in a native table, though, so you either add it manually in a query after loading, or read it from the _FILE_NAME pseudo-column of an external table defined over the same files, as in the sketch below.
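
A sketch of the pseudo-column route. _FILE_NAME is a real pseudo-column of Cloud Storage external tables; the external table my_dataset.raw_orders_ext and the column names are placeholders you would need to define yourself:

    # Query the external table and let _FILE_NAME supply the source URI,
    # skipping the staging table entirely.
    bq query --use_legacy_sql=false '
    INSERT INTO my_dataset.bronze_orders (order_id, amount, source_file, ingestion_time)
    SELECT
      order_id,
      amount,
      _FILE_NAME AS source_file,
      CURRENT_TIMESTAMP() AS ingestion_time
    FROM my_dataset.raw_orders_ext'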
