
I am trying to create a DLT pipeline for the first time. What is the best way to satisfy the following requirements? I am fully aware that the approach I have chosen may not be optimal, and I am open to design recommendations as well. Here's what I am trying to do:

import dlt
from pyspark.sql.functions import col

@dlt.table(
    name="bronze_dlt_table",
    comment="This table reads data from a Delta location",
    table_properties={
        "quality": "bronze"
    }
)
def read_raw_bronze_dlt_table():
    return spark.read.format("delta").load("Delta Table Path written from Upstream location")
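
As a side note, since this is the raw ingestion layer, a bronze table is often declared as a streaming read so each pipeline update only processes data committed upstream since the last run, instead of rescanning the whole table. A minimal sketch under that assumption; the table name is hypothetical and the path placeholder mirrors the snippet above, and by default this requires the upstream Delta table to be append-only:

import dlt

@dlt.table(
    name="bronze_dlt_table_streaming",  # hypothetical name, to avoid clashing with the batch version
    comment="Incremental read of the upstream Delta location",
    table_properties={"quality": "bronze"}
)
def read_raw_bronze_dlt_table_streaming():
    # readStream only picks up rows appended upstream since the last update.
    return spark.readStream.format("delta").load("Delta Table Path written from Upstream location")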



@dlt.table(
    name="silver_dlt_table",
    partition_cols=["ABC"],
    table_properties={
        "quality": "silver"
    })
def refresh_silver_dlt_table():
    bronzeDF = dlt.read("bronze_dlt_table")
    LookupDF = spark.read.format("delta").load("Read data from a delta table")

    # Perform some basic column manipulation and joins between bronzeDF & LookupDF
    silverDF = ...  # placeholder for the joined/transformed DataFrame

    dlt.apply_changes(
        target = "silver_dlt_table",
        source = silverDF,
        sequence_by = col("Newly Added Column in SilverDF based on LookupDF")
    )

    return silverDF
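
In case it helps, `dlt.apply_changes()` is normally not called inside a `@dlt.table` function. The usual pattern is: expose the transformed data as a view, create the target streaming table with `dlt.create_streaming_table()`, and then call `dlt.apply_changes()` at module level, passing table/view names (not DataFrames) along with the required `keys` argument. A minimal sketch of that pattern; the view name `silver_source` and the columns `ID` and `SequenceCol` are hypothetical placeholders for whatever your join actually produces:

import dlt
from pyspark.sql.functions import col

# Expose the transformed data as a view so apply_changes can reference it by name.
@dlt.view(name="silver_source")
def silver_source():
    # Read bronze as a stream so apply_changes receives changes incrementally.
    bronzeDF = dlt.read_stream("bronze_dlt_table")
    # Static lookup table; the path string is a placeholder from the question.
    lookupDF = spark.read.format("delta").load("Read data from a delta table")
    # Hypothetical stream-static join on an assumed "ID" column; replace with
    # the real column manipulation and join logic.
    return bronzeDF.join(lookupDF, on="ID", how="left")

# Create the target streaming table, then merge changes into it.
dlt.create_streaming_table(
    name="silver_dlt_table",
    partition_cols=["ABC"],
    table_properties={"quality": "silver"}
)

dlt.apply_changes(
    target = "silver_dlt_table",
    source = "silver_source",
    keys = ["ID"],                    # hypothetical primary-key column
    sequence_by = col("SequenceCol")  # hypothetical ordering column
)

Reading the bronze table with `dlt.read_stream` keeps the merge incremental, while the static lookup table is re-read on each update, which is the standard stream-static join pattern.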
