admin管理员组

文章数量:1127687

I am using delta live tables in Databricks to do some data transformation (normalizing flat file data) and I need some of the tables to have identity columns that will later serve as keys to establish relationships. I have read the documentation to add the column, but when I actually generate the table it only has null values in that column. The other columns have the data as expected.

As you can see in my code snippet, I have tried resolving this by adding functions to generate the values for that column (the two lines that are commented out) but both identity() and monotonically_increasing_id() generate errors when trying to validate the code. Running this as-is without those functions succeeds in generating the table with the dlt_id column, but it has all null values.

CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
  dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
  source_column1 STRING,
  source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS 
  WITH stream_input AS 
  (
    SELECT DISTINCT
      source_column1,
      source_column2
    FROM stream(source_catalog.bronze_schema.source_table)
  )
  SELECT
    --monotonically_increasing_id() as dlt_id,
    --identity() as dlt_id,
    source_column1,
    source_column2
  FROM stream_input;

I am coming from a SQL Server background, so I expected that defining the column as an identity would have the effect of automatically generating unique increasing integers. Based on the documentation I have read, this works in delta live tables too, but I am not seeing the expected behavior. I'd love some feedback or links to more documentation that shows what I'm doing wrong here.

Edit: I have also tried adding the optional start and increment values for the identity (START WITH 1 INCREMENT BY 1) but the column is still all null.

I am using delta live tables in Databricks to do some data transformation (normalizing flat file data) and I need some of the tables to have identity columns that will later serve as keys to establish relationships. I have read the documentation to add the column, but when I actually generate the table it only has null values in that column. The other columns have the data as expected.

As you can see in my code snippet, I have tried resolving this by adding functions to generate the values for that column (the two lines that are commented out) but both identity() and monotonically_increasing_id() generate errors when trying to validate the code. Running this as-is without those functions succeeds in generating the table with the dlt_id column, but it has all null values.

CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
  dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
  source_column1 STRING,
  source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS 
  WITH stream_input AS 
  (
    SELECT DISTINCT
      source_column1,
      source_column2
    FROM stream(source_catalog.bronze_schema.source_table)
  )
  SELECT
    --monotonically_increasing_id() as dlt_id,
    --identity() as dlt_id,
    source_column1,
    source_column2
  FROM stream_input;

I am coming from a SQL Server background, so I expected that defining the column as an identity would have the effect of automatically generating unique increasing integers. Based on the documentation I have read, this works in delta live tables too, but I am not seeing the expected behavior. I'd love some feedback or links to more documentation that shows what I'm doing wrong here.

Edit: I have also tried adding the optional start and increment values for the identity (START WITH 1 INCREMENT BY 1) but the column is still all null.

Share Improve this question edited Jan 9 at 14:03 SQLDoug asked Jan 8 at 18:13 SQLDougSQLDoug 5974 silver badges9 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

I have been able to reproduce this several times now, so this solution works for me. When building the delta live table in my workspace, I write the script, click Validate, then click Start. This creates a table that I can query, and I can see the null values in the identity field. Next, I go to the Delta Live Tables area under the Data Engineering header, and I do a full refresh of the table once it has been created. This adds the identity values I was expecting. So here is what my final code looks like:

CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
  dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
  source_column1 STRING,
  source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS
  SELECT DISTINCT
    source_column1,
    source_column2
  FROM stream(source_catalog.bronze_schema.source_table);

本文标签: Identity column in a Databricks delta live table has all null valuesStack Overflow