admin管理员组

文章数量:1391860

In Snowflake I'm trying to de-identify data stored in a JSON column based on configurations in a separate table. I created a custom SQL function DEIDENTIFY_JSON which uses a subquery to look up the de-identification treatment and applies it to an input string.

CREATE OR REPLACE TABLE JSON_DATA_TABLE (
    ID INTEGER,
    JSON_DATA STRING
);

INSERT INTO JSON_DATA_TABLE (ID, JSON_DATA)
VALUES
    (1, '{"SSN": "111-22-3333", "Name": "Alice Brown", "DOB": "1988-07-22", "YearJoined": "2015-01-15"}'),
    (2, '{"SSN": "444-55-6666", "Name": "Mike Wilson", "DOB": "1995-03-10", "YearJoined": "2019-06-20"}'),
    (3, '{"SSN": "777-88-9999", "Name": "Emma Davis", "DOB": "1975-11-30", "YearJoined": "2021-09-01"}');

This configuration table stores the de-identification rules for each JSON attribute:

CREATE OR REPLACE TABLE dt_salt_deid (
    database_name VARCHAR,
    schema_name VARCHAR,
    table_name VARCHAR,
    attribute_name VARCHAR,
    treatment VARCHAR,
    salt VARCHAR
);

INSERT INTO dt_salt_deid  VALUES
    ('SONAR_RAW', 'RWE_TEST', 'JSON_DATA_TABLE', 'SSN', 'SHA256', 'SecretSalt123'),
    ('SONAR_RAW', 'RWE_TEST', 'JSON_DATA_TABLE', 'NAME', 'Redact', NULL),
    ('SONAR_RAW', 'RWE_TEST', 'JSON_DATA_TABLE', 'DOB', 'MMYYYY', NULL);

This de-identification function takes database, schema, table names, and an input string, then applies the treatment for the 'NAME' attribute (hardcoded):

CREATE OR REPLACE FUNCTION DEIDENTIFY(
    "TREATMENT" VARCHAR,
    "SALT" VARCHAR,
    "INPUT_STRING" VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
COMMENT='Returns de-identified data using SHA256, REDACT, or GENERALISE'
AS '
SELECT CASE 
    WHEN INPUT_STRING IS NULL THEN NULL
    WHEN TREATMENT = ''SHA256'' THEN SHA2(CONCAT(UPPER(INPUT_STRING), SALT), 256)
    WHEN TREATMENT = ''Redact'' THEN NULL
    WHEN TREATMENT = ''MMYYYY'' THEN TO_CHAR(TO_DATE(INPUT_STRING), ''MMYYYY'')
    WHEN TREATMENT = ''YYYY'' THEN TO_CHAR(TO_DATE(INPUT_STRING), ''YYYY'')
    ELSE INPUT_STRING
END
';
CREATE OR REPLACE FUNCTION DEIDENTIFY_JSON(
    DATABASE_NAME VARCHAR,
    SCHEMA_NAME VARCHAR,
    TABLE_NAME VARCHAR,
    INPUT_STRING VARCHAR
)
RETURNS STRING
LANGUAGE SQL
AS
$$
WITH data AS 
(
    SELECT INPUT_STRING AS settings_string
),
key_value_pair AS 
(
    SELECT
        LTRIM(SPLIT_PART(pair.value, '=', 1), '"') AS key,
        RTRIM(SPLIT_PART(pair.value, '=', 2), '"') AS value
    FROM
        data,
        LATERAL FLATTEN(INPUT => SPLIT(settings_string, ',')) AS pair
),
deidentified_data AS (
    SELECT
        kvp.key,
        kvp.value,
        DEIDENTIFY(
            COALESCE(d.treatment, 'NONE'),
            COALESCE(d.salt, ''),
            kvp.value
        )::VARIANT AS deidentified_value
    FROM key_value_pair kvp
    LEFT JOIN SONAR_RAW.RWE_TEST.sonar_dt_salt_deid d
        ON d.database_name = DATABASE_NAME
        AND d.schema_name = SCHEMA_NAME
        AND d.table_name = TABLE_NAME
        AND d.attribute_name = kvp.key
),
aggregated AS (
    SELECT
        OBJECT_AGG(key, deidentified_value) AS aggregated_json
    FROM deidentified_data
)
SELECT 
    TO_JSON(aggregated_json)::STRING
FROM aggregated
$$;

I’m trying to apply this function to the JSON_DATA column:

SELECT
    DEIDENTIFY_JSON('SONAR_RAW', 'RWE_TEST', 'JSON_DATA_TABLE', JSON_DATA)
FROM JSON_DATA_TABLE;

When I run this I get the error:

Unsupported subquery type cannot be evaluated.

I tested the function with a simple string (e.g., 'NAME=77') and it works:

SELECT DEIDENTIFY_JSON('SONAR_RAW', 'RWE_TEST', 'JSON_DATA_TABLE', 'NAME=77');

This returns 'NULL' as expected since the treatment for 'NAME' is 'Redact'.

However, when I pass the entire JSON_DATA column (a JSON string), it fails with the error.

The subquery inside the function might not be supported in this context (e.g., when called row-by-row in a SELECT).

My goal is to de-identify each JSON attribute (SSN, Name, DOB) individually based on the config table and reconstruct the JSON.

Why am I getting the error:

Unsupported subquery type cannot be evaluated

How can I fix this function or rewrite my approach to de-identify each JSON attribute according to the configurations and return a new JSON object?

本文标签: