admin管理员组

文章数量:1415684

The below Oracle query with REGEXP_REPLACE runs for almost 12 min. Is there a way to improve the performance? Can anyone suggest any performance tuning tips? I have added the explain plan below for reference.

  • The EDW_STG.STG2_SRC table has '27243705' records and the EDW_STG.TMP_STG2 table has '9948154' records .
SELECT
        P.UNQ_SRC_ID,
        P.UNQ_CNTRCT_ID,
        P.PLCY_CNTRCT_ID,
        P.SOR_CD,
        P.SRC_AGRMNT_INCPTN_DT,
        P.AGRMNT_INCPTN_DT,
        P.AGRMNT_NM,
        P.SRC_AGRMNT_TYP,
        P.AGRMNT_TYP,
        P.SRC_ISS_CMPNY_CD,
        P.ISS_CMPNY_CD,
        P.SRC_SECNDRY_CMPNY_CD,
        P.SECNDRY_CMPNY_CD,
        P.PLCY_PLAN_PROD_NM,
        P.SRC_STAT_CD,
        P.AGRMNT_STAT_TXT,
        P.AGRMNT_STAT_CLASS_TXT,
        P.PLCY_TYP,
        P.INET_SRC_CD,
        P.SRC_BACC_CLIENT_ACCT_ID,
        P.BACC_CLIENT_ACCT_ID,
        P.PLAN_CD,
        P.LOB_CD,
        P.PLCY_AMT,
        P.SRC_PLCY_ISS_DT,
        P.PLCY_ISS_DT,
        P.SRC_EXPIRY_DT,
        P.SRC_CEASE_DT,
        P.SRC_PLCY_TERM_DT,
        P.PLCY_TERM_DT,
        P.PLCY_ADTNL_TERM_AMT,
        P.ANTY_CY_VAL_AMT,
        P.ANTY_INIT_YR_VAL_AMT,
        P.ANTY_OPT_CD,
        P.ANTY_CUR_AMT,
        P.AVLBL_SURR_AMT,
        P.AGRMNT_CMNT_TXT,
        P.AS_OF_DT,
        P.UNQ_CNTRCT_PARTY_ID,
        P.UNQ_PARTY_ID,
        P.SRC_PARTY_TYP,
        P.PARTY_TYP,
        P.PARTY_STAT_TXT,
        P.CLIENT_TYP,
        P.SRC_CLIENT_STAT_TXT,
        MAX(
            P.CLIENT_STAT_TXT
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS CLIENT_STAT_TXT,
        P.NM_PARSE_TYP,
        P.SRC_FULL_NM,
        P.FULL_NM,
        P.UNPARSED_NM,
        P.PARSE_STAT_OUTP_TXT,
        P.SRC_PREFIX_TXT,
        P.PREFIX_TXT,
        P.SRC_FIRST_NM,
        P.FIRST_NM,
        P.SRC_MIDL_NM,
        P.MIDL_NM,
        P.SRC_LAST_NM,
        P.LAST_NM,
        P.SRC_SUFFIX_TXT,
        P.SUFFIX_TXT,
        P.PREFRD_NM,
        P.TITLE_TXT,
        P.OCPTN_TXT,
        P.CITIZENSHIP_TXT,
        P.SRC_SSN_CD,
        P.TXPYR_ID_NO,
        MAX(
            P.TXPYR_ID_NO_SOR_CD
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS TXPYR_ID_NO_SOR_CD,
        P.SRC_BIRTH_DT,
        P.BIRTH_DT,
        P.SRC_BIRTH_PLACE_NM,
        REGEXP_REPLACE(
            TRIM(
                LISTAGG(
                    P.BIRTH_PLACE_NM,
                    '; '
                ) WITHIN GROUP(ORDER BY P.BIRTH_PLACE_NM) OVER(PARTITION BY
                    P.UNQ_CNTRCT_PARTY_ID
                )
            ),
            '(^|; )([^;]*)(; \2)+',
            '\1\2'
        ) AS BIRTH_PLACE_NM,
        P.SRC_DTH_DT,
        MAX(
            P.DTH_DT
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS DTH_DT,
        P.DTH_DT_SRC_TXT,
        P.DTH_CERT_LOC,
        P.SRC_GENDER_CD,
        MIN(
            P.GENDER_CD
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS GENDER_CD,
        P.PREFRD_LANG_TXT,
        P.ALT_LANG_TXT,
        P.SRC_ENTITY_NM,
        P.ENTITY_NM,
        P.ENTITY_TYP,
        P.ENTITY_ALT_NM,
        P.ENTITY_ACRONYM_NM,
        P.ENTITY_INDUSTRY_CD,
        P.ENTITY_DESC_TXT,
        P.DUN_AND_BRADSTREET_ID,
        P.SRC_CMNT_TXT,
        REGEXP_REPLACE(
            TRIM(
                LISTAGG(
                    P.CMNT_TXT,
                    '; '
                ) WITHIN GROUP(ORDER BY P.CMNT_TXT) OVER(PARTITION BY
                    P.UNQ_CNTRCT_PARTY_ID
                )
            ),
            '(^|; )([^;]*)(; \2)+',
            '\1\2'
        ) AS CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_ROLE_ID,
        P.ROLE_ID,
        P.SRC_PARTY_ROLE_NM,
        P.PARTY_ROLE_NM,
        P.PARTY_SOR_CD,
        P.BENE_SHARE_RT,
        P.IRREVOCABLE_BENE_FLG,
        P.BENE_RELATIONSHIP_TXT,
        P.BENE_TYPE,
        P.PARTY_ROLE_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_ADDR_ID,
        P.ADDR_SRC_ID,
        P.ADDR_TYP,
        P.SRC_ATTEN_LN_TXT,
        P.ATTEN_LN_TXT,
        P.SRC_DELVRY_ADDR_TXT,
        P.DELVRY_ADDR_TXT,
        P.SRC_ADDR1_TXT,
        P.ADDR1_TXT,
        P.SRC_ADDR2_TXT,
        P.ADDR2_TXT,
        P.SRC_ADDR3_TXT,
        P.ADDR3_TXT,
        P.SRC_ADDR4_TXT,
        P.ADDR4_TXT,
        P.SRC_CITY_NM,
        P.CITY_NM,
        P.SRC_STATE_CD,
        P.STATE_CD,
        P.SRC_POSTAL_CD,
        P.POSTAL_CD,
        P.SRC_CNTRY_NM,
        P.SRC_CNTRY_LIT,
        P.CNTRY_NM,
        P.ADDR_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_EMAIL_ID,
        P.EMAIL_ADDR_SRC_ID,
        P.EMAIL_TYP,
        P.SRC_EMAIL_ADDR_TXT,
        P.EMAIL_ADDR_TXT,
        P.EMAIL_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_PH_ID,
        P.SRC_PH_NO,
        P.PH_NO,
        P.PH_TYP,
        P.PH_NO_SRC_ID,
        P.PH_CNTRY_CD,
        P.PH_AREA_CD,
        P.PH_EXCH_NO,
        P.PH_LN_NO,
        P.PH_EXT_NO,
        P.DO_NOT_CALL_FLG,
        P.CONVENIENT_TM_TXT,
        P.PH_CMNT_TXT,
        P.ISS_STATE_CD,
        S.UNQ_SRC_ID,
        S.SOR_CD,
        S.ETL_REC_STAT_IND,
        S.ETL_CHKSUM1_NO
    FROM
        EDW_STG.STG2_SRC S
        LEFT OUTER JOIN EDW_STG.TMP_STG2 P ON (P.UNQ_SRC_ID = S.UNQ_SRC_ID AND P.SOR_CD = S.SOR_CD)
    WHERE
        S.SOR_CD IN ('UL','TM')

Plan:

The below Oracle query with REGEXP_REPLACE runs for almost 12 min. Is there a way to improve the performance? Can anyone suggest any performance tuning tips? I have added the explain plan below for reference.

  • The EDW_STG.STG2_SRC table has '27243705' records and the EDW_STG.TMP_STG2 table has '9948154' records .
SELECT
        P.UNQ_SRC_ID,
        P.UNQ_CNTRCT_ID,
        P.PLCY_CNTRCT_ID,
        P.SOR_CD,
        P.SRC_AGRMNT_INCPTN_DT,
        P.AGRMNT_INCPTN_DT,
        P.AGRMNT_NM,
        P.SRC_AGRMNT_TYP,
        P.AGRMNT_TYP,
        P.SRC_ISS_CMPNY_CD,
        P.ISS_CMPNY_CD,
        P.SRC_SECNDRY_CMPNY_CD,
        P.SECNDRY_CMPNY_CD,
        P.PLCY_PLAN_PROD_NM,
        P.SRC_STAT_CD,
        P.AGRMNT_STAT_TXT,
        P.AGRMNT_STAT_CLASS_TXT,
        P.PLCY_TYP,
        P.INET_SRC_CD,
        P.SRC_BACC_CLIENT_ACCT_ID,
        P.BACC_CLIENT_ACCT_ID,
        P.PLAN_CD,
        P.LOB_CD,
        P.PLCY_AMT,
        P.SRC_PLCY_ISS_DT,
        P.PLCY_ISS_DT,
        P.SRC_EXPIRY_DT,
        P.SRC_CEASE_DT,
        P.SRC_PLCY_TERM_DT,
        P.PLCY_TERM_DT,
        P.PLCY_ADTNL_TERM_AMT,
        P.ANTY_CY_VAL_AMT,
        P.ANTY_INIT_YR_VAL_AMT,
        P.ANTY_OPT_CD,
        P.ANTY_CUR_AMT,
        P.AVLBL_SURR_AMT,
        P.AGRMNT_CMNT_TXT,
        P.AS_OF_DT,
        P.UNQ_CNTRCT_PARTY_ID,
        P.UNQ_PARTY_ID,
        P.SRC_PARTY_TYP,
        P.PARTY_TYP,
        P.PARTY_STAT_TXT,
        P.CLIENT_TYP,
        P.SRC_CLIENT_STAT_TXT,
        MAX(
            P.CLIENT_STAT_TXT
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS CLIENT_STAT_TXT,
        P.NM_PARSE_TYP,
        P.SRC_FULL_NM,
        P.FULL_NM,
        P.UNPARSED_NM,
        P.PARSE_STAT_OUTP_TXT,
        P.SRC_PREFIX_TXT,
        P.PREFIX_TXT,
        P.SRC_FIRST_NM,
        P.FIRST_NM,
        P.SRC_MIDL_NM,
        P.MIDL_NM,
        P.SRC_LAST_NM,
        P.LAST_NM,
        P.SRC_SUFFIX_TXT,
        P.SUFFIX_TXT,
        P.PREFRD_NM,
        P.TITLE_TXT,
        P.OCPTN_TXT,
        P.CITIZENSHIP_TXT,
        P.SRC_SSN_CD,
        P.TXPYR_ID_NO,
        MAX(
            P.TXPYR_ID_NO_SOR_CD
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS TXPYR_ID_NO_SOR_CD,
        P.SRC_BIRTH_DT,
        P.BIRTH_DT,
        P.SRC_BIRTH_PLACE_NM,
        REGEXP_REPLACE(
            TRIM(
                LISTAGG(
                    P.BIRTH_PLACE_NM,
                    '; '
                ) WITHIN GROUP(ORDER BY P.BIRTH_PLACE_NM) OVER(PARTITION BY
                    P.UNQ_CNTRCT_PARTY_ID
                )
            ),
            '(^|; )([^;]*)(; \2)+',
            '\1\2'
        ) AS BIRTH_PLACE_NM,
        P.SRC_DTH_DT,
        MAX(
            P.DTH_DT
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS DTH_DT,
        P.DTH_DT_SRC_TXT,
        P.DTH_CERT_LOC,
        P.SRC_GENDER_CD,
        MIN(
            P.GENDER_CD
        ) OVER(PARTITION BY
            P.UNQ_CNTRCT_PARTY_ID
        ) AS GENDER_CD,
        P.PREFRD_LANG_TXT,
        P.ALT_LANG_TXT,
        P.SRC_ENTITY_NM,
        P.ENTITY_NM,
        P.ENTITY_TYP,
        P.ENTITY_ALT_NM,
        P.ENTITY_ACRONYM_NM,
        P.ENTITY_INDUSTRY_CD,
        P.ENTITY_DESC_TXT,
        P.DUN_AND_BRADSTREET_ID,
        P.SRC_CMNT_TXT,
        REGEXP_REPLACE(
            TRIM(
                LISTAGG(
                    P.CMNT_TXT,
                    '; '
                ) WITHIN GROUP(ORDER BY P.CMNT_TXT) OVER(PARTITION BY
                    P.UNQ_CNTRCT_PARTY_ID
                )
            ),
            '(^|; )([^;]*)(; \2)+',
            '\1\2'
        ) AS CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_ROLE_ID,
        P.ROLE_ID,
        P.SRC_PARTY_ROLE_NM,
        P.PARTY_ROLE_NM,
        P.PARTY_SOR_CD,
        P.BENE_SHARE_RT,
        P.IRREVOCABLE_BENE_FLG,
        P.BENE_RELATIONSHIP_TXT,
        P.BENE_TYPE,
        P.PARTY_ROLE_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_ADDR_ID,
        P.ADDR_SRC_ID,
        P.ADDR_TYP,
        P.SRC_ATTEN_LN_TXT,
        P.ATTEN_LN_TXT,
        P.SRC_DELVRY_ADDR_TXT,
        P.DELVRY_ADDR_TXT,
        P.SRC_ADDR1_TXT,
        P.ADDR1_TXT,
        P.SRC_ADDR2_TXT,
        P.ADDR2_TXT,
        P.SRC_ADDR3_TXT,
        P.ADDR3_TXT,
        P.SRC_ADDR4_TXT,
        P.ADDR4_TXT,
        P.SRC_CITY_NM,
        P.CITY_NM,
        P.SRC_STATE_CD,
        P.STATE_CD,
        P.SRC_POSTAL_CD,
        P.POSTAL_CD,
        P.SRC_CNTRY_NM,
        P.SRC_CNTRY_LIT,
        P.CNTRY_NM,
        P.ADDR_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_EMAIL_ID,
        P.EMAIL_ADDR_SRC_ID,
        P.EMAIL_TYP,
        P.SRC_EMAIL_ADDR_TXT,
        P.EMAIL_ADDR_TXT,
        P.EMAIL_CMNT_TXT,
        P.UNQ_CNTRCT_PARTY_PH_ID,
        P.SRC_PH_NO,
        P.PH_NO,
        P.PH_TYP,
        P.PH_NO_SRC_ID,
        P.PH_CNTRY_CD,
        P.PH_AREA_CD,
        P.PH_EXCH_NO,
        P.PH_LN_NO,
        P.PH_EXT_NO,
        P.DO_NOT_CALL_FLG,
        P.CONVENIENT_TM_TXT,
        P.PH_CMNT_TXT,
        P.ISS_STATE_CD,
        S.UNQ_SRC_ID,
        S.SOR_CD,
        S.ETL_REC_STAT_IND,
        S.ETL_CHKSUM1_NO
    FROM
        EDW_STG.STG2_SRC S
        LEFT OUTER JOIN EDW_STG.TMP_STG2 P ON (P.UNQ_SRC_ID = S.UNQ_SRC_ID AND P.SOR_CD = S.SOR_CD)
    WHERE
        S.SOR_CD IN ('UL','TM')

Plan:

Share Improve this question edited Feb 5 at 12:37 agilgur5 91718 silver badges40 bronze badges asked Feb 4 at 18:12 karthikkarthik 2631 gold badge6 silver badges21 bronze badges 1
  • 1 You should tag the database version, too. I suppose that you are using some supported up-to-date version, which means probably at least Oracle 19i. In that case I find it very likely that you simply want to use LISTAGG(DISTINCT ...) as suggested in p3consulting's answer. – Thorsten Kettner Commented Feb 5 at 10:37
Add a comment  | 

2 Answers 2

Reset to default 4

Your plan looks good (hash join and full table scans are appropriate here) - the only problem is that it's a serial plan. The whole thing is being performed by a single process. I would advise you to employ some parallelism. You are crunching a lot of data and you need more CPU power than a single session is going to provide. Find out how many CPUs you have on the database server, or ask your DBA what degree of parallelism is appropriate. Folks commonly guess at 8 and then adjust from there. Then hint the query to request that parallelism:

SELECT /*+ parallel(8) */
        P.UNQ_SRC_ID,
        P.UNQ_CNTRCT_ID,
. . .

If your DBA permits a higher DOP ("degree of parallelism"), you can crank it up higher. This not only will permit the hash join and listagg aggregation and string manipulation to occur across many CPUs, but also uses more PGA memory and so potentially less temp spillover space (which incurs costly I/O), since each session is limited at a hard maximum of 1GB of PGA. The more sessions you have (which is what parallel query will give you), the more PGA your query can use before spilling to temp. So on both counts, parallelism is the primary change I'd suggest without knowing anything further about your environment and database health.

As an aside, you may also want to be sure that listagg is appropriate here. Normally we avoid collapsing multiple values into a single field, only doing so at the very top presentation layer (for human consumption) if it makes good sense. In most cases it doesn't and a different design is called for, or an adjustment to the data model. That is nearly always the case in lower-level ETL logic that is loading physical tables (which based on your table names it sounds like you might be doing). Keep your data elements in distinct columns and rows, avoiding listagg style aggregations if possible and minimizing the use of analytic/windowing functions (those with an over clause), which generally produce violations of normal form (as well as being computationally expensive). That usually means working with more than one table at different granularities (parent-child) rather than trying to do everything in one table.

If the objective of the regexp is to remove duplicates, LISTAGG supports DISTINCT from version 19c.

For older versions, you can use the DECODE(value, LAG(value) OVER(PARTITION BY... ORDER BY...), NULL, value) technique since LISTAGG skips NULL, so if the current value is the same as the previous one the DECODE replaces it by NULL...

本文标签: Performance tuning Oracle query with REGEXPREPLACE that runs for 12 minStack Overflow