admin管理员组

文章数量:1122832

I am selecting around 27 columns, so I broke the code into CTEs for this query as there are some sub/inner queries

It will usually be stuff like, user id, match the primary id key in this or that table; take the term in this, match the term in that; use this security group; take this category and subcategory, match it in that table - some tables don't have the id keys; given that we want information for this or that month, find the effective time that we started that the most recent contract for that row; find a primary contract for an employee that supersedes the rest..

columns are, i think in theory, indexed, when i set the tables in a diagram window on PL/SQL Developer I can generate documentation and see which columns are indexed, so that I can join tables on those columns and not lose too much performance. But I'm probably dumb and made a mistake somewhere.

But anyway, the point is my code would look like a long chain of interrelated CTEs:

with CTE01 as (
  -- employee row
),
CTE04 as (
  -- info row for employee in this section 
),
CTE04_SUM as (
 -- employee hours in this section
),
CTE10 as (
  -- row with different information but same employee
),
CTE10_Sum as ( 
  -- employee hours in that section
),
-- Combine the CTE04 and CTE10 information into a single total row. right now my code is bugged so that i perform a full outer join from CTE04 on CTE10, but not the other way around - this results in me losing the total row from CTE10 when the same id is not present in CTE04

Quarter_Total as (
  -- employee hours combined
) -- combines CTE10_Sum and CTE04_Sum into a single row


-- Year Total Row summing all quarters of the previous CTE
YEAR_TOTAL as (
  --total employee hours in year
)

-- get cte04 rows, union cte04 Sum Rows, union cte10 rows, union cte10 Sum Row, union Quarter Total Rows, union Year Total Rows

SELECT FROM CTE04
union
select from cte04sum
union
select from cte10
union 
select from cte10sum
union 
select from quarter
union
select from year

Now, I have a known "good" configuration, but when I try to debug the execution time by selecting from a single CTE directly (Let's say CTE04), it's about 15 seconds. For the sum row, that number jumps even higher. Some CTEs take an agonizing 100-200 seconds (for me, at least. I know some people are waiting 30 minutes...) Yet if I just run it as the final output, it's 5 or 6 seconds. And the final output is ordered, whereas the test output from a problematic CTE may not be ordered at all and still experience poor performance.

There are <2k rows in these CTEs. I guess the Oracle SQL optimizer is doing sometihng to make the final output look good

Sorry if this has an obvious answer or has already been answered, I am new to SQL and didn't really understand what I was reading or doing, but it's frustrating me a lot becuase I can't figure out why the code is so slow sometimes and so fast others. I've tried moving one of the column from a CTE that was doing a single select in a where condition into its own CTE and joining on that and that helped, but I just don't understand how that really helped, since SQL is supposed to be declarative, doesn't the optimizer decide what to do on its own?

I am selecting around 27 columns, so I broke the code into CTEs for this query as there are some sub/inner queries

It will usually be stuff like, user id, match the primary id key in this or that table; take the term in this, match the term in that; use this security group; take this category and subcategory, match it in that table - some tables don't have the id keys; given that we want information for this or that month, find the effective time that we started that the most recent contract for that row; find a primary contract for an employee that supersedes the rest..

columns are, i think in theory, indexed, when i set the tables in a diagram window on PL/SQL Developer I can generate documentation and see which columns are indexed, so that I can join tables on those columns and not lose too much performance. But I'm probably dumb and made a mistake somewhere.

But anyway, the point is my code would look like a long chain of interrelated CTEs:

with CTE01 as (
  -- employee row
),
CTE04 as (
  -- info row for employee in this section 
),
CTE04_SUM as (
 -- employee hours in this section
),
CTE10 as (
  -- row with different information but same employee
),
CTE10_Sum as ( 
  -- employee hours in that section
),
-- Combine the CTE04 and CTE10 information into a single total row. right now my code is bugged so that i perform a full outer join from CTE04 on CTE10, but not the other way around - this results in me losing the total row from CTE10 when the same id is not present in CTE04

Quarter_Total as (
  -- employee hours combined
) -- combines CTE10_Sum and CTE04_Sum into a single row


-- Year Total Row summing all quarters of the previous CTE
YEAR_TOTAL as (
  --total employee hours in year
)

-- get cte04 rows, union cte04 Sum Rows, union cte10 rows, union cte10 Sum Row, union Quarter Total Rows, union Year Total Rows

SELECT FROM CTE04
union
select from cte04sum
union
select from cte10
union 
select from cte10sum
union 
select from quarter
union
select from year

Now, I have a known "good" configuration, but when I try to debug the execution time by selecting from a single CTE directly (Let's say CTE04), it's about 15 seconds. For the sum row, that number jumps even higher. Some CTEs take an agonizing 100-200 seconds (for me, at least. I know some people are waiting 30 minutes...) Yet if I just run it as the final output, it's 5 or 6 seconds. And the final output is ordered, whereas the test output from a problematic CTE may not be ordered at all and still experience poor performance.

There are <2k rows in these CTEs. I guess the Oracle SQL optimizer is doing sometihng to make the final output look good

Sorry if this has an obvious answer or has already been answered, I am new to SQL and didn't really understand what I was reading or doing, but it's frustrating me a lot becuase I can't figure out why the code is so slow sometimes and so fast others. I've tried moving one of the column from a CTE that was doing a single select in a where condition into its own CTE and joining on that and that helped, but I just don't understand how that really helped, since SQL is supposed to be declarative, doesn't the optimizer decide what to do on its own?

Share Improve this question edited Nov 25, 2024 at 8:19 jps 22.3k16 gold badges87 silver badges102 bronze badges asked Nov 23, 2024 at 3:12 zitotzitot 314 bronze badges 1
  • Think of CTEs as views and it makes it simpler. You wouldn't expect SELECT * FROM view WHERE x = 123 to take as long as SELECT * FROM view. Or, if you have a programming background, think if them as macros; they get substituted / expanded in to your final query before being compiled in to an execution plan. – MatBailie Commented Nov 24, 2024 at 16:24
Add a comment  | 

1 Answer 1

Reset to default 1

If I understand your question correctly, you want to know ho it is possible for single CTEs in your query to take longer than the complete query including these CTEs.

This can indeed happen. As you say, SQL is a declarative language. We tell the DBMS what to look for, but the DBMS decides how to do this. CTEs are for readability. The DBMS won't necessarily run one CTE after the other, but combine them and see how to best get to the data.

Let's look at a small, stupid example.

  • In CTE1 we are looking for products costing more than 1$. This may take a little, when there is no index available or when the DBMS fears that too many rows may match the criteria so running through an index won't make sense.
  • In CTE2 we do the same for products costing less than 1$. Again, this may take a while.
  • Finally we inner join the two data sets.

The query:

with
  cte1 as
  (
    select product_id, name
    from products
    where price > 1.00
  ),
  cte2 as
  (
    select product_id, category
    from products
    where price < 1.00
  )
select cte1.name, cte2.category
from ct1 join ct2 using (product_id);

If the DBMS ran one CTE after the other and then joined the data sets, the complete query would of course take longer than the single CTEs.

But the DBMS will probably notice that it can boil down the final to

select name, category
from products 
where price > 1.00 and price < 1.00;

So, if it now runs through the table again, it can dismiss row after row instead of keeping them for a final join. This may already be much quicker than a single CTE were it has to collect data.

But if the DBMS even notices that the criteria is contradictory, then it can return the zero rows right away without even looking at the table.

So, yes, the complete query can be much faster than its seperate CTEs, because the DBMS doesn't have to execute the single CTEs in order to get the final result.

本文标签: sqlSelecting from CTE directly is 20x slower than final output unionStack Overflow