admin管理员组

文章数量:1122846

I have this data:

select * from (
    select 'A' as JOB, 15 as errors from dual union all
    select 'B' as JOB, 17 as errors from dual union all
    select 'C' as JOB, 29 as errors from dual union all
    select 'D' as JOB, 27 as errors from dual union all
    select 'E' as JOB, 35 as errors from dual union all
    select 'F' as JOB, 32 as errors from dual union all
    select 'G' as JOB, 75 as errors from dual union all
    select 'H' as JOB, 31 as errors from dual union all
    select 'I' as JOB, 12 as errors from dual union all
    select 'J' as JOB, 10 as errors from dual
)

And, in words, I need: The jobs constituting the (top) 60% of errors

So, in this case, that would be (113):

select sum(errors) * .4 as cut_off from ...

The final results would be these, because their sum < 113:

JOB ERRORS
G 75
E 35

I have this data:

select * from (
    select 'A' as JOB, 15 as errors from dual union all
    select 'B' as JOB, 17 as errors from dual union all
    select 'C' as JOB, 29 as errors from dual union all
    select 'D' as JOB, 27 as errors from dual union all
    select 'E' as JOB, 35 as errors from dual union all
    select 'F' as JOB, 32 as errors from dual union all
    select 'G' as JOB, 75 as errors from dual union all
    select 'H' as JOB, 31 as errors from dual union all
    select 'I' as JOB, 12 as errors from dual union all
    select 'J' as JOB, 10 as errors from dual
)

And, in words, I need: The jobs constituting the (top) 60% of errors

So, in this case, that would be (113):

select sum(errors) * .4 as cut_off from ...

The final results would be these, because their sum < 113:

JOB ERRORS
G 75
E 35

I basically need a filter that keeps some sort of running sum, and then throws away everything once it hits that value.

I have this query, which doesn't quite work and I would prefer not use the with statement

with data as (
    select 'A' as JOB, 15 as errors from dual union all
        select 'B' as JOB, 17 as errors from dual union all
        select 'C' as JOB, 29 as errors from dual union all
        select 'D' as JOB, 27 as errors from dual union all
        select 'E' as JOB, 35 as errors from dual union all
        select 'F' as JOB, 32 as errors from dual union all
        select 'G' as JOB, 75 as errors from dual union all
        select 'H' as JOB, 31 as errors from dual union all
        select 'I' as JOB, 12 as errors from dual union all
        select 'J' as JOB, 10 as errors from dual
)
select k.*
from (
    select t.*,
           errors + LAG(errors, 1, 0) OVER (order by errors desc ) previous
    from data t
) k where previous >= (select sum(errors) *.4 from data) order by errors desc

And I have tried windowed sum:

select k.*
from (
    select t.*,
           SUM(errors) OVER (
               partition by JOB
               order by errors desc
               RANGE BETWEEN UNBOUNDED PRECEDING
                AND CURRENT ROW
          ) as limit
    from (
        select 'A' as JOB, 15 as errors from dual union all
        select 'B' as JOB, 17 as errors from dual union all
        select 'C' as JOB, 29 as errors from dual union all
        select 'D' as JOB, 27 as errors from dual union all
        select 'E' as JOB, 35 as errors from dual union all
        select 'F' as JOB, 32 as errors from dual union all
        select 'G' as JOB, 75 as errors from dual union all
        select 'H' as JOB, 31 as errors from dual union all
        select 'I' as JOB, 12 as errors from dual union all
        select 'J' as JOB, 10 as errors from dual
    ) t
) k order by errors desc
Share Improve this question edited Nov 25, 2024 at 8:20 jps 22.3k16 gold badges87 silver badges102 bronze badges asked Nov 22, 2024 at 22:01 Christian BongiornoChristian Bongiorno 5,6464 gold badges44 silver badges97 bronze badges 4
  • Now why 113? The total sum of your data set (errors) is 283, 60% of it would be 169.8? Maybe a cumulative ranked sum of your data could give you the results you want? Something like this: sqlfiddle.com/oracle/… – Jorge Campos Commented Nov 22, 2024 at 22:31
  • Did I add wrong? – Christian Bongiorno Commented Nov 22, 2024 at 22:57
  • Yes, you did. Total on your data sample is 283. – Jorge Campos Commented Nov 25, 2024 at 16:22
  • 1 113 is the total the is 60% of the sum of all data: 283 * .4 = 113. It's *.4 because it has to represent "60% of all errors", so everything above 40% – Christian Bongiorno Commented Nov 26, 2024 at 22:12
Add a comment  | 

2 Answers 2

Reset to default 2
  • SUM(errors) OVER (ORDER BY errors DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) calculates running total of errors in descending order.
  • WHERE cum_errors <= cut_off filters jobs with cumulative errors under 40% of total.
SELECT job, errors
FROM (
    SELECT job, errors,
        SUM(errors) OVER (ORDER BY errors DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_errors,
        SUM(errors) OVER () * 0.4 AS cut_off
    FROM (
        SELECT 'A' AS job, 15 AS errors FROM DUAL UNION ALL
        SELECT 'B' AS job, 17 AS errors FROM DUAL UNION ALL
        SELECT 'C' AS job, 29 AS errors FROM DUAL UNION ALL
        SELECT 'D' AS job, 27 AS errors FROM DUAL UNION ALL
        SELECT 'E' AS job, 35 AS errors FROM DUAL UNION ALL
        SELECT 'F' AS job, 32 AS errors FROM DUAL UNION ALL
        SELECT 'G' AS job, 75 AS errors FROM DUAL UNION ALL
        SELECT 'H' AS job, 31 AS errors FROM DUAL UNION ALL
        SELECT 'I' AS job, 12 AS errors FROM DUAL UNION ALL
        SELECT 'J' AS job, 10 AS errors FROM DUAL
    ) t
)
WHERE cum_errors <= cut_off
ORDER BY errors DESC;

Output:

JOB ERRORS
G 75
E 35

fiddle

Another logic using subquery and self-join.

total_errors_cte:

This CTE calculates the total number of errors from the jobs_errors table. It simply sums up all the errors in the table.

running_sum_cte:

This CTE calculates the cumulative sum of errors in descending order. For each job, it sums up the errors of all jobs that have errors greater than or equal to the current job's error count (WHERE x.errors >= t.errors). It also retrieves the total_errors from the total_errors_cte to compare the running sum against 40% of the total errors.

WITH total_errors_cte AS (
    -- Calculate the total errors for all jobs
    SELECT SUM(errors) AS total_errors
    FROM jobs_errors
),
running_sum_cte AS (
    -- Calculate the running sum of errors, ordered by errors DESC
    SELECT JOB, errors, 
           (SELECT total_errors FROM total_errors_cte) AS total_errors,
           (
               SELECT SUM(errors)
               FROM jobs_errors x
               WHERE x.errors >= t.errors
           ) AS running_sum
    FROM jobs_errors t
)
-- Filter jobs whose running sum is <= 40% of total errors
SELECT JOB, errors
FROM running_sum_cte
WHERE running_sum <= total_errors * 0.4
ORDER BY errors DESC;

Output :

Fiddle

本文标签: sqlHow to get only the records who39s sum constitute gt xof the totalStack Overflow