admin管理员组文章数量:1122846
I have this data:
select * from (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
)
And, in words, I need: The jobs constituting the (top) 60% of errors
So, in this case, that would be (113):
select sum(errors) * .4 as cut_off from ...
The final results would be these, because their sum < 113:
JOB | ERRORS |
---|---|
G | 75 |
E | 35 |
I have this data:
select * from (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
)
And, in words, I need: The jobs constituting the (top) 60% of errors
So, in this case, that would be (113):
select sum(errors) * .4 as cut_off from ...
The final results would be these, because their sum < 113:
JOB | ERRORS |
---|---|
G | 75 |
E | 35 |
I basically need a filter that keeps some sort of running sum, and then throws away everything once it hits that value.
I have this query, which doesn't quite work and I would prefer not use the with
statement
with data as (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
)
select k.*
from (
select t.*,
errors + LAG(errors, 1, 0) OVER (order by errors desc ) previous
from data t
) k where previous >= (select sum(errors) *.4 from data) order by errors desc
And I have tried windowed sum:
select k.*
from (
select t.*,
SUM(errors) OVER (
partition by JOB
order by errors desc
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) as limit
from (
select 'A' as JOB, 15 as errors from dual union all
select 'B' as JOB, 17 as errors from dual union all
select 'C' as JOB, 29 as errors from dual union all
select 'D' as JOB, 27 as errors from dual union all
select 'E' as JOB, 35 as errors from dual union all
select 'F' as JOB, 32 as errors from dual union all
select 'G' as JOB, 75 as errors from dual union all
select 'H' as JOB, 31 as errors from dual union all
select 'I' as JOB, 12 as errors from dual union all
select 'J' as JOB, 10 as errors from dual
) t
) k order by errors desc
Share
Improve this question
edited Nov 25, 2024 at 8:20
jps
22.3k16 gold badges87 silver badges102 bronze badges
asked Nov 22, 2024 at 22:01
Christian BongiornoChristian Bongiorno
5,6464 gold badges44 silver badges97 bronze badges
4
- Now why 113? The total sum of your data set (errors) is 283, 60% of it would be 169.8? Maybe a cumulative ranked sum of your data could give you the results you want? Something like this: sqlfiddle.com/oracle/… – Jorge Campos Commented Nov 22, 2024 at 22:31
- Did I add wrong? – Christian Bongiorno Commented Nov 22, 2024 at 22:57
- Yes, you did. Total on your data sample is 283. – Jorge Campos Commented Nov 25, 2024 at 16:22
- 1 113 is the total the is 60% of the sum of all data: 283 * .4 = 113. It's *.4 because it has to represent "60% of all errors", so everything above 40% – Christian Bongiorno Commented Nov 26, 2024 at 22:12
2 Answers
Reset to default 2SUM(errors) OVER (ORDER BY errors DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
calculates running total of errors in descending order.WHERE cum_errors <= cut_off
filters jobs with cumulative errors under 40% of total.
SELECT job, errors
FROM (
SELECT job, errors,
SUM(errors) OVER (ORDER BY errors DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_errors,
SUM(errors) OVER () * 0.4 AS cut_off
FROM (
SELECT 'A' AS job, 15 AS errors FROM DUAL UNION ALL
SELECT 'B' AS job, 17 AS errors FROM DUAL UNION ALL
SELECT 'C' AS job, 29 AS errors FROM DUAL UNION ALL
SELECT 'D' AS job, 27 AS errors FROM DUAL UNION ALL
SELECT 'E' AS job, 35 AS errors FROM DUAL UNION ALL
SELECT 'F' AS job, 32 AS errors FROM DUAL UNION ALL
SELECT 'G' AS job, 75 AS errors FROM DUAL UNION ALL
SELECT 'H' AS job, 31 AS errors FROM DUAL UNION ALL
SELECT 'I' AS job, 12 AS errors FROM DUAL UNION ALL
SELECT 'J' AS job, 10 AS errors FROM DUAL
) t
)
WHERE cum_errors <= cut_off
ORDER BY errors DESC;
Output:
JOB | ERRORS |
---|---|
G | 75 |
E | 35 |
fiddle
Another logic using subquery and self-join.
total_errors_cte:
This CTE calculates the total number of errors from the jobs_errors table. It simply sums up all the errors in the table.
running_sum_cte:
This CTE calculates the cumulative sum of errors in descending order. For each job, it sums up the errors of all jobs that have errors greater than or equal to the current job's error count (WHERE x.errors >= t.errors). It also retrieves the total_errors from the total_errors_cte to compare the running sum against 40% of the total errors.
WITH total_errors_cte AS (
-- Calculate the total errors for all jobs
SELECT SUM(errors) AS total_errors
FROM jobs_errors
),
running_sum_cte AS (
-- Calculate the running sum of errors, ordered by errors DESC
SELECT JOB, errors,
(SELECT total_errors FROM total_errors_cte) AS total_errors,
(
SELECT SUM(errors)
FROM jobs_errors x
WHERE x.errors >= t.errors
) AS running_sum
FROM jobs_errors t
)
-- Filter jobs whose running sum is <= 40% of total errors
SELECT JOB, errors
FROM running_sum_cte
WHERE running_sum <= total_errors * 0.4
ORDER BY errors DESC;
Output :
Fiddle
本文标签: sqlHow to get only the records who39s sum constitute gt xof the totalStack Overflow
版权声明:本文标题:sql - How to get only the records who's sum constitute > x % of the total - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736300601a1930874.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论