admin管理员组

文章数量:1122846

Why does Query 1 return many rows while Query 2 returns zero rows?

Query 1:

SELECT distinct col_a, col_b
    FROM t
    WHERE
        col_timestamp = (SELECT MAX(col_timestamp ) FROM t)
        AND col_c = col_c

Query 2:

SELECT distinct col_a, col_b
    FROM t
    WHERE
        col_timestamp = (SELECT MAX(col_timestamp ) FROM t)
        AND col_c IN (col_c)

Why does Query 1 return many rows while Query 2 returns zero rows?

Query 1:

SELECT distinct col_a, col_b
    FROM t
    WHERE
        col_timestamp = (SELECT MAX(col_timestamp ) FROM t)
        AND col_c = col_c

Query 2:

SELECT distinct col_a, col_b
    FROM t
    WHERE
        col_timestamp = (SELECT MAX(col_timestamp ) FROM t)
        AND col_c IN (col_c)
Share Improve this question asked Nov 22, 2024 at 15:33 slothish1slothish1 1592 silver badges15 bronze badges 1
  • can you pls add some sample data ? – Koushik Roy Commented Nov 22, 2024 at 16:23
Add a comment  | 

2 Answers 2

Reset to default 0

Query 1: col_c = col_c is true for all rows that are not NULL, so this returns all rows where col_c is not NULL.

Query 2: col_c IN (col_c) attempts to check if col_c is within a list containing itself. However, due to Snowflake's scoping rules, the condition fails for all rows, resulting in zero rows returned.

This SQL is very confused, lets make some example data to show the confusion.

with t(col_a, col_b, col_c,col_timestamp) as (
    select * from values 
        (1,10,100, 1000),
        (1,10,100, 900),
        (1,10,null, 1000),
        (2,20,200, 2000),
        (2,20,200, 1900),
        (2,20,null, 2000)   
)
SELECT col_a, 
    col_b,
    col_c
FROM t
WHERE col_timestamp = (SELECT MAX(col_timestamp ) FROM t)
 and col_c = col_c

gives:

COL_A COL_B COL_C
2 20 200

but I suspect you really want to find the latest PER col_c thus the SQL would make more sense like:

with t(col_a, col_b, col_c,col_timestamp) as (
    select * from values 
        (1,10,100, 1000),
        (1,10,100, 900),
        (1,10,null, 1000),
        (2,20,200, 2000),
        (2,20,200, 1900),
        (2,20,null, 2000)   
)
SELECT col_a, 
    col_b,
    col_c
FROM t as t
WHERE t.col_timestamp = (
    SELECT MAX(tt.col_timestamp ) 
    FROM t as tt 
    WHERE t.col_c = tt.col_c
)

which gives:

But if you change your timestamp data to have duplicates like:

with t(col_a, col_b, col_c,col_timestamp) as (
    select * from values 
        (1,10,100, 1000),
        (1,10,100, 900),
        (1,10,null, 1000),
        (2,20,200, 2000),
        (2,20,200, 2000),
        (2,20,null, 2000)   
)

Now you get double 2,20 data, thus the DISTINCT clause

But really you could be doing what you intend. But the base SQL used to show this "strange behavour" in of itself, is strange.

本文标签: