Redshift MERGE statement with concurrent transactions

We have a Lambda writing to Redshift through the Redshift Data API, executing a MERGE (update/insert) statement with a simple merge condition (a comparison of two integer columns).
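
For context, the statement has roughly the following shape (a minimal sketch; the table and column names here are hypothetical, not our actual schema):

MERGE INTO target_table
USING staging_table s
ON s.key_a = target_table.key_a AND s.key_b = target_table.key_b
WHEN MATCHED THEN
    UPDATE SET payload = s.payload
WHEN NOT MATCHED THEN
    INSERT (key_a, key_b, payload)
    VALUES (s.key_a, s.key_b, s.payload);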

We have only one Lambda running at a given time (enforced by a FIFO queue), which usually guarantees there are no competing transactions.

Rarely, however, we encounter a situation where the MERGE statement takes minutes to execute (compared to the usual few seconds). This causes the Lambda to time out and reprocess the same message.

In this case, something unexpected happens: we end up with two identical rows inserted into the table by two identical competing MERGE statements. Of course, this is undesirable, especially given that PK/uniqueness constraints are not enforced by Redshift. It also causes future merges to fail, because the merge condition finds multiple matches.
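
When this happens, a grouped count of the following shape (column names hypothetical, matching the sketch above) is how we spot the duplicated keys:

SELECT key_a, key_b, COUNT(*)
FROM target_table
GROUP BY key_a, key_b
HAVING COUNT(*) > 1;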

The isolation level is set to SNAPSHOT for the DB.
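
For reference, this is how the level is set in Redshift (the database name is a placeholder):

ALTER DATABASE my_db ISOLATION LEVEL SNAPSHOT;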

I'm not sure how SNAPSHOT isolation works in Redshift, but I expect it to be similar to SQL Server's:

Snapshot isolation uses an optimistic concurrency model. If a snapshot transaction attempts to commit modifications to data that has changed since the transaction began, the transaction will roll back and an error will be raised.

I have the following questions:

  1. Does Redshift's SNAPSHOT isolation level work the same as SQL Server's SNAPSHOT, namely that the transaction is rolled back if the underlying data has been modified?
  2. If the answer to the above is "yes", why do we get two identical rows? Is it connected to the fact that the MERGE is rewritten to an INSERT by Redshift and the two statements are considered independent?
  3. Will switching to the SERIALIZABLE level mitigate this issue? If not, what's the best way to handle it?

1 Answer

After some testing, it is indeed the SNAPSHOT transaction isolation level that can cause duplicated records in the same table when using MERGE.

This is the code to demonstrate the behavior:

-- A Python UDF that sleeps for x seconds, used to hold the transaction
-- open long enough to overlap with a second session. It is declared
-- IMMUTABLE even though it isn't, which is fine for this test.
CREATE OR REPLACE FUNCTION janky_sleep (x float) RETURNS bool IMMUTABLE AS $$
    from time import sleep
    sleep(x)
    return True
$$ LANGUAGE plpythonu;

-- Warm-up call to make sure the UDF works before the test.
SELECT janky_sleep(5.0);

-- The target table; note there is no enforced uniqueness on id.
CREATE TABLE testingConcurrentMerge (
    id integer,
    text varchar(10)
);

-- Run this transaction from two sessions at roughly the same time.
BEGIN TRANSACTION;

    -- Temporary staging table holding the rows to merge.
    CREATE TABLE #tmerge (
        id integer,
        text varchar(10)
    );

    INSERT INTO #tmerge VALUES (1, 'test'), (2, 'test2');

    MERGE INTO testingConcurrentMerge
    USING #tmerge t
    ON t.id = testingConcurrentMerge.id
    WHEN MATCHED THEN
        UPDATE SET text = t.text
    WHEN NOT MATCHED THEN
        INSERT (id, text)
        VALUES (t.id, t.text);

    -- Hold the transaction open for 10 seconds so the second session's
    -- MERGE runs against the same snapshot before either commits.
    SELECT janky_sleep(10.0);

END TRANSACTION;

Executing the transaction from two separate connections at roughly the same time causes the data to be duplicated. This makes sense under SNAPSHOT isolation: both transactions read the same snapshot, in which the target rows do not yet exist, so both take the WHEN NOT MATCHED branch; and because two INSERTs of brand-new rows never modify the same existing row, there is no write-write conflict to roll either transaction back.
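
As a quick check (a sketch to run after both sessions commit), each key comes back twice:

SELECT id, text, COUNT(*) AS copies
FROM testingConcurrentMerge
GROUP BY id, text;

-- both (1, 'test') and (2, 'test2') return copies = 2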

Changing the isolation level to SERIALIZABLE forces the second transaction to fail, so the records are not duplicated.
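
For completeness, switching the level uses the same ALTER DATABASE form as above (database name is a placeholder; as far as I know, it can't be run inside a transaction block):

ALTER DATABASE my_db ISOLATION LEVEL SERIALIZABLE;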
