admin管理员组

文章数量:1287931

I'm working in an environment where I expect several containers to be running queries to "claim" a database row. The multiple containers are for redundancy, but I only ever want one container to succeed. There's an obvious contention / race condition here, and I am currently thinking about solving this via transactions or row locking.

I am using Postgres 15. I am currently trying this query:

WITH find_unprocessed_request AS (
    SELECT
        id
    FROM
        request
    WHERE
        handler_server IS NULL
        AND status = :old_status
    ORDER BY
        created_at ASC
    LIMIT 1
    FOR UPDATE
)
UPDATE request
SET
    status = :new_status,
    handler_server = :handler_server
FROM
    find_unprocessed_request
WHERE
   request.id = find_unprocessed_request.id

You can see that a row is found if it has an old status and there's no handler server, and I want a server only to write its host-name in such a record if it is NULL.

I based my query on the question relating to this answer, though I added the FOR UPDATE per the answer (my non-expert reading of that is that I do need this clause).

However, having read this answer I wonder if I need a multi-statement transaction to be certain to avoid race conditions. I will take that approach if I have to, but I mildly would like to avoid it, since my driver (PDO/PHP) seems to be saying that multi-statement executions cannot also use parameter binding.

Update

I think this answer is also agreeing with the basic correctness of adding FOR UPDATE, though the solution here is UPDATE WHERE using a sub-select instead of a CTE. I'd be happy to do that instead if it is regarded as safe, since it's just a single query, and should work fine with my driver.

I'm working in an environment where I expect several containers to be running queries to "claim" a database row. The multiple containers are for redundancy, but I only ever want one container to succeed. There's an obvious contention / race condition here, and I am currently thinking about solving this via transactions or row locking.

I am using Postgres 15. I am currently trying this query:

WITH find_unprocessed_request AS (
    SELECT
        id
    FROM
        request
    WHERE
        handler_server IS NULL
        AND status = :old_status
    ORDER BY
        created_at ASC
    LIMIT 1
    FOR UPDATE
)
UPDATE request
SET
    status = :new_status,
    handler_server = :handler_server
FROM
    find_unprocessed_request
WHERE
   request.id = find_unprocessed_request.id

You can see that a row is found if it has an old status and there's no handler server, and I want a server only to write its host-name in such a record if it is NULL.

I based my query on the question relating to this answer, though I added the FOR UPDATE per the answer (my non-expert reading of that is that I do need this clause).

However, having read this answer I wonder if I need a multi-statement transaction to be certain to avoid race conditions. I will take that approach if I have to, but I mildly would like to avoid it, since my driver (PDO/PHP) seems to be saying that multi-statement executions cannot also use parameter binding.

Update

I think this answer is also agreeing with the basic correctness of adding FOR UPDATE, though the solution here is UPDATE WHERE using a sub-select instead of a CTE. I'd be happy to do that instead if it is regarded as safe, since it's just a single query, and should work fine with my driver.

Share Improve this question edited Feb 24 at 0:06 Erwin Brandstetter 659k157 gold badges1.1k silver badges1.3k bronze badges asked Feb 22 at 21:25 halferhalfer 20.5k19 gold badges109 silver badges202 bronze badges 3
  • I think you want to read CTE Data Modifying Queries as to visibility of data. For one, a quick test here found that something like request.id = find_unprocessed_request.id did not work. Instead it needed to be something like request.id = (select id from find_unprocessed_request). – Adrian Klaver Commented Feb 22 at 23:09
  • Thanks @AdrianKlaver, I will read that. However, the query above does indeed work for me; what version are you testing on? I'm on Postgres 15 under Docker, and it seems to run fine. It's not in an environment that would test it for contention though (yet). – halfer Commented Feb 22 at 23:25
  • 1 It was a stupid on my part, I missed the FROM find_unprocessed_request part. It works for me when I use that form. – Adrian Klaver Commented Feb 22 at 23:37
Add a comment  | 

1 Answer 1

Reset to default 4

Your statement looks basically fine. If each transaction tries to process the next free ID (of many), then I would throw in SKIP LOCKED so that none waits on a locked row (just to give up when it comes back with handler_server IS NOT NULL after the concurrent transaction has updated it). Plus a couple other suggestions:

WITH find_unprocessed_request AS (
   SELECT id
   FROM   request
   WHERE  handler_server IS NULL
   AND    status = :old_status
   ORDER  BY id  -- ①
   LIMIT  1
   FOR    NO KEY UPDATE SKIP LOCKED  -- ②
   )
UPDATE request r
SET    status = :new_status
     , handler_server = :handler_server
FROM   find_unprocessed_request f
WHERE  r.id = f.id;

① Use columns that satisfy your ordering requirements and match an existing B-tree index. You commented that processing rows in the order of id values (a serial column) is good enough. Assuming relatively few with handler_server IS NULL, a partial index would make sense - which becomes cheaper if we don't have to include created_at:

CREATE INDEX ON request (status, id) WHERE handler_server IS NULL;

② Assuming that none of the updated columns is part of unique index, we can use a weaker (and cheaper) FOR NO KEY UPDATE lock, because the following UPDATE is also satisfied with it. Thanks to Rabban for hinting at that.
See details in the manual.

Now that we have added SKIP LOCKED, it's no longer important to make ORDER BY unambiguous - like I had suggested at first. That would matter otherwise to avoid deadlocks, and to process rows in a strictly deterministic order - often a requirement, but not in your case.

Like you found yourself, for the simple case of a single locked ID, a subquery is slightly simpler and cheaper, yet:

UPDATE request
SET    status = :new_status
     , handler_server = :handler_server
WHERE  id = (
   SELECT id
   FROM   request
   WHERE  handler_server IS NULL
   AND    status = :old_status
   ORDER  BY id
   LIMIT  1
   FOR    NO KEY UPDATE SKIP LOCKED
   );

Either way, with SKIP LOCKED you can never be certain that all rows have been processed. Irrelevant for a process that is perpetuated ad infinitum anyway. Else, read up on details in my reference answer:

  • Postgres UPDATE ... LIMIT 1

本文标签: sqlIs WITH SELECTUPDATE in Postgres safe from race conditionsStack Overflow