Updating rows in a database through a Python script gets stuck suddenly

I intend to update nearly 500k rows in a database. I have a Python script which does this with a batch size of 1000 (if I give a batch size above 1000, it is rejected with a message that 1000 is the maximum limit). This is a sample of the UPDATE I run for each batch of account IDs:

UPDATE <table_name>
SET    INDICATOR = 'N'
WHERE  ACCOUNT_ID IN ('456456456',.........,'123123123')
AND    CONVERSION = 'C2'
AND    INDICATOR = 'Y';

While the script is running, it sometimes suddenly gets stuck at cursor.executemany(sql_query).

For example, with 100,000 rows I have 100 batches of 1,000 account IDs each. When I start the script, say 65 batches execute in no time, and then the 66th batch gets stuck at the line above. This does not happen every run; it happens rarely.

The workaround I found is this: when it gets stuck, I go to the database and run the PL/SQL procedure EXEC DBMS_STATS.GATHER_TABLE_STATS('<schema name>', '<tablename>');. Shortly after executing this command, all the remaining batches complete in no time.

These are my questions: Why does it suddenly get stuck? What does the above command do that makes execution resume? What do I need to change in my code to avoid the sudden stoppage? I don't want to rely on the PL/SQL procedure.

Can you please help me understand what is happening and how to rectify it?

asked Nov 23, 2024 at 4:32 by Kaushik Vezzu; edited Nov 25, 2024 at 19:35 by MT0
  • When it's stuck, what is the session waiting for? Is it blocked, for example (i.e. v$session.blocking_session has a value)? – William Robertson, Nov 24, 2024 at 18:30
  • Do not use the IN operator this way. Very large, unshareable cursors put a great deal of pressure on the shared pool and add a lot of parse time. You're better off simply looping through each value one at a time, but be sure to use bind variables; do not splice the account_id into the SQL text as you're doing above. Ensure that column is indexed in the database. If network latency makes 500k round trips prohibitive, use bulk (array) binds to insert the account_id values of interest into a temp table and then do an UPDATE or MERGE using that (a sketch of this approach follows these comments). – Paul W, Dec 3, 2024 at 22:25
  • Alternatively, consider whether Python is the correct environment for your programming task. If you are simply pulling batch data, applying programmatic logic, and writing back to the same database, you should consider writing it in PL/SQL instead, so the program runs within the database and avoids pointless network trips. – Paul W, Dec 3, 2024 at 22:27
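Following up on the comments, here is a minimal, untested sketch of both the lock diagnosis and the temp-table technique Paul W describes. The connection details, the temporary table account_ids_gtt, and the table name my_table are illustrative assumptions, not taken from the question:

import oracledb

# Connection details are placeholders.
connection = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cursor = connection.cursor()

# Diagnosis: run this from a second session while the script is stuck.
# If the UPDATE is blocked on a row lock, blocking_session identifies
# the holder. Requires SELECT privilege on v$session.
cursor.execute("""
    SELECT sid, event, blocking_session
    FROM v$session
    WHERE blocking_session IS NOT NULL""")
for sid, event, blocker in cursor:
    print(f"session {sid} waiting on '{event}', blocked by session {blocker}")

# Technique: bulk-bind all the IDs into a global temporary table created
# once up front, e.g.
#   CREATE GLOBAL TEMPORARY TABLE account_ids_gtt (account_id VARCHAR2(20))
#   ON COMMIT DELETE ROWS;
# then run ONE set-based UPDATE instead of hundreds of IN-list statements.
account_ids = [("456456456",), ("123123123",)]  # one tuple per account ID
cursor.executemany("INSERT INTO account_ids_gtt (account_id) VALUES (:1)",
                   account_ids)
cursor.execute("""
    UPDATE my_table
    SET INDICATOR = 'N'
    WHERE CONVERSION = 'C2'
      AND INDICATOR = 'Y'
      AND ACCOUNT_ID IN (SELECT account_id FROM account_ids_gtt)""")
connection.commit()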

2 Answers


Try adding parallelism to your update via a hint.

Your update today:

UPDATE <table_name>
SET    INDICATOR = 'N'
WHERE  ACCOUNT_ID IN ('456456456',.........,'123123123')
AND    CONVERSION = 'C2'
AND    INDICATOR = 'Y';

Try this:

UPDATE /*+ parallel(12) enable_parallel_dml */ <table_name>
SET    INDICATOR = 'N'
WHERE  ACCOUNT_ID IN ('456456456',.........,'123123123')
AND    CONVERSION = 'C2'
AND    INDICATOR = 'Y';

Take care with your environment's resources, because parallelism consumes CPU; please test in DEV first.
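One caveat worth noting: the ENABLE_PARALLEL_DML hint only exists from Oracle 12c onward; on earlier versions, parallel DML must be enabled at the session level before the statement runs. A minimal, untested sketch (the cursor/connection objects and table name are illustrative assumptions):

# Pre-12c databases have no ENABLE_PARALLEL_DML hint, so enable parallel
# DML for the session before issuing the hinted UPDATE.
cursor.execute("ALTER SESSION ENABLE PARALLEL DML")

# ACCOUNT_ID batch filter omitted for brevity; my_table is a placeholder.
cursor.execute("""
    UPDATE /*+ parallel(12) */ my_table
    SET INDICATOR = 'N'
    WHERE CONVERSION = 'C2' AND INDICATOR = 'Y'""")

# Parallel DML must be committed before this session can query the table
# again, otherwise Oracle raises ORA-12838.
connection.commit()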

Your problem sounds like a lock issue. But regarding the general technique: with executemany() you shouldn't need the IN clause at all; just bind each value. Follow the technique shown in https://python-oracledb.readthedocs.io/en/latest/user_guide/batch_statement.html#batch-execution-of-sql

Untested code:

sql = """UPDATE mytab 
         SET INDICATOR = 'N' 
         WHERE ACCOUNT_ID = :1 AND CONVERSION = 'C2' AND INDICATOR = 'Y'"""
data = [
    (456456456,),
    (123123123,),
]
cursor.executemany(sql, data)

Also, upgrade from cx_Oracle to python-oracledb; see the release announcement.
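A rough sketch of what the switch looks like, reusing sql and data from the example above. python-oracledb's default "thin" mode needs no Oracle Client libraries; the connection details below are placeholders:

import oracledb  # was: import cx_Oracle

# Thin mode is the default: no Oracle Client installation is required.
connection = oracledb.connect(user="scott", password="tiger",
                              dsn="dbhost/orclpdb1")
cursor = connection.cursor()

cursor.executemany(sql, data)  # same executemany() API as cx_Oracle
connection.commit()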
