Two weeks after migrating to Postgres 17.2, we started getting an error on a query that had worked flawlessly for years on Postgres 14. I suspected this could be related to the database configuration parameters or to the major version, but as far as we know we are using the default values that come with both versions. We are running a Postgres RDS instance with 64 GB of RAM, and the data flow joins tables with tens of millions of records. We managed to isolate the issue to a query that does two regular outer joins on indexed columns.

SELECT *
FROM company
LEFT OUTER JOIN company_s
  ON company.domain = company_s.domain
LEFT OUTER JOIN socials
  ON company_s.raw_domain = socials.domain

This query normally returns in under 4 minutes. Now it runs for 5 to 11 minutes and then fails with the error

invalid DSA memory alloc request size 1811939328
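
For scale, the requested size can be converted to a readable unit. The snippet below is only a convenience sketch using the built-in pg_size_pretty function: 1811939328 bytes is 1728 MB, or roughly 1.7 GB. That is above the 1 GB ceiling that PostgreSQL applies to ordinary (non-"huge") allocation requests, which may be why the request is rejected as invalid instead of failing as out-of-memory. That reading is an assumption and is not confirmed by the question or the linked bug report.

```sql
-- Convert the size from the error message into a human-readable unit.
SELECT pg_size_pretty(1811939328::bigint) AS requested_size;

-- Compare it against 1 GB (1073741824 bytes), the assumed ceiling for
-- ordinary allocation requests.
SELECT 1811939328 > 1073741824 AS exceeds_1gb;
```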

It is quite odd that the flow ran without issue for the first 2 weeks and then stopped working on the same data.

This happens in both our staging and production environments, which are identical. In both cases the instance uses all of its memory, which is abnormal (usually more than 10GB remain free while the flows run). In staging it tends to fail after 11 minutes, sometimes accompanied by SSL SYSCALL Error: EOF detected. In production it fails after 5 minutes with: invalid DSA memory alloc request size 1811939328

What we tried

  1. Trimming oversized entries
  2. Rebuilding all indexes
  3. Running ANALYZE on all the involved tables
  4. Running on a reduced sample of only 1 million entries, which works but is not a real solution to our problem
  5. Increasing shared_buffers to 32GB
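
One further diagnostic, not among the steps listed above, is to inspect the session settings and the plan without executing the query. The sketch below uses only standard commands (SHOW and plain EXPLAIN). DSA memory is used by parallel query, so a Parallel Hash or Parallel Hash Join node in the plan would be a hint worth following. That connection is an assumption here, not something the question establishes.

```sql
-- Show the current values of the settings most relevant to large joins.
SHOW work_mem;
SHOW hash_mem_multiplier;
SHOW max_parallel_workers_per_gather;

-- Plan only (no ANALYZE), so the failing query is not actually executed.
-- Look for "Parallel Hash" / "Parallel Hash Join" nodes in the output.
EXPLAIN
SELECT *
FROM company
LEFT OUTER JOIN company_s
  ON company.domain = company_s.domain
LEFT OUTER JOIN socials
  ON company_s.raw_domain = socials.domain;
```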

There is a bug report for PG 17 with the same error, but it does not include a solution: https://www.postgresql.org/message-id/18349-83d33dd3d0c855c3%40postgresql.org

There are a few other questions about the same problem, mostly unanswered or with answers that do not apply to our case.

Asked by NicolasZ on Jan 8 at 17:20; edited Jan 8 at 17:21.

1 Answer

Following the lead of the possible bug report on PG17, we ran the queries with

SET max_parallel_workers_per_gather = 0;

and that made the query return in 5 minutes without errors. Digging deeper, we reviewed work_mem, which RDS had set to 4MB by default. We raised it to 64MB in the configuration, based loosely on this parameter guide, and the query started working smoothly, returning in around 3 minutes.

SET work_mem TO '64MB';
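
If changing the instance-wide configuration is not desirable, the same settings can be scoped to a single transaction with SET LOCAL. This is a minimal sketch, not the setup described in this answer: the answer does not say whether both settings are needed together, so the line that disables parallelism is left commented out as an optional fallback.

```sql
BEGIN;

-- Raise the per-operation sort/hash memory budget for this transaction only.
SET LOCAL work_mem = '64MB';

-- Optional fallback: disable parallel plans for this transaction only.
-- SET LOCAL max_parallel_workers_per_gather = 0;

SELECT *
FROM company
LEFT OUTER JOIN company_s
  ON company.domain = company_s.domain
LEFT OUTER JOIN socials
  ON company_s.raw_domain = socials.domain;

-- Settings changed with SET LOCAL revert automatically at COMMIT or ROLLBACK.
COMMIT;
```

Because work_mem applies per sort or hash operation and not per connection, scoping the increase to the heavy query limits how much extra memory concurrent sessions can claim.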

The working explain analyze looks like this
