admin管理员组

文章数量:1122826

It is well known that using ORDER BY in a subquery in PostgreSQL is not meaningful, and the DBMS will only respect the outer most ORDER BY.

However, if we include LIMIT (alongside ORDER BY) in the sub query, is this still the case?

It is well known that using ORDER BY in a subquery in PostgreSQL is not meaningful, and the DBMS will only respect the outer most ORDER BY.

However, if we include LIMIT (alongside ORDER BY) in the sub query, is this still the case?

Share Improve this question edited Nov 21, 2024 at 10:29 Joseph Young asked Nov 21, 2024 at 9:35 Joseph YoungJoseph Young 2,79514 silver badges23 bronze badges 8
  • Are you curious what ISO SQL says, or just Postgres' implementation? – Dai Commented Nov 21, 2024 at 9:37
  • Also, are you asking about LIMIT without ORDER BY - or LIMIT with ORDER BY? – Dai Commented Nov 21, 2024 at 9:37
  • 1 Have you tested it? That is quite easily testable (you could have answered it yourself I guess). – M. Deinum Commented Nov 21, 2024 at 9:38
  • 3 @M.Deinum Whether or not something is defined-behaviour in any system cannot be determined through testing, only by looking at the spec (unless the system promises to raise a warning on undefined behaviour - which I don't think Postgres does, but please correct me if I'm wrong). C.f.: nasal demons in C. – Dai Commented Nov 21, 2024 at 9:40
  • 1 Yes, a subquery with both LIMIT and ORDER BY will be impacted by the ORDER BY. – Tim Biegeleisen Commented Nov 21, 2024 at 9:41
 |  Show 3 more comments

3 Answers 3

Reset to default 2

Using LIMIT in a subquery will affect the intermediate result set generated by that subquery. Consider the following two queries:

SELECT *
FROM
(
    SELECT *
    FROM yourTable
    ORDER BY col1
) t
ORDER BY col2;

In this query, the inner ORDER BY will (or maybe will not) execute, but since there is no LIMIT, the ordering of that intermediate result has no bearing on the ordering of the final result. The order of the final result will only depend on the outer ORDER BY clause.

However, in this query:

SELECT *
FROM
(
    SELECT *
    FROM yourTable
    ORDER BY col1
    LIMIT 10        -- only keep first 10 records
) t
ORDER BY col2;

The subquery aliased as t would only return 10 records. Again, the final ordering would only depend on the outer ORDER BY clause, but there would only be 10 records in the result set, i.e. that result set might be different than the first query.

It is well known that using ORDER BY in a subquery in PostgreSQL is not meaningful, and the DBMS will only respect the outer most ORDER BY.

Like many things that are "well known", this is not true.

In Postgres, ORDER BY in a subquery does exactly what it says. The sort order just may change again in outer query levels. If you append a LIMIT right after ORDER BY on the same subquery level, the sort order takes effect immediately. (Remaining rows may still be re-ordered in outer query levels.)

It also becomes evident if you aggregate rows on the next level. The manual:

[...] supplying the input values from a sorted subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

Related:

  • Create two arrays for two fields, keeping sort order of arrays in sync (without subquery)

Or for locking rows in the right order. See:

  • SELECT FOR UPDATE subquery not respecting LIMIT clause under load

Or for any other operation where sort order is meaningful ...

The Postgres query planner is smart about this: The manual:

According to the SQL standard, the expressions in the output list should be computed before applying DISTINCT, ORDER BY, or LIMIT. This is obviously necessary when using DISTINCT, since otherwise it's not clear what values are being made distinct. However, in many cases it is convenient if output expressions are computed after ORDER BY and LIMIT; particularly if the output list contains any volatile or expensive functions. With that behavior, the order of function evaluations is more intuitive and there will not be evaluations corresponding to rows that never appear in the output. PostgreSQL will effectively evaluate output expressions after sorting and limiting, so long as those expressions are not referenced in DISTINCT, ORDER BY or GROUP BY. (As a counterexample, SELECT f(x) FROM tab ORDER BY 1 clearly must evaluate f(x) before sorting.) Output expressions that contain set-returning functions are effectively evaluated after sorting and before limiting, so that LIMIT will act to cut off the output from a set-returning function.

From the docs:

https://www.postgresql.org/docs/current/sql-select.html#SQL-LIMIT

The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

It is even possible for repeated executions of the same LIMIT query to return different subsets of the rows of a table, if there is not an ORDER BY to enforce selection of a deterministic subset. Again, this is not a bug; determinism of the results is simply not guaranteed in such a case.

So yes using LIMIT and ORDER BY together is defined.

本文标签: Is using ORDER BY and LIMIT in a PostgreSQL subquery defined behaviourStack Overflow