postgresql - Effective JSON array search in batches, with data extraction - Stack Overflow-软件玩家

admin管理员组
文章数量:1277875

Data model

My database contains a table with unique id and JSON blob. JSON blob contains, among other fields, a list of "identifiers" that I have an index on. Identifiers are of a given type (ISIN, VALOR, etc.).
The identifier is held in a JSON subobject, with possibly a priority class (primary, or linked): {<type>: <identifier value>, <class>: true}.
By having each identifier in its own subobject, the array of identifiers can hold multiple identifiers of the same type, for example:

    {
        "identifiers": [
          {"isin": "XS1", "primary": true},
          {"valor": 1234, "linked": true},
          {"isin": "EXTRA"}
        ],
        "workflows": {},
        "slots": {}
    }

Lookup queries

Initial, single field search

Previously I had a query that provided a single array element, to be searched among all JSON blobs "identifiers" lists and return the Ids of those elements. For example the search criteria: [ {"isin": "XS1"} ] would return true for the aforementioned example.

New search rules

Now I want the priority class to filter more precisely.
Basically the conditions are as follows:

Return a record if it contains an identifier with the given type and value (of the search criteria):

and primary = true,
or linked = true but the object holds no primary identifier of that type
or one with no primary or linked class (either having no class at all, or a different class), and there exists no primary or linked identifier of the searched type in the object

Thus the search criteria [ {"isin": "EXTRA"} ] would return false for below identifiers

'{
        "identifiers": [
          {"isin": "XS1", "primary": true},
          {"valor": 1234},
          {"isin": "EXTRA", "linked":true}
        ],
        "workflows": {},
        "slots": {}
    }'

because isin and EXTRA match but there is already an isin which is primary, thus masking all other isins.

Desired query

Query plan for this single item query was good, because i could easly hardcode certain values like "type" in the query itself. Now I would like to make the query resolve a batch of identifiers the same way, i.e.:

'[
      [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
      [{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
      [{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
    ]'

The query is: Fiddle

WITH scheme_and_id AS (
    -- Convert the input JSON object into rows of key-value pairs
    SELECT jsonb_array_elements('[
      [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
      [{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
      [{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
    ]'::jsonb) as batch_json
),
     -- first build the JSON to be used to match the index
     resolve_id as (select -- select the JSON objects
                           "id",
                           (blob -> 'identifiers')    as "ids",
                           (scheme_and_id.batch_json[1] -> 'id')::text as identifier,
                           trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"') as selected_schema
                    from blobstable, scheme_and_id
                    where (blob -> 'identifiers') @>  jsonb_build_array( scheme_and_id.batch_json[0])),
     candidates as (
         SELECT
             "id",
             selected_schema  as "schema",
             identifier as "identifier",
             (single_identifier -> 'linked')::boolean as "linked",
             (single_identifier -> 'primary')::boolean as "primary",
             (single_identifier -> selected_schema)::TEXT as selected_id,
             single_identifier
         FROM resolve_id ,
              LATERAL jsonb_array_elements("ids") single_identifier
         where  (single_identifier -> selected_schema)::TEXT is not null
     )
SELECT *
from candidates
where ("primary" and identifier = selected_id)        -- first rule
   or ("linked" and identifier = selected_id and
       not exists(select "id" from candidates where "primary")) -- second rule
   or (("primary" is Null or False) and ("linked" is Null or False) and identifier = selected_id and
       not exists(select "id" from candidates where "linked")) -- third rule
;

And the explanation: I provide arrays of params like so: [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}]

{"isin": "XS12"} is the identifier to be matched in:

where (blob -> 'identifiers') @>  jsonb_build_array( scheme_and_id.batch_json[0]))

{"id": "XS12"} and {"schema" : "isin"} parameters are needed to easily match it in the candidates conditions (extracting this dynamically was challenging because the type name can be isin, valor etc, as seen here: (trim is used because somehow schema was being returned with quotation marks

(scheme_and_id.batch_json[1] -> 'id')::text as identifier,
trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"')

After those operations "resolve_id" contains the "id" and "identifiers" blob of all arrays that contain identifiers from the "scheme_and_id"

And finally in "candidates" I extract needed fields from the blob for the final "where" logic. Addition of this "candidates" ads another nested loop to the explain plan. I am looking for a way to optimize this query. I was even thinking about making or the "where" logic using @> operator, to basically check if contains "type":"value" primary - true, and not contains others, but I see no way to do it like "not contains "isin": any, primary: true

Working example: Fiddle

Basic create table with test data, expected in the repsonse are all 3 selected ids

CREATE TABLE blobstable 
(
  id integer primary key,
  blob jsonb
);

INSERT INTO blobstable 
VALUES 
(1, '{
    "identifiers": [
      {"isin": "XS1", "primary": true},
      {"valor": 1234},
      {"isin": "EXTRA"}
    ],
    "workflows": {},
    "slots": {}
}'),
(2, '{
     "identifiers": [
      {"isin": "XS3", "primary": true},
      {"valor": 456, "linked": true},
      {"cusip": "TRGYN"}
    ],
    "workflows": {},
    "slots": {}
}'),
  (3, '{
    "identifiers": [
      {"isin": "XS12", "primary": true},
      {"valor": 678},
      {"isin": "XS5"}
    ],
    "workflows": {},
    "slots": {}
}')
;

Data model

    {
        "identifiers": [
          {"isin": "XS1", "primary": true},
          {"valor": 1234, "linked": true},
          {"isin": "EXTRA"}
        ],
        "workflows": {},
        "slots": {}
    }

Lookup queries

Initial, single field search

New search rules

Now I want the priority class to filter more precisely.
Basically the conditions are as follows:

Return a record if it contains an identifier with the given type and value (of the search criteria):

and primary = true,
or linked = true but the object holds no primary identifier of that type
or one with no primary or linked class (either having no class at all, or a different class), and there exists no primary or linked identifier of the searched type in the object

Thus the search criteria [ {"isin": "EXTRA"} ] would return false for below identifiers

'{
        "identifiers": [
          {"isin": "XS1", "primary": true},
          {"valor": 1234},
          {"isin": "EXTRA", "linked":true}
        ],
        "workflows": {},
        "slots": {}
    }'

because isin and EXTRA match but there is already an isin which is primary, thus masking all other isins.

Desired query

'[
      [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
      [{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
      [{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
    ]'

The query is: Fiddle

WITH scheme_and_id AS (
    -- Convert the input JSON object into rows of key-value pairs
    SELECT jsonb_array_elements('[
      [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
      [{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
      [{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
    ]'::jsonb) as batch_json
),
     -- first build the JSON to be used to match the index
     resolve_id as (select -- select the JSON objects
                           "id",
                           (blob -> 'identifiers')    as "ids",
                           (scheme_and_id.batch_json[1] -> 'id')::text as identifier,
                           trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"') as selected_schema
                    from blobstable, scheme_and_id
                    where (blob -> 'identifiers') @>  jsonb_build_array( scheme_and_id.batch_json[0])),
     candidates as (
         SELECT
             "id",
             selected_schema  as "schema",
             identifier as "identifier",
             (single_identifier -> 'linked')::boolean as "linked",
             (single_identifier -> 'primary')::boolean as "primary",
             (single_identifier -> selected_schema)::TEXT as selected_id,
             single_identifier
         FROM resolve_id ,
              LATERAL jsonb_array_elements("ids") single_identifier
         where  (single_identifier -> selected_schema)::TEXT is not null
     )
SELECT *
from candidates
where ("primary" and identifier = selected_id)        -- first rule
   or ("linked" and identifier = selected_id and
       not exists(select "id" from candidates where "primary")) -- second rule
   or (("primary" is Null or False) and ("linked" is Null or False) and identifier = selected_id and
       not exists(select "id" from candidates where "linked")) -- third rule
;

And the explanation: I provide arrays of params like so: [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}]

{"isin": "XS12"} is the identifier to be matched in:

where (blob -> 'identifiers') @>  jsonb_build_array( scheme_and_id.batch_json[0]))

(scheme_and_id.batch_json[1] -> 'id')::text as identifier,
trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"')

After those operations "resolve_id" contains the "id" and "identifiers" blob of all arrays that contain identifiers from the "scheme_and_id"

Working example: Fiddle

Basic create table with test data, expected in the repsonse are all 3 selected ids

CREATE TABLE blobstable 
(
  id integer primary key,
  blob jsonb
);

INSERT INTO blobstable 
VALUES 
(1, '{
    "identifiers": [
      {"isin": "XS1", "primary": true},
      {"valor": 1234},
      {"isin": "EXTRA"}
    ],
    "workflows": {},
    "slots": {}
}'),
(2, '{
     "identifiers": [
      {"isin": "XS3", "primary": true},
      {"valor": 456, "linked": true},
      {"cusip": "TRGYN"}
    ],
    "workflows": {},
    "slots": {}
}'),
  (3, '{
    "identifiers": [
      {"isin": "XS12", "primary": true},
      {"valor": 678},
      {"isin": "XS5"}
    ],
    "workflows": {},
    "slots": {}
}')
;

Share Improve this question edited Feb 25 at 11:31 asked Feb 25 at 8:29 guruk 8310 bronze badges

As mentioned in your previous post, please show a minimal reproducible example, containing the CREATE TABLE and some insert statements. Please also show a single Markdown Table with your expected results. – Charlieface Commented Feb 25 at 10:29
I'm still very unclear: in those final "rules" are we purely looking at a match across the identifiers in a single JSON row, or do you want to look at all rows? Eg does row 1 affect row 3? Also your input search data seems to repeat itself, for example [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}] could be just [{"isin": "XS12"}] and everything can be worked out from there. – Charlieface Commented Feb 25 at 11:34
Hi, if multiple rows are returned for a single "isin" for example, actually we shoudl select the row with preference to "primary", then "linked" as in the "where"clause, so in that sense they are not fully independent, all candidates shoudl be checked for a particular ID – guruk Commented Feb 25 at 12:30
1 Your current query seems to need a small correction, because the last where filters out only if the id is found by another criteria. Thus, the only reason why {isin:"EXTRA"} of id 1 is not matched, is that because id 1 is matched by valor. But if you remove valor from the criterias, its {isin:"EXTRA"} will still match (while according to rule 3., it should not match because id 1 holds another isin with primary: true). Another way to test is by adding an {"isin": "EXTRA"}, to id 2: it incorrectly shows up as a result. – Guillaume Outters Commented Feb 25 at 15:23
1 I have put what I think would be the corrected version (according to what I have understood from your description) in a fiddle; along with a second version that simplifies the not exists by flattening all "priority classes" (primary, linked) to an easily comparable int field. – Guillaume Outters Commented Feb 25 at 17:34

| Show 1 more comment

2 Answers 2

Sorted by: Reset to default 0

Supposing your dataware are more often searched than modified,
I would compute a field dedicated to identifiers search (indexed, and computed ON INSERT OR UPDATE).

In 3 passes, it would contain:

all primary identifiers
and all linked identifiers of types not in 1.
and all identifiers of types not in 1. + 2.

Your search query would then be super simple.

I went with final query, as seen here: https://dbfiddle.uk/gwzGZX8j

       WITH scheme_and_id as (
                select     (id_json -> 'identifier') AS queried_id,
                           trim((id_json-> 'scheme')::text, '"')AS scheme,
                           ordinal
                from jsonb_array_elements('[
      {"identifier": "XS12", "scheme" : "isin"},
      {"identifier": 1234, "scheme" : "valor"},
      {"identifier": "EXTRA", "scheme" : "isin"}
    ]'::jsonb) WITH ORDINALITY as f(id_json, ordinal)),
                 -- first build the JSON to be used to match the index
                 resolve_id as (select -- select the JSON objects
                                       id,
                                       (blob -> 'identifiers')     as "ids",
                                       ordinal,
                                       queried_id,
                                       scheme
                                 from blobstable, scheme_and_id
                                where (blob -> 'identifiers')  @>  jsonb_build_array( jsonb_build_object(scheme, queried_id))),
                 candidates as (
                     SELECT
                         id,
                         ordinal,
                         scheme,
                         queried_id,
                         (identifier_row -> scheme) as candidate_id,
                         (identifier_row -> 'primary')::boolean as "primary",
                         (identifier_row -> 'linked')::boolean as "linked"
                     FROM resolve_id ,
                          LATERAL jsonb_array_elements("ids") identifier_row
                 )
            SELECT id
            from candidates c
                     right join scheme_and_id on c.ordinal = scheme_and_id.ordinal
                and ((c."primary" and c.queried_id = c.candidate_id)                                            -- first rule
                or  (c."linked"   and c.queried_id = c.candidate_id
                    and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "primary"))  -- second rule
                or (("primary" is Null or False) and ("linked" is Null or False) and c.queried_id = c.candidate_id
                    and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "linked")))  -- third rule
            order by scheme_and_id.ordinal;

So I added sorting, since I need to return NULL for not found identifiers. I also improved the way I provide the input into the query to drop the nested arrays and value duplication. I also decided to go with this where clause, because it moves all the conditional logic to the final step, and I believe is easier to read.

       from candidates c
                     right join scheme_and_id on c.ordinal = scheme_and_id.ordinal
                and ((c."primary" and c.queried_id = c.candidate_id)                                            -- first rule
                or  (c."linked"   and c.queried_id = c.candidate_id
                    and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "primary"))  -- second rule
                or (("primary" is Null or False) and ("linked" is Null or False) and c.queried_id = c.candidate_id
                    and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "linked")))  -- third rule
            order by scheme_and_id.ordinal;

Thank you for help

本文标签： postgresqlEffective JSON array search in batches with data extractionStack Overflow

版权声明：本文标题：postgresql - Effective JSON array search in batches, with data extraction - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1741219179a2360635.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

postgresql - Effective JSON array search in batches, with data extraction - Stack Overflow

Data model

Lookup queries

Initial, single field search

New search rules

Desired query

Data model

Lookup queries

Initial, single field search

New search rules

Desired query

2 Answers 2

更多相关文章

postgresql - Effective JSON array search in batches, with data extraction - Stack Overflow

发表评论

推荐文章

javascript - How to reuse the Yup validation schema + sonar scan shown more duplication on yup schema validation files it increa

Define &quot;global&quot; variables in jenkins shared library that will be available to both srcvars folders and jenkins

javascript - Can there be generator getters in classes? - Stack Overflow

javascript - How to limit an array with 10,000 objects thats linked to my angular ui-select? - Stack Overflow

flutter - How to make supabase return an auth error if user tries to sign in with same email but different auth provider? - Stac

热门文章

Django app: how to read a bit of HTML in JavaScript - Stack Overflow

javascript - Passing a prototype&#39;s function as parameter without losing the &#39;this&#39; context - Stack Overf

force login and redirect to custom login page

Support Spring Boot for Graalvm 23 - Stack Overflow

Javascript to actionscript - Stack Overflow

javascript - Make function execute after user hovers over link for 2 seconds - Stack Overflow

dom - With javascript, how do you get final result URL after 302 redirect on img src? - Stack Overflow

Pagination on and tag.php not working

javascript - Uncaught Error: Assertion Failed: Attempted to register a view with an id already in use: name - Stack Overflow

javascript - How to unchecked all checked checkbox in angular 6 - Stack Overflow

最新文章

Win7各正式版下载地址和SHA验证

怎么样把中文版的Windows7改成英文版的Windows7

Win7系统笔记本蓝牙打开指南：详细步骤助你轻松连接

win7开机弹计算机,win7开机弹出Windows Installer窗口的解决方法

windows7虚拟机安装vmtools方法

javascript - Timer - Count up timer for Angular 8 - Stack Overflow

javascript - Pass Node env to layout.jade in Express without explicitly defining route - Stack Overflow

typescript - I have the correct export statement, but still get an error &quot;Uncaught SyntaxError&quot; - Stack Overfl

How can you specify the order of properties in a javascript object for a MongoDB index in node.js? - Stack Overflow

javascript - How do I make a button trigger a file input onChange in React? - Stack Overflow

惠普OMEN 15-CE001TX 2EF91PA参数报价

苹果新款MacBook Pro 15英寸 i732GB1TBVega Pro 20参数报价

联想Y330A-PSE L参数报价

神舟战神Z7 D6 i7-12650H16GB512GBRTX4050旗舰版参数报价

神舟战神Z7 D6 i7-12650H16GB1TBRTX4050参数报价

Define "global" variables in jenkins shared library that will be available to both srcvars folders and jenkins

javascript - Passing a prototype's function as parameter without losing the 'this' context - Stack Overf

typescript - I have the correct export statement, but still get an error "Uncaught SyntaxError" - Stack Overfl