sparql - Grouping by to get only the latest info without using GROUP_CONCAT - Stack Overflow


I have an RDF database that contains events. Each event is associated with a public key (hex string), a timestamp, and various metadata predicates.

User activity generates new events. I have a query that needs to retrieve only the latest metadata for all public keys. I have found the following solution, using a GROUP BY on the public key and then doing a GROUP_CONCAT on the metadata attributes:

SELECT ?pubk
       (GROUP_CONCAT(?name; SEPARATOR=",") AS ?metadata_name)
       (GROUP_CONCAT(?display_name; SEPARATOR=",") AS ?metadata_display_name)
WHERE {
    ?mevent a nostr:Event .
    ?mevent nostr:pubkey ?pubk .
    ?mevent nostr:content ?mcontent .
    ?mevent nostr:created_at ?created_at .
    ?mevent nostr:kind 0 .

    ?mcontent nostr:name ?name .
    ?mcontent nostr:display_name ?display_name .
}
GROUP BY ?pubk
ORDER BY DESC(?created_at)

This way I don't get one result per event, but rather one result per public key, with the history of attribute values concatenated into each variable. In my code I can then split(",") each variable and use only the last element.

I think that there are a few problems with this approach:

- I don't need all the old metadata, and since I cache some of the results in memory, this is not optimal.
- Separator: using a fixed separator string means that the metadata literal values can't contain the separator; that's a real problem.
- The SPARQL implementation doesn't support HAVING or ROW_LIMIT in GROUP_CONCAT, so I can't optimize the query's output.

So I'm wondering whether there's a better way to write a query that retrieves only the latest metadata for every pubkey, without using GROUP_CONCAT (SAMPLE is also not an option).

Since events have a timestamp, I tried using a subquery with MAX to find the latest timestamp for every pubkey, and then using that in the main query:

SELECT ?pubk ?name ?display_name
WHERE {
    ?mevent a nostr:Event .
    ?mevent nostr:pubkey ?pubk .
    ?mevent nostr:content ?mcontent .
    ?mevent nostr:created_at ?latest_created_at .
    ?mevent nostr:kind 0 .

    {
        SELECT (MAX(?created_at) AS ?latest_created_at)
        WHERE {
            ?event a nostr:Event .
            ?event nostr:pubkey ?pubk .
            ?event nostr:created_at ?created_at .
            ?event nostr:kind 0 .
        }
    }

    OPTIONAL { ?mcontent nostr:name ?name . } .
    OPTIONAL { ?mcontent nostr:display_name ?display_name . } .
}

It partially works: I get only 1 result, when there should be 5, since there are currently 5 different pubkeys in the db. The values are correct, though: they are the latest metadata for that pubkey. But I don't know why I'm not getting rows for the other pubkeys.
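The single row is consistent with how SPARQL evaluates subqueries: only the variables a subquery projects in its SELECT are visible outside it, and without GROUP BY the MAX aggregate is computed over all matching kind-0 events at once. The subquery above therefore returns one global ?latest_created_at, and the outer pattern can only join with the event(s) carrying that exact timestamp. A sketch of the usual fix, assuming the same nostr: prefix declaration and data shape as the queries above (not tested against the asker's store), projects ?pubk from the subquery and groups by it, so the maximum is computed per pubkey and the join happens on both ?pubk and ?latest_created_at:

```sparql
SELECT ?pubk ?name ?display_name
WHERE {
    ?mevent a nostr:Event .
    ?mevent nostr:pubkey ?pubk .
    ?mevent nostr:content ?mcontent .
    ?mevent nostr:created_at ?latest_created_at .
    ?mevent nostr:kind 0 .

    {
        # Project ?pubk and group by it: one MAX per pubkey instead of a
        # single global MAX, and ?pubk becomes a join variable with the
        # outer pattern.
        SELECT ?pubk (MAX(?created_at) AS ?latest_created_at)
        WHERE {
            ?event a nostr:Event .
            ?event nostr:pubkey ?pubk .
            ?event nostr:created_at ?created_at .
            ?event nostr:kind 0 .
        }
        GROUP BY ?pubk
    }

    OPTIONAL { ?mcontent nostr:name ?name . }
    OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```

If a pubkey has two kind-0 events sharing the same maximal created_at, this form still returns a row for each of them.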

EDIT: Found a working solution with "FILTER NOT EXISTS":

SELECT ?pubk ?name ?display_name
WHERE {
    ?mevent a nostr:Event .
    ?mevent nostr:pubkey ?pubk .
    ?mevent nostr:content ?mcontent .
    ?mevent nostr:created_at ?created_at .
    ?mevent nostr:kind 0 .

    FILTER NOT EXISTS {
      ?event a nostr:Event .
      ?event nostr:kind 0 .
      ?event nostr:pubkey ?pubk .
      ?event nostr:created_at ?o_created_at .

      FILTER (?o_created_at > ?created_at)
    }

    OPTIONAL { ?mcontent nostr:name ?name . } .
    OPTIONAL { ?mcontent nostr:display_name ?display_name . } .
}
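The FILTER NOT EXISTS query behaves as an anti-join: an event is kept only when no other kind-0 event for the same ?pubk has a strictly greater created_at, which expresses "latest per pubkey" without any aggregation. One case it leaves open is ties: if two kind-0 events for a pubkey share the maximal created_at, both survive. A sketch of a deterministic tie-break, assuming events are IRIs (STR() raises an error on blank nodes, in which case ties simply fall back to the original behaviour) and that keeping the event with the lexically greatest IRI is acceptable, which is an arbitrary choice:

```sparql
SELECT ?pubk ?name ?display_name
WHERE {
    ?mevent a nostr:Event .
    ?mevent nostr:pubkey ?pubk .
    ?mevent nostr:content ?mcontent .
    ?mevent nostr:created_at ?created_at .
    ?mevent nostr:kind 0 .

    # Drop ?mevent if another kind-0 event for the same pubkey is newer,
    # or equally new but with a lexically greater IRI (arbitrary tie-break).
    FILTER NOT EXISTS {
        ?event a nostr:Event .
        ?event nostr:kind 0 .
        ?event nostr:pubkey ?pubk .
        ?event nostr:created_at ?o_created_at .

        FILTER (?o_created_at > ?created_at ||
                (?o_created_at = ?created_at && STR(?event) > STR(?mevent)))
    }

    OPTIONAL { ?mcontent nostr:name ?name . }
    OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```

An event never excludes itself here, because for ?event = ?mevent neither comparison is strictly greater.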

Dear SPARQL experts, is this correct? I had never used "FILTER NOT EXISTS" before.

Thanks in advance.

I tried GROUP_CONCAT: it works but is inefficient. I tried a subquery: it only partially works. "FILTER NOT EXISTS" seems to work.


