I have an RDF database that contains events. Each event is associated with a public key (a hex string), a timestamp, and various metadata predicates.
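For reference, here is a minimal sketch of the data shape the queries below assume, written as a SPARQL Update `INSERT DATA` request. Only the predicate names come from the queries. The `nostr:` namespace IRI, the event IRIs, the blank-node content resources, the shortened pubkey, and the use of plain integer Unix timestamps for `nostr:created_at` are all assumptions made for illustration.

```sparql
# Hypothetical namespace IRI: replace with the one your store actually uses.
PREFIX nostr: <http://example.org/nostr#>

INSERT DATA {
  # Older kind-0 (metadata) event for one public key.
  # Hypothetical event IRI; "a1b2c3" is a shortened placeholder pubkey.
  <http://example.org/event/1> a nostr:Event ;
      nostr:pubkey "a1b2c3" ;
      nostr:kind 0 ;
      nostr:created_at 1700000000 ;   # assumed: integer Unix timestamp
      nostr:content _:c1 .
  _:c1 nostr:name "alice" ;
       nostr:display_name "Alice" .

  # Newer kind-0 event for the same public key: its metadata should win.
  <http://example.org/event/2> a nostr:Event ;
      nostr:pubkey "a1b2c3" ;
      nostr:kind 0 ;
      nostr:created_at 1700000500 ;
      nostr:content _:c2 .
  _:c2 nostr:name "alice" ;
       nostr:display_name "Alice, Updated" .
}
```

The comma in the second display name is exactly the kind of value that breaks a comma-separated GROUP_CONCAT result when it is split on the client.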
User activity generates new events. I need a query that retrieves only the latest metadata for each public key. I found the following solution, which groups by the public key and applies GROUP_CONCAT to the metadata attributes:
```sparql
SELECT ?pubk
  (GROUP_CONCAT(?name; SEPARATOR=",") AS ?metadata_name)
  (GROUP_CONCAT(?display_name; SEPARATOR=",") AS ?metadata_display_name)
WHERE {
  ?mevent a nostr:Event .
  ?mevent nostr:pubkey ?pubk .
  ?mevent nostr:content ?mcontent .
  ?mevent nostr:created_at ?created_at .
  ?mevent nostr:kind 0 .
  ?mcontent nostr:name ?name .
  ?mcontent nostr:display_name ?display_name .
}
GROUP BY ?pubk
ORDER BY DESC(?created_at)
```
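This way I don't get one row per event, but rather one row per public key, with the history of each attribute concatenated into a single variable. In my code I can then split(",") each variable and keep only the last element (this relies on GROUP_CONCAT emitting values in timestamp order, which SPARQL does not guarantee: ORDER BY sorts the grouped result rows, not the values inside each group).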
I see a few problems with this approach:
- Size: I don't need the old metadata, and since I cache some of the results in memory, returning the full history is wasteful.
- Separator: a fixed separator string means the metadata literal values can't contain that separator, which is a real problem.
- Optimization: my SPARQL implementation doesn't support HAVING or ROW_LIMIT in GROUP_CONCAT, so I can't trim the query's output.
So I'm wondering whether there is a better way to write a query that retrieves only the latest metadata for every pubkey without using GROUP_CONCAT (SAMPLE is not an option either).
Since events have a timestamp, I tried using a subquery with MAX to find the latest timestamp for each pubkey, and then using that in the main query:
```sparql
SELECT ?pubk ?name ?display_name
WHERE {
  ?mevent a nostr:Event .
  ?mevent nostr:pubkey ?pubk .
  ?mevent nostr:content ?mcontent .
  ?mevent nostr:created_at ?latest_created_at .
  ?mevent nostr:kind 0 .
  {
    SELECT (MAX(?created_at) AS ?latest_created_at)
    WHERE {
      ?event a nostr:Event .
      ?event nostr:pubkey ?pubk .
      ?event nostr:created_at ?created_at .
      ?event nostr:kind 0 .
    }
  }
  OPTIONAL { ?mcontent nostr:name ?name . }
  OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```
It works partially: I get only 1 result, when there should be 5 because there are currently 5 different pubkeys in the database. The values are correct, though: they are the latest metadata for that pubkey. I don't know why I'm not getting rows for the other pubkeys.
EDIT: I found a working solution using FILTER NOT EXISTS:
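The single row is most likely explained by SPARQL subquery scoping. The inner SELECT projects only `?latest_created_at`, so its `?pubk` is not visible to the outer query, and with no GROUP BY the MAX is computed over all kind-0 events together. That yields one global maximum, which joins only with the newest event in the whole store. A sketch of the usual fix groups the subquery by `?pubk` and projects it, so the join matches on both the key and its timestamp. The PREFIX IRI is a placeholder assumption; everything else reuses the question's vocabulary.

```sparql
# Hypothetical namespace IRI: replace with the one your store actually uses.
PREFIX nostr: <http://example.org/nostr#>

SELECT ?pubk ?name ?display_name
WHERE {
  # Inner query: one row per public key with its newest kind-0 timestamp.
  # Projecting ?pubk makes it visible to (and joined with) the outer query.
  {
    SELECT ?pubk (MAX(?created_at) AS ?latest_created_at)
    WHERE {
      ?event a nostr:Event ;
             nostr:pubkey ?pubk ;
             nostr:created_at ?created_at ;
             nostr:kind 0 .
    }
    GROUP BY ?pubk
  }
  # Outer pattern: the event(s) with that key AND that timestamp.
  ?mevent a nostr:Event ;
          nostr:pubkey ?pubk ;
          nostr:content ?mcontent ;
          nostr:created_at ?latest_created_at ;
          nostr:kind 0 .
  OPTIONAL { ?mcontent nostr:name ?name . }
  OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```

Because the join uses both `?pubk` and `?latest_created_at`, each public key contributes its newest event. If two kind-0 events for the same key share the newest timestamp, both rows are returned.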
```sparql
SELECT ?pubk ?name ?display_name
WHERE {
  ?mevent a nostr:Event .
  ?mevent nostr:pubkey ?pubk .
  ?mevent nostr:content ?mcontent .
  ?mevent nostr:created_at ?created_at .
  ?mevent nostr:kind 0 .
  FILTER NOT EXISTS {
    ?event a nostr:Event .
    ?event nostr:kind 0 .
    ?event nostr:pubkey ?pubk .
    ?event nostr:created_at ?o_created_at .
    FILTER (?o_created_at > ?created_at)
  }
  OPTIONAL { ?mcontent nostr:name ?name . }
  OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```
SPARQL experts: is this correct? I had never used FILTER NOT EXISTS before.
Thanks in advance.
Summary of attempts: GROUP_CONCAT works but is inefficient; the subquery works only partially; FILTER NOT EXISTS seems to work.
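The FILTER NOT EXISTS query is a standard anti-join: it keeps an event only when no kind-0 event for the same `?pubk` has a strictly later `created_at`, so it returns the newest metadata per key. Two edge cases are worth knowing. First, ties: if two kind-0 events for one key share the newest timestamp, both survive and the key appears twice. Second, all `created_at` values must be mutually comparable literals (for example all integers), because a type error inside the inner FILTER counts as "no newer event". A sketch of a tie-breaker follows; it assumes events are IRIs (so STR on them is defined) and uses the IRI string as an arbitrary but deterministic second key. The PREFIX IRI is a placeholder.

```sparql
# Hypothetical namespace IRI: replace with the one your store actually uses.
PREFIX nostr: <http://example.org/nostr#>

SELECT ?pubk ?name ?display_name
WHERE {
  ?mevent a nostr:Event ;
          nostr:pubkey ?pubk ;
          nostr:content ?mcontent ;
          nostr:created_at ?created_at ;
          nostr:kind 0 .
  # Drop ?mevent if a "better" kind-0 event exists for the same key:
  # either strictly newer, or equally new with a greater IRI string
  # (tie-breaker; assumes events are IRIs, not blank nodes).
  FILTER NOT EXISTS {
    ?event a nostr:Event ;
           nostr:kind 0 ;
           nostr:pubkey ?pubk ;
           nostr:created_at ?o_created_at .
    FILTER (?o_created_at > ?created_at ||
            (?o_created_at = ?created_at && STR(?event) > STR(?mevent)))
  }
  OPTIONAL { ?mcontent nostr:name ?name . }
  OPTIONAL { ?mcontent nostr:display_name ?display_name . }
}
```

When `?event` is `?mevent` itself, both comparisons are false, so an event never excludes itself.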
Source: "sparql - Grouping by to get only the latest info without using GROUP_CONCAT - Stack Overflow", reposted at http://www.betaflare.com/web/1745066143a2640534.html