admin管理员组文章数量:1379438
I have constructed a Neo4j graph based on MovieLens100k with the following content:
Nodes:
(:User)
,(:Movie)
,(:Genre)
Relations:
(:User)-[:RATED]->(:Movie)
,(:Movie)-[:HAS_GENRE]->(:Genre)
I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User)
and that the meta-path to follow is the following:
path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)
Suppose to have computed such quantities:
N_ij = Number of path of type path1 starting from (:User)
with id=i and ending in (:User)
with id=j
Then PathSim for user with id 1 and id 2 is:
PathSim = 2N_12/(N_11+N_22)
To compute such quantity in Neo4j my Cypher query (1) is:
"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie)
starting from (:User)
with id=1, i.e. query (2):
"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
By taking into account "trivial" paths consisting of going from (:User)
to (:Movie)
and coming back from the same exact half-path.
However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.
Is there a way to account for "trivial" paths by sort of modifying query (1)?
I have constructed a Neo4j graph based on MovieLens100k with the following content:
Nodes:
(:User)
,(:Movie)
,(:Genre)
Relations:
(:User)-[:RATED]->(:Movie)
,(:Movie)-[:HAS_GENRE]->(:Genre)
I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User)
and that the meta-path to follow is the following:
path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)
Suppose to have computed such quantities:
N_ij = Number of path of type path1 starting from (:User)
with id=i and ending in (:User)
with id=j
Then PathSim for user with id 1 and id 2 is:
PathSim = 2N_12/(N_11+N_22)
To compute such quantity in Neo4j my Cypher query (1) is:
"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie)
starting from (:User)
with id=1, i.e. query (2):
"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
By taking into account "trivial" paths consisting of going from (:User)
to (:Movie)
and coming back from the same exact half-path.
However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.
Is there a way to account for "trivial" paths by sort of modifying query (1)?
Share Improve this question edited Mar 19 at 11:45 Emanuele Maduli asked Mar 19 at 11:28 Emanuele MaduliEmanuele Maduli 11 bronze badge1 Answer
Reset to default 1Neo4j will filter out any MATCH result row that repeats the same relationship. That is why (1) did not do what you expected.
Assuming that:
N_11
is the number of(:User {id: "1"})-[:RATED]->(:Movie)
paths, andUser
nodeid
properties have string values (to allow my query to use them as map keys),
this query may do what you want:
MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(*) as ct
WITH
COLLECT(DISTINCT u1) as u1s,
COLLECT(DISTINCT u2) AS u2s,
COLLECT({headUser: u1.id, tailUser: u2.id, ct: ct}) AS pairs
UNWIND apoc.coll.intersection(u1s, u2s) AS user
WITH user, COUNT { (user)-[:RATED]->(:Movie) } AS userCt, pairs
WITH COLLECT(user.id) AS userIds, COLLECT(userCt) AS userCts, pairs
WITH apoc.map.fromLists(userIds, userCts) AS userCtMap, pairs
UNWIND pairs as pair
RETURN
pair.headUser AS headUser,
pair.tailUser AS tailUser,
(2.0 * pair.ct)/(userCtMap[pair.headUser] + userCtMap[pair.tailUser]) AS pathSim
This query calculates all distinct users found by the MATCH
clause and maps each user's id
to its number of outgoing -[:RATED]->(:Movie)
relationships. It then uses that mapping to calculate the denominator of your pathSim formula.
本文标签: cypherNeo4jCount of symmetric metapathInclude half metapathsStack Overflow
版权声明:本文标题:cypher - Neo4j - Count of symmetric meta-path - Include half meta-paths - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1744461054a2607238.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论