admin管理员组

文章数量:1379438

I have constructed a Neo4j graph based on MovieLens100k with the following content:

  1. Nodes: (:User), (:Movie), (:Genre)

  2. Relations: (:User)-[:RATED]->(:Movie), (:Movie)-[:HAS_GENRE]->(:Genre)

I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User) and that the meta-path to follow is the following:

path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)

Suppose to have computed such quantities:

N_ij = Number of path of type path1 starting from (:User) with id=i and ending in (:User) with id=j

Then PathSim for user with id 1 and id 2 is:

PathSim = 2N_12/(N_11+N_22)

To compute such quantity in Neo4j my Cypher query (1) is:

"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie) starting from (:User) with id=1, i.e. query (2):

"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

By taking into account "trivial" paths consisting of going from (:User) to (:Movie) and coming back from the same exact half-path.

However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.

Is there a way to account for "trivial" paths by sort of modifying query (1)?

I have constructed a Neo4j graph based on MovieLens100k with the following content:

  1. Nodes: (:User), (:Movie), (:Genre)

  2. Relations: (:User)-[:RATED]->(:Movie), (:Movie)-[:HAS_GENRE]->(:Genre)

I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User) and that the meta-path to follow is the following:

path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)

Suppose to have computed such quantities:

N_ij = Number of path of type path1 starting from (:User) with id=i and ending in (:User) with id=j

Then PathSim for user with id 1 and id 2 is:

PathSim = 2N_12/(N_11+N_22)

To compute such quantity in Neo4j my Cypher query (1) is:

"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie) starting from (:User) with id=1, i.e. query (2):

"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

By taking into account "trivial" paths consisting of going from (:User) to (:Movie) and coming back from the same exact half-path.

However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.

Is there a way to account for "trivial" paths by sort of modifying query (1)?

Share Improve this question edited Mar 19 at 11:45 Emanuele Maduli asked Mar 19 at 11:28 Emanuele MaduliEmanuele Maduli 11 bronze badge
Add a comment  | 

1 Answer 1

Reset to default 1

Neo4j will filter out any MATCH result row that repeats the same relationship. That is why (1) did not do what you expected.

Assuming that:

  • N_11 is the number of (:User {id: "1"})-[:RATED]->(:Movie) paths, and
  • User node id properties have string values (to allow my query to use them as map keys),

this query may do what you want:

MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(*) as ct
WITH
  COLLECT(DISTINCT u1) as u1s,
  COLLECT(DISTINCT u2) AS u2s,
  COLLECT({headUser: u1.id, tailUser: u2.id, ct: ct}) AS pairs
UNWIND apoc.coll.intersection(u1s, u2s) AS user
WITH user, COUNT { (user)-[:RATED]->(:Movie) } AS userCt, pairs
WITH COLLECT(user.id) AS userIds, COLLECT(userCt) AS userCts, pairs
WITH apoc.map.fromLists(userIds, userCts) AS userCtMap, pairs
UNWIND pairs as pair
RETURN
  pair.headUser AS headUser,
  pair.tailUser AS tailUser,
  (2.0 * pair.ct)/(userCtMap[pair.headUser] + userCtMap[pair.tailUser]) AS pathSim

This query calculates all distinct users found by the MATCH clause and maps each user's id to its number of outgoing -[:RATED]->(:Movie) relationships. It then uses that mapping to calculate the denominator of your pathSim formula.

本文标签: cypherNeo4jCount of symmetric metapathInclude half metapathsStack Overflow