admin管理员组文章数量:1410705
Neo4j: Issues with WHERE
Scope and Query Optimization in Fraud Detection Graph
I'm new to Neo4j and trying to build a fraud detection graph using a dataset found online.
Environment:
- Neo4j Desktop: 1.6.1
- Neo4j DB Version: 5.24.0
Issues
I am facing two main problems with my Cypher query:
Scope of
WHERE
Clauses:- The
WHERE
statements apply to the entire following part of the query instead of only the relevant section. - I tried using
CALL
statements but couldn't make them work correctly.
- The
Memory Issue & Query Performance:
- When I try to run the entire query (with fixes for the first issue), I run into memory problems.
- I need a way to optimize the query or split it into smaller subqueries.
Indexing
Before running the query, I created the following indexes:
CREATE INDEX person_index FOR (p:PERSONNE) ON (p.Numero_client);
CREATE INDEX phone_index FOR (t:TELEPHONE) ON (t.Numero);
CREATE INDEX car_index FOR (v:VOITURE) ON (v.Immatriculation);
CREATE INDEX email_index FOR (e:EMAIL) ON (e.Adresse_mail);
CREATE INDEX address_index FOR (a:ADRESSE) ON (a.Rue, a.Code_postal, a.Commune);
Full Query
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
// 1⃣ - Create PERSONNE nodes
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
// 2⃣ - Create TELEPHONE nodes and relationships
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
// 3⃣ - Create VOITURE nodes
WITH p, row
WHERE row.num_mnrl1 IS NOT NULL AND row.num_mnrl1 <> "" AND row.num_mnrl1 <> " "
MERGE (v:VOITURE {Immatriculation: row.num_mnrl1})
MERGE (p)-[:POSSEDE_VOITURE]->(v)
// 4⃣ - Create EMAIL nodes
WITH p, row
WHERE row.mail IS NOT NULL AND row.mail <> "" AND row.mail <> " "
MERGE (e:EMAIL {Adresse_mail: row.mail})
MERGE (p)-[:A_POUR_MAIL]->(e)
// 5⃣ - Create ADRESSE nodes
WITH p, row
WHERE row.rue IS NOT NULL AND row.cd_post IS NOT NULL AND rowmune IS NOT NULL
MERGE (a:ADRESSE {Rue: row.rue, Code_postal: row.cd_post, Commune: rowmune})
MERGE (p)-[:RESIDE_A]->(a);
Attempted Solution: Splitting Query
I tried breaking the query into smaller parts:
Step 1: Creating PERSONNE
nodes
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
Step 2: Adding TELEPHONE
nodes and relationships
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MATCH (p:PERSONNE {Numero_client: row.anc_ref})
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
This approach sometimes works, but most of the time, I get one of the following errors:
Error Message
ServiceUnavailable
WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browser's development console to determine the root cause of the failure.
Common reasons include:
- The database being unavailable
- Using the wrong connection URL
- Temporary network problems
- Browser not trusting the Neo4j certificate
WebSocket `readyState` is: 3
Or:
Neo.TransientError.General.OutOfMemoryError
There is not enough memory to perform the current task.
Please try increasing 'server.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface)
or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
Questions
- How can I ensure that
WHERE
statements only apply to the relevant section of my query? - What are the best ways to optimize this query to prevent memory issues?
- Is there a correct way to split the query without running into the indefinite execution issue with
MATCH
afterLOAD CSV
? - How do I troubleshoot the WebSocket connection failure in Neo4j Desktop?
Any help would be greatly appreciated!
Neo4j: Issues with WHERE
Scope and Query Optimization in Fraud Detection Graph
I'm new to Neo4j and trying to build a fraud detection graph using a dataset found online.
Environment:
- Neo4j Desktop: 1.6.1
- Neo4j DB Version: 5.24.0
Issues
I am facing two main problems with my Cypher query:
Scope of
WHERE
Clauses:- The
WHERE
statements apply to the entire following part of the query instead of only the relevant section. - I tried using
CALL
statements but couldn't make them work correctly.
- The
Memory Issue & Query Performance:
- When I try to run the entire query (with fixes for the first issue), I run into memory problems.
- I need a way to optimize the query or split it into smaller subqueries.
Indexing
Before running the query, I created the following indexes:
CREATE INDEX person_index FOR (p:PERSONNE) ON (p.Numero_client);
CREATE INDEX phone_index FOR (t:TELEPHONE) ON (t.Numero);
CREATE INDEX car_index FOR (v:VOITURE) ON (v.Immatriculation);
CREATE INDEX email_index FOR (e:EMAIL) ON (e.Adresse_mail);
CREATE INDEX address_index FOR (a:ADRESSE) ON (a.Rue, a.Code_postal, a.Commune);
Full Query
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
// 1⃣ - Create PERSONNE nodes
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
// 2⃣ - Create TELEPHONE nodes and relationships
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
// 3⃣ - Create VOITURE nodes
WITH p, row
WHERE row.num_mnrl1 IS NOT NULL AND row.num_mnrl1 <> "" AND row.num_mnrl1 <> " "
MERGE (v:VOITURE {Immatriculation: row.num_mnrl1})
MERGE (p)-[:POSSEDE_VOITURE]->(v)
// 4⃣ - Create EMAIL nodes
WITH p, row
WHERE row.mail IS NOT NULL AND row.mail <> "" AND row.mail <> " "
MERGE (e:EMAIL {Adresse_mail: row.mail})
MERGE (p)-[:A_POUR_MAIL]->(e)
// 5⃣ - Create ADRESSE nodes
WITH p, row
WHERE row.rue IS NOT NULL AND row.cd_post IS NOT NULL AND rowmune IS NOT NULL
MERGE (a:ADRESSE {Rue: row.rue, Code_postal: row.cd_post, Commune: rowmune})
MERGE (p)-[:RESIDE_A]->(a);
Attempted Solution: Splitting Query
I tried breaking the query into smaller parts:
Step 1: Creating PERSONNE
nodes
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
WITH p, row
WHERE row.cres_soc <> "00" AND row.dres_soc IS NOT NULL
SET p.dres_soc = row.dres_soc
Step 2: Adding TELEPHONE
nodes and relationships
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
MATCH (p:PERSONNE {Numero_client: row.anc_ref})
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
This approach sometimes works, but most of the time, I get one of the following errors:
Error Message
ServiceUnavailable
WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browser's development console to determine the root cause of the failure.
Common reasons include:
- The database being unavailable
- Using the wrong connection URL
- Temporary network problems
- Browser not trusting the Neo4j certificate
WebSocket `readyState` is: 3
Or:
Neo.TransientError.General.OutOfMemoryError
There is not enough memory to perform the current task.
Please try increasing 'server.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface)
or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
Questions
- How can I ensure that
WHERE
statements only apply to the relevant section of my query? - What are the best ways to optimize this query to prevent memory issues?
- Is there a correct way to split the query without running into the indefinite execution issue with
MATCH
afterLOAD CSV
? - How do I troubleshoot the WebSocket connection failure in Neo4j Desktop?
Any help would be greatly appreciated!
Share Improve this question edited Mar 4 at 13:17 goubint asked Mar 4 at 13:07 goubintgoubint 133 bronze badges 01 Answer
Reset to default 0This answer attempts to address questions 1, 2, and 3.
You can use CALL subqueries to perform updates to the DB, and that can be done in ways that do not affect the number of rows in the outer query. For example, most of the subqueries in the example below are Unit subqueries that do not RETURN
anything), which never affect the number of rows in the outer query. The first subquery does RETURN something (p
), but it returns a value for every outer row (because it does not filter out any rows), so it also does not affect the number of outer rows.
Also, you can have those subqueries execute multiple transactions, to process subquery rows in batches (of 1000, by default), to avoid running out of memory.
Potentially, something like this may work for you:
:auto // Only needed if running in Neo4j Browser
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
// 1 - Create PERSONNE nodes
CALL (row) {
MERGE (p:PERSONNE {Numero_client: row.anc_ref})
SET
p.Chainage = row.chainage,
p.Sexe = row.sexe,
p.cres_soc = row.cres_soc,
p.dres_soc = CASE WHEN row.cres_soc <> "00" THEN row.dres_soc END,
p.Age = row.age_cli,
p.Nom = row.nom_cli,
p.Prenom = row.pnom_cli,
p.Nom_conjoint = row.nom_cjt,
p.Prenom_conjoint = row.pnom_cjt
RETURN p
} IN TRANSACTIONS
// 2 - Create TELEPHONE nodes and relationships
CALL (p, row) {
WITH p, row
UNWIND [row.num_tel1, row.num_tel2] AS tel
WITH p, row, tel
WHERE tel IS NOT NULL AND tel <> "" AND tel <> " "
MERGE (t:TELEPHONE {Numero: tel})
MERGE (p)-[:A_POUR_TELEPHONE {type: CASE WHEN tel = row.num_tel1 THEN "PRINCIPAL" ELSE "SECONDAIRE" END}]->(t)
} IN TRANSACTIONS
// 3 - Create VOITURE nodes
CALL (p, row) {
WITH p, row
WHERE row.num_mnrl1 IS NOT NULL AND row.num_mnrl1 <> "" AND row.num_mnrl1 <> " "
MERGE (v:VOITURE {Immatriculation: row.num_mnrl1})
MERGE (p)-[:POSSEDE_VOITURE]->(v)
} IN TRANSACTIONS
// 4 - Create EMAIL nodes
CALL (p, row) {
WITH p, row
WHERE row.mail IS NOT NULL AND row.mail <> "" AND row.mail <> " "
MERGE (e:EMAIL {Adresse_mail: row.mail})
MERGE (p)-[:A_POUR_MAIL]->(e)
} IN TRANSACTIONS
// 5 - Create ADRESSE nodes
CALL (p, row) {
WITH p, row
WHERE row.rue IS NOT NULL AND row.cd_post IS NOT NULL AND rowmune IS NOT NULL
MERGE (a:ADRESSE {Rue: row.rue, Code_postal: row.cd_post, Commune: rowmune})
MERGE (p)-[:RESIDE_A]->(a)
} IN TRANSACTIONS
Notes:
- The
:auto
command should only be used if running in the Neo4j Browser (it is needed for theIN TRANSACTIONS
functionality). - You may want to specify larger batch sizes for some or all CALL subqueries (see the docs for the syntax).
- This query assumes you are using at least neo4j version 5.23. If you are not, you need to replace the variable-scope-clause syntax with prepended
WITH
clauses in each subquery. - This query can still run out of memory if the data file has a large number of rows (since their data, plus data for the new
p
nodes, have to held in memory), in which case you many want to split the file into smaller ones to be processed separately.
本文标签: Neo4j graph creation queryStack Overflow
版权声明:本文标题:Neo4j graph creation query - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1745041579a2639107.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论