What is the format of node features that graph2vec and GL2vec in karateclub require? The documentation does mention that there should be no string features, but with or without them I am running into an error with the following code:
import networkx as nx
import karateclub as kc
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_node(0, label="A", feature=[0.5])
G.add_node(1, label="B", feature=[1.2])
G.add_node(2, label="C", feature=[0.8])
G.add_edge(0, 1)
G.add_edge(0, 2)
G.add_edge(1, 2)
nx.draw_networkx(G, with_labels=True)
plt.show()
graphs = [G]
model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings)
Error: RuntimeError: you must first build vocabulary before training the model
I saw an option to build_vocab in word2vec, but how do I do it for graph2vec?
Are there any alternate packages, preferably with simpler implementations like karateclub, that I can use to generate embeddings for a list of directed node/edge-attributed networkx graphs without the need to define training/test sets?
1 Answer
As explained in their paper on graph2vec:
With the background on word and document embeddings presented in the previous section, an important intuition we extend in graph2vec is to view an entire graph as a document and the rooted subgraphs (that encompass a neighborhood of certain degree) around every node in the graph as words that compose the document. In other words, different subgraphs compose graphs in a similar way that different words compose sentences/documents when used together.
Similar to the document convention, the only required input is a corpus of graphs for graph2vec to learn their representations. Given a dataset of graphs, graph2vec considers the set of all rooted subgraphs (i.e., neighbourhoods) around every node (up to a certain degree) as its vocabulary. Subsequently, following the doc2vec skipgram training process, we learn the representations of each graph in the dataset.
In order to train the embeddings, a graph corpus is needed: rooted subgraphs are extracted from each graph and the embeddings are learned using the Weisfeiler-Lehman (WL) hashing algorithm, so that graphs with similar structures have similar feature representations in the embedding space.
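The rooted-subgraph extraction can be sketched as one WL relabelling step: each node's label is combined with its neighbours' labels and hashed, and the resulting labels become the "words" of the graph's "document". This is a minimal illustration in plain Python (the function name wl_step and the adjacency-dict representation are my own; the question's triangle graph is treated as undirected here):

```python
import hashlib

def wl_step(adj, labels):
    """One Weisfeiler-Lehman relabelling step: hash each node's label
    together with its sorted neighbour labels into a new label."""
    new_labels = {}
    for node, neighbours in adj.items():
        signature = labels[node] + "|" + ",".join(sorted(labels[n] for n in neighbours))
        new_labels[node] = hashlib.md5(signature.encode()).hexdigest()[:8]
    return new_labels

# The three-node graph from the question, as an undirected adjacency dict.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
labels = {0: "A", 1: "B", 2: "C"}

# The graph's "document": original labels plus the WL-relabelled ones.
words = list(labels.values()) + list(wl_step(adj, labels).values())
print(words)
```

Repeating wl_step for more iterations produces words describing larger neighbourhoods; graphs that share many such words end up close in the embedding space.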
From the examples in the karateclub documentation, the following graph corpus is used:
graphs = [nx.newman_watts_strogatz_graph(50, 5, 0.3) for _ in range(1000)] # 1000 graphs in the corpus
model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings.shape)
# (1000, 128) # for each graph learnt a 128-dim embedding vector
# now infer
model.infer([G])
# array([[ 0.00371773, 0.00212566, -0.00158476, 0.00075081, 0.00121916,
# -0.00078127, -0.00187484, 0.00024809, 0.00116782, -0.00179677,
# ... ... ...
Once the training is over, we can ask the model to compute the embedding for a new graph G, using the method infer(), as shown above.
Providing repeated samples of the same graph G as the training corpus also works, e.g., the following code will also learn embeddings for the graphs. However, the quality of the embedding will likely be poor: the embedding space captures similarity in rooted-subgraph structure better when the training corpus is richer and more diverse.
graphs = [G for _ in range(100)] # 100 graphs in the corpus
model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings.shape)
# (100, 128)
model.infer([G])
# array([[ 0.0037295 , 0.00211123, -0.00169718, 0.00071603, 0.00125103,
# -0.00072378, -0.00181226, 0.00028596, 0.00113032, -0.00179663,
# ... ... ...