What is the format of node features that graph2vec and GL2vec in karateclub require? The documentation does mention that there should be no string features, but with or without them I am running into an error with the following code:

import networkx as nx
import karateclub as kc
import matplotlib.pyplot as plt

G = nx.DiGraph()

G.add_node(0, label="A", feature=[0.5])
G.add_node(1, label="B", feature=[1.2])
G.add_node(2, label="C", feature=[0.8])

G.add_edge(0, 1)
G.add_edge(0, 2)
G.add_edge(1, 2)

nx.draw_networkx(G, with_labels=True)
plt.show()

graphs = [G]

model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings)

Error: RuntimeError: you must first build vocabulary before training the model

I saw an option to build_vocab in word2vec, but how do I do it for graph2vec?

Are there any alternative packages, preferably with simple implementations like karateclub, that I can use to generate embeddings for a list of directed node/edge-attributed networkx graphs, without the need to define training/test sets?

1 Answer

As explained in their paper on graph2vec:

With the background on word and document embeddings presented in the previous section, an important intuition we extend in graph2vec is to view an entire graph as a document and the rooted subgraphs (that encompass a neighborhood of certain degree) around every node in the graph as words that compose the document. In other words, different subgraphs compose graphs in a similar way that different words compose sentences/documents when used together.

Similar to the document convention, the only required input is a corpus of graphs for graph2vec to learn their representations. Given a dataset of graphs, graph2vec considers the set of all rooted subgraphs (i.e., neighbourhoods) around every node (up to a certain degree) as its vocabulary. Subsequently, following the doc2vec skipgram training process, we learn the representations of each graph in the dataset.

In order to train the embeddings, a graph corpus is needed, from which rooted subgraphs are extracted (using the Weisfeiler-Lehman (WL) hashing algorithm) and the embeddings are learnt, so that graphs with similar structures have similar feature representations in the embedding space.
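To make the "rooted subgraphs as words" idea concrete, here is a minimal, illustrative sketch of WL relabelling using plain networkx and hashlib. This is not karateclub's internal code (the library's own implementation also handles node features and its own hashing); the function name wl_features and the choice of node degree as the base label are just for illustration.

import hashlib

import networkx as nx


def wl_features(graph, iterations=2):
    # start from the node degree as the base label (no node attributes used)
    labels = {node: str(graph.degree(node)) for node in graph.nodes()}
    words = list(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node in graph.nodes():
            # combine the node's label with its sorted neighbour labels and
            # hash the result, so identical neighbourhoods yield the same word
            neighbour_labels = sorted(labels[n] for n in graph.neighbors(node))
            combined = labels[node] + "".join(neighbour_labels)
            new_labels[node] = hashlib.md5(combined.encode()).hexdigest()
        labels = new_labels
        words.extend(labels.values())
    return words


# each graph in the corpus becomes a "document" made of these WL words
corpus = [wl_features(nx.newman_watts_strogatz_graph(50, 5, 0.3)) for _ in range(10)]
print(len(corpus[0]))  # 150 words: 50 nodes x (1 base label + 2 WL iterations)

The vocabulary that graph2vec builds is exactly the set of such words collected over the whole corpus, which is why a single small graph gives it almost nothing to work with.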

From the examples here, they have used the following graph corpus:

graphs = [nx.newman_watts_strogatz_graph(50, 5, 0.3) for _ in range(1000)] # 1000 graphs in the corpus
model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings.shape) 
# (1000, 128) # for each graph learnt a 128-dim embedding vector
# now infer
model.infer([G])
# array([[ 0.00371773,  0.00212566, -0.00158476,  0.00075081,  0.00121916,
#        -0.00078127, -0.00187484,  0.00024809,  0.00116782, -0.00179677,
#        ...     ...    ...

Once the training is over, we can ask the model to compute the embedding for a new graph G using the infer() method, as shown above.

Providing repeated samples of the same graph G as the training corpus also works; for example, the following code will also learn embeddings for the graphs. However, the quality of the embedding will likely be poor: the embedding space captures similarity in rooted-subgraph structure much better when the training corpus is richer and more diverse.

graphs = [G for _ in range(100)] # 100 graphs in the corpus
model = kc.graph_embedding.Graph2Vec()
model.fit(graphs)
embeddings = model.get_embedding()
print(embeddings.shape) 
# (100, 128)
model.infer([G])
# array([[ 0.0037295 ,  0.00211123, -0.00169718,  0.00071603,  0.00125103,
#        -0.00072378, -0.00181226,  0.00028596,  0.00113032, -0.00179663,
#        ...         ...           ...  
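As a side note on the RuntimeError in the question: gensim raises it when the Doc2Vec vocabulary ends up empty. Graph2Vec exposes a min_count parameter (default 5), and with a single three-node graph every WL feature occurs fewer than five times, so all of them get filtered out before training. Lowering min_count is enough to make the single-graph fit run (the resulting embedding is not meaningful, for the reasons above). The sketch below assumes a recent karateclub version where min_count is accepted by the constructor.

import networkx as nx
import karateclub as kc

G = nx.DiGraph()
G.add_node(0, label="A", feature=[0.5])
G.add_node(1, label="B", feature=[1.2])
G.add_node(2, label="C", feature=[0.8])
G.add_edge(0, 1)
G.add_edge(0, 2)
G.add_edge(1, 2)

# min_count=1 keeps every WL feature in the Doc2Vec vocabulary, so the
# "build vocabulary" RuntimeError is not raised even for a single graph
model = kc.graph_embedding.Graph2Vec(min_count=1)
model.fit([G.to_undirected()])  # karateclub expects undirected graphs with node ids 0..n-1
print(model.get_embedding().shape)  # (1, 128)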
