
Graphs I


Introduction

Graphs are probably the most versatile and the most important construction when we talk about representation. They are astonishingly ubiquitous: we can describe practically any system of relations or interactions as a graph. This holds at the nano scale, where we can model individual molecules; at the micro scale, where we can look at interactions between biological entities such as molecules, drugs and proteins modelled in interactomes; and finally at the macro scale, where we can model social graphs of people.

1. Idea

Conceptualization: When we think of a graph, we think about a system of relations and interactions between its elements.

A graph is a collection of vertices and edges; each edge links a pair of vertices, defining a relationship of incidence between vertices and edges. There are several variations on the idea.

2. Definition

Let $V$ and $E$ be sets. Call an element of $V$ a vertex and an element of $E$ an edge. A graph is given by $V$, $E$, and a mapping $d$ that interprets edges as pairs of vertices. Exactly what this means depends on how one defines "mapping that interprets" and "pair". The possibilities are given below. We will need the following notation:

  • $V^{2}$ is the cartesian product of $V$ with itself, the set of ordered pairs $(x,y)$ of vertices.

  • $\Delta_{V}$ is the diagonal subset of $V$, the set of pairs $(x,x)$, so that its complement $V^2 \setminus \Delta_V$ is the set of pairs as above where $x \neq y$.

  • $\left\langle{V \atop 2}\right\rangle$ is the quotient set of $V^{2}$ in which $(x,y)$ is identified with $(y,x)$, the set of unordered pairs $\{x,y\}$ of vertices.

This gives us the following types of graphs, according to how edges are interpreted as pairs (the corresponding mappings $d$ are sketched after the list):

  1. Undirected graphs

  2. Directed graphs

  3. Undirected graphs as directed graphs with an involution
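
Concretely, following the nLab conventions used above (a sketch; the involution formulation is one common reading):

$$
\begin{aligned}
\text{directed graph:}\quad & d : E \to V^{2} \\
\text{undirected graph:}\quad & d : E \to \left\langle {V \atop 2} \right\rangle \\
\text{undirected as directed:}\quad & d : E \to V^{2} \ \text{with an involution } \sigma : E \to E \\
& \text{such that } d(\sigma(e)) = (y,x) \text{ whenever } d(e) = (x,y)
\end{aligned}
$$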

We can also attach features to the nodes, modelled as $d$-dimensional vectors. For example, in a social network graph each node might carry information about its user, such as age, height or sex. For the sake of simplicity we are going to ignore features attached to edges, but they can exist as well.

3. History of Graph Neural Networks

Graph neural networks were arrived at independently by several research communities:

Machine Learners

  • Labeling RAAM (1994)
  • Backpropagation through structure (1996)
  • Graph Neural Networks (2008)
  • Gated GNN (2015)

Graph Theorists

  • Weisfeiler-Lehman kernels (2009)
  • k-GNN
  • GIN (2019)
  • Provably powerful GNN

Chemists

  • ChemNet (1995)
  • Neural descriptors (1997)
  • Molecular graph net (2005)
  • Molecular fingerprints (2017)

The term graph neural networks first appeared in a series of papers by the research group of M. Gori and F. Scarselli. Graph theorists, however, argue that graph neural networks are just a rebranding of a graph isomorphism test called the Weisfeiler-Lehman test (1968). This test addresses a classical graph-theory problem: determining whether two graphs are isomorphic, i.e. have the same connectivity up to a reordering of the nodes.

NOTE: No algorithm is known that solves the graph isomorphism problem in polynomial time in general. The problem was, however, proved to be solvable in quasi-polynomial time (Babai, 2015).
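
As an illustration, here is a minimal sketch of the 1-WL test (color refinement); the function names are mine, and note that the test can only refute isomorphism, never certify it:

```python
from collections import Counter

def refine(adj, colors, rounds):
    """1-WL color refinement: repeatedly re-color every node by the
    pair (own color, multiset of neighbor colors)."""
    for _ in range(rounds):
        colors = [
            (colors[i], tuple(sorted(colors[j] for j in adj[i])))
            for i in range(len(adj))
        ]
    return colors

def wl_test(adj1, adj2, rounds=3):
    """Run refinement on the disjoint union of both graphs and compare
    color histograms. Different histograms prove non-isomorphism;
    equal histograms are inconclusive ("possibly isomorphic")."""
    n1 = len(adj1)
    union = adj1 + [[j + n1 for j in nbrs] for nbrs in adj2]
    colors = refine(union, [0] * len(union), rounds)
    return Counter(colors[:n1]) == Counter(colors[n1:])

# A 6-cycle and two disjoint triangles are both 2-regular:
# the 1-WL test famously cannot tell them apart.
cycle6 = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [0, 4]]
triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
print(wl_test(cycle6, triangles))  # True, although they are not isomorphic
```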

4. Key Structural Properties of Graphs

The key structural property of a graph (or of a set) is that it is unordered: there is no canonical way to order its nodes. Thus, when we number the nodes of a graph, we are partially cheating. It is convenient, though, because we can then organize the node features into a matrix of dimensions $n \times d$, where $n$ is the number of nodes and $d$ the dimension of the features. In doing so we prescribe an arbitrary ordering, which we must also apply to the adjacency matrix.

The adjacency matrix of a graph with $n$ vertices is the $n \times n$ matrix whose $(i,j)$ entry is $1$ if the $i$-th and $j$-th vertices are connected, and $0$ if they are not.

The representation of the adjacency matrix depends on the ordering chosen for the set of nodes.
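
As a small illustration, here is a toy graph with made-up features (a sketch; the values are arbitrary):

```python
import numpy as np

# Toy undirected graph 0 -- 1 -- 2 with d = 2 features per node,
# under one (arbitrary) numbering of the nodes.
X = np.array([[0.1, 0.5],   # features of node 0
              [0.9, 0.2],   # features of node 1
              [0.4, 0.7]])  # features of node 2  ->  shape (n, d) = (3, 2)

A = np.array([[0, 1, 0],    # A[i, j] = 1 iff nodes i and j are connected
              [1, 0, 1],
              [0, 1, 0]])   # shape (n, n)

assert (A == A.T).all()     # undirected graph => symmetric adjacency matrix
```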

Conclusion: numbering the nodes differently permutes the rows of the feature matrix and the corresponding rows and columns of the adjacency matrix. We denote the permutation matrix by $P$; it corresponds to an element of the permutation group on $n$ elements, which contains $n!$ elements. We can think of the matrices $P$ as a representation of the permutation group.

NOTE: The representation differs with the type of object. The permutation group acts differently on vectors and matrices: on the feature matrix it acts as $X \mapsto PX$, while on the adjacency matrix it acts as $A \mapsto PAP^{\top}$.
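
A quick sketch of how the same permutation acts on the two kinds of objects, reusing the toy matrices above:

```python
import numpy as np

# The permutation 0 -> 2, 1 -> 0, 2 -> 1 as a matrix: row i of P picks
# out which old node becomes new node i.
P = np.eye(3)[[2, 0, 1]]

X = np.array([[0.1, 0.5], [0.9, 0.2], [0.4, 0.7]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

X_perm = P @ X          # feature matrix: only the rows move      -> P X
A_perm = P @ A @ P.T    # adjacency: rows AND columns move        -> P A P^T
```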

4.2 Invariant Graph Functions: Permutation-invariant

Strong precondition: to implement a function acting on a graph that produces a single output for the entire graph, we must guarantee that this output is unaffected by the ordering chosen for the input nodes. Whether or not the nodes are rearranged, the output computed for the same graph must always be the same:

$$f(PX, PAP^{\top}) = f(X, A)$$
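
A minimal numerical check, assuming a simple sum readout as the invariant function:

```python
import numpy as np

def readout(X, A):
    # Sum pooling over the node dimension ignores the node order,
    # so f(PX, P A P^T) = f(X, A) for every permutation matrix P.
    return X.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # random features
A = (rng.random((4, 4)) < 0.5).astype(float)   # random adjacency
P = np.eye(4)[rng.permutation(4)]              # random permutation

assert np.allclose(readout(P @ X, P @ A @ P.T), readout(X, A))
```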

4.3 Equivariant Graph Functions

In this scenario we instead want the output of the function to change consistently with a reordering of the input nodes:

$$F(PX, PAP^{\top}) = P\,F(X, A)$$
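
For example, the one-hop aggregation $F(X, A) = AX$ is permutation-equivariant; a minimal check:

```python
import numpy as np

def F(X, A):
    # One-hop aggregation: each node's new feature is the sum of its
    # neighbors' features. Since (P A P^T)(P X) = P (A X), this map
    # is permutation-equivariant.
    return A @ X

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
A = (rng.random((5, 5)) < 0.4).astype(float)
P = np.eye(5)[rng.permutation(5)]

assert np.allclose(F(P @ X, P @ A @ P.T), P @ F(X, A))
```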

5. Constructing Graph Functions

5.1 Multiset

Idea

A multiset is like a set, just allowing that the elements have multiplicities. Thus the multiset $\{1,1,2\}$ differs from the multiset $\{1,2\}$, while $\{1,1,2\}$ is the same as $\{1,2,1\}$. (See ref [4])

Definition

It is common to define multisets in terms of sets and functions. A multiset $\mathcal{X} = \langle X, \mu_X \rangle$ can be defined as a set $X$ together with a function $\mu_{X}$ from $X$ to a class of nonzero cardinal numbers, giving each element its multiplicity.
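
In code, Python's `collections.Counter` gives a natural model of the pair $\langle X, \mu_X \rangle$:

```python
from collections import Counter

# Counter models a multiset <X, mu_X>: keys are the elements of X,
# values are their multiplicities mu_X(x).
m1 = Counter([1, 1, 2])          # the multiset {1, 1, 2}
m2 = Counter([1, 2])             # the multiset {1, 2}

print(m1 == m2)                  # False: multiplicities matter
print(m1 == Counter([1, 2, 1]))  # True: order does not matter
```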

5.2 Local Aggregation and Message Passing

This is the main operation of a classical Graph Neural Network architecture.

Question: How does this operation work? Answer: for each node in the graph, we look at its neighbors and take their feature vectors, which together form a multiset: although the indices of the neighbors are unique, their feature vectors are not necessarily unique (two nodes may carry the same feature vector; see section 5.1).

We then aggregate these features together with the feature vector of the node itself. This must be done in a permutation-invariant way, because we have no canonical ordering of the neighbors in a graph:

$$\Phi \Big( x_{i}, X_{\mathcal{N}_i} \Big)$$

If we apply this local function $\Phi$ at every node of the graph and stack the results into a feature matrix, denoted $F$, we obtain a permutation-equivariant function: if we reorder the nodes, the rows of this matrix are reordered accordingly.

This local aggregation function in a GNN typically looks as follows:

$$x'_{i} = \phi \Big( x_{i}, \bigoplus_{j \in \mathcal{N}_i} \psi \big( x_{i}, x_{j} \big) \Big)$$

  • $\bigoplus$ is a permutation-invariant aggregation operator, most often the sum or the maximum.

  • $\psi$ and $\phi$ are learnable functions:

    1. $\psi$ transforms the neighbor features. It is a nonlinear function that depends on the feature vectors of both node $i$ and node $j$. Its output can be seen as a message sent from node $j$ to update node $i$; this is why the architecture is also called a message passing graph neural network.

    2. $\phi$ updates the features of node $i$ using the aggregated features of its neighbors.

  • $x'_{i}$ is the new feature of node $i$: $f : x_i \mapsto x'_{i}$.
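
Putting the pieces together, here is a minimal NumPy sketch of one such layer with sum aggregation. The linear-plus-ReLU stand-ins for $\psi$ and $\phi$, and all weight shapes, are illustrative choices, not a prescribed architecture:

```python
import numpy as np

def mlp(W, x):
    """Stand-in for a learnable function: linear map plus ReLU."""
    return np.maximum(W @ x, 0.0)

def message_passing_layer(X, adj, W_psi, W_phi):
    """One round of x'_i = phi(x_i, sum_j psi(x_i, x_j)) with the
    permutation-invariant sum as the aggregation operator."""
    X_new = []
    for i, neighbors in enumerate(adj):
        # psi(x_i, x_j): one message per neighbor (a multiset of vectors).
        messages = [mlp(W_psi, np.concatenate([X[i], X[j]])) for j in neighbors]
        agg = np.sum(messages, axis=0) if messages else np.zeros(W_psi.shape[0])
        # phi combines the node's own feature with the aggregated messages.
        X_new.append(mlp(W_phi, np.concatenate([X[i], agg])))
    return np.stack(X_new)

rng = np.random.default_rng(42)
d, h = 2, 4                          # input / hidden feature dimensions
W_psi = rng.normal(size=(h, 2 * d))  # psi sees [x_i ; x_j]
W_phi = rng.normal(size=(h, d + h))  # phi sees [x_i ; aggregated messages]

X = rng.normal(size=(3, d))          # toy node features
adj = [[1], [0, 2], [1]]             # path graph 0 -- 1 -- 2
print(message_passing_layer(X, adj, W_psi, W_phi).shape)  # (3, 4)
```

Because the sum ignores the order of the neighbors, relabeling the nodes only permutes the rows of the output: the layer is permutation-equivariant.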

References

  • nLab: graph
  • Bronstein, Bruna, Cohen, Veličković: Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges (2021)
  • Michael Bronstein's YouTube channel
  • Gilmer et al., 2017 (MPNN); Battaglia et al., 2018; Wang et al., 2018