GNNs and Transformers in AI
- Samvar Shah

- Feb 20
- 1 min read

In AI, Graph Neural Networks (GNNs) and Transformers are both widely used, but they are designed for different types of data and problems.
GNNs are designed for graph-structured data, such as recommendation systems where users are connected to products. They work by passing messages between connected nodes: each node updates its representation (a vector) based on information from its neighbors.
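One round of this neighbor-based update can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular GNN library; the toy graph and weight matrix are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy graph: 4 nodes, edges as an adjacency matrix.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

H = rng.normal(size=(4, 8))  # node representations (4 nodes, 8-dim vectors)
W = rng.normal(size=(8, 8))  # learnable weight matrix

# One message-passing step: each node averages its neighbors' vectors,
# mixes the result with W, and applies a nonlinearity.
deg = A.sum(axis=1, keepdims=True)
H_new = np.tanh((A @ H / deg) @ W)
print(H_new.shape)  # (4, 8)
```

Stacking several such layers lets information flow between nodes that are several hops apart.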
Transformers are neural networks designed for sequence data, especially text.
They rely on a mechanism called self-attention, which allows every token (word) in a sequence to interact with every other token. Instead of the local neighbor passing of GNNs, Transformers allow global connections.
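A single self-attention head can be sketched as follows. This is a simplified, single-head version with made-up weight matrices, just to show every token attending to every other token.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every token attends to every token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (tokens, tokens)
    # Softmax over each row turns scores into attention weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                     # weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))          # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 16)
```

The `(tokens, tokens)` score matrix is exactly where the global connectivity lives: no token is restricted to a local neighborhood.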
However, mathematically they share similar guiding principles:
- Both represent their inputs, nodes in GNNs and tokens in Transformers, as vectors in high-dimensional space.
- Both rely heavily on matrix multiplication.
- Both are trained with loss functions, gradients, and backpropagation, using multivariable calculus to compute the gradients.
- Both use probability: softmax creates probability distributions and models uncertainty in predictions.
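The softmax step mentioned above is the same function in both architectures; a minimal sketch:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)  # a valid probability distribution: larger logit, larger probability
```

In a Transformer this normalizes attention scores; in a classifier head (GNN or Transformer) it turns raw scores into class probabilities.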
So both are built from the same mathematical toolkit. Interestingly, attention in Transformers can be viewed as a fully connected graph in which every token connects to every other token. This means a Transformer can be seen as a special type of graph neural network with dynamic edge weights.
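This correspondence can be made concrete: the row-normalized attention matrix plays the role of the adjacency matrix in a GNN aggregation step, except that its entries are computed from the data rather than fixed. A minimal sketch, with made-up token embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))  # 4 tokens, viewed as 4 graph nodes

# Attention scores act like a dense, data-dependent adjacency matrix.
scores = X @ X.T / np.sqrt(X.shape[-1])
A_dyn = np.exp(scores - scores.max(axis=-1, keepdims=True))
A_dyn /= A_dyn.sum(axis=-1, keepdims=True)  # each row sums to 1

# GNN-style aggregation over this fully connected "graph" is exactly
# an attention-weighted mix of the other tokens' vectors.
out = A_dyn @ X
print(out.shape)  # (4, 8)
```

Swapping the fixed 0/1 adjacency matrix of a GNN for this learned, fully connected one is the essence of the "Transformer as a GNN" view.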
Had you thought of GNNs and Transformers as so different and yet so similar? What do you make of viewing a Transformer as a GNN?


