Fully-connected graphs

    Graph neural networks, transformers

    I am increasingly intrigued by the transformer architecture in neural nets. Although I have been aware of transformers since the inaugural 2017 paper (Vaswani et al., Attention Is All You Need  |  Mastodon discussion), my interest was recently piqued by this comment in Towards Deep Learning for Relational Databases (local copy  |  Mastodon discussion):

      "this is also the representation at which the popular Transformers effectively operate - these, although technically taking sequences at input, turn them internally into fully connected graphs by assuming all pair-wise element relations within the (self-)attention module."

    Passionate about knowledge graphs, I have long played with the idea (a Gedankenexperiment) of a fully-connected knowledge graph, in which the relations necessary to visualize a subnetwork "collapse" when viewed (queried or programmatically accessed). 🤔

    The intersection of that Gedankenexperiment, fully-connected networks, and transformers appears in graph neural networks - a class of artificial neural networks for processing data that can be represented as graphs. That concept is discussed in Transformers are Graph Neural Networks (local copy | additional mentions: Mastodon discussion):

    • "The Transformer architecture is also extremely amenable to very deep networks, enabling the NLP community to scale up in terms of both model parameters and, by extension, data. Residual connections between the inputs and outputs of each multi-head attention sub-layer and the feed-forward sub-layer are key for stacking Transformer layers."

    • GNNs build representations of graphs

      "Graph Neural Networks (GNNs) or Graph Convolutional Networks (GCN) build representations of nodes and edges in graph data.

      "They do so through neighbourhood aggregation (or message passing), where each node gathers features from its neighbours to update its representation of the local graph structure around it.

      "Stacking several GNN layers enables the model to propagate each node's features over the entire graph - from its neighbours to the neighbours' neighbours, and so on."

    Neat! 😀
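
    To make that neighbourhood-aggregation idea concrete, here is a minimal message-passing layer in NumPy (again my own illustrative sketch, not code from the quoted article): each node averages its neighbours' features and passes the result through a shared linear map, and stacking layers pushes information one hop further per layer.

      import numpy as np

      def gnn_layer(H, A, W):
          """One message-passing layer.
          H: (n_nodes, d_in) node features; A: (n_nodes, n_nodes) 0/1 adjacency; W: (d_in, d_out)."""
          A_hat = A + np.eye(A.shape[0])         # self-loops, so each node keeps its own features
          deg = A_hat.sum(axis=1, keepdims=True)
          messages = (A_hat @ H) / deg           # mean-aggregate each node's neighbourhood
          return np.maximum(messages @ W, 0.0)   # shared linear map + ReLU

      # Toy 4-node path graph 0-1-2-3; two stacked layers let node 0 "see" node 2.
      A = np.array([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
      rng = np.random.default_rng(0)
      H = rng.normal(size=(4, 8))
      W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
      H_out = gnn_layer(gnn_layer(H, A, W1), A, W2)
      print(H_out.shape)                         # (4, 8)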

    Then, this morning (2023-08-03) I skimmed over Graph Structure from Point Clouds: Geometric Attention is All You Need. "§2.1 Constructing a Graph" caught my attention:

      "In natural language processing attention-based transformers treat sentences as graphs, where words are represented by nodes and are "fully connected" - that is, all nodes are connected to all other nodes.'

      "Much work has been done in applying machine learning techniques to point cloud problems, and in particular attention models, typically for 3D points.

      "We take as a case-study the problem of tagging jets of reconstructed particles as coming from either a top quark or a lighter hadronic particle.

      "In this case as in most point cloud problems, we are given only a set of points (herein called "nodes"), each with a feature vector, but without any notion of inter-node connections or relationships (herein called "edges").

      "To apply a GNN to these problems, there are two limiting approaches.

      • "The first is to treat the nodes as unconnected - that is, as a set. The DeepSets architecture has been used in jet tagging with, at the time, SotA results.

      • [THIS:] "The other limit is to to treat the point cloud as fully connected, and this is the approach taken in transformer models, such as the Particle Transformer, which outperforms the set-limit approach in top tagging, although with significant computational overhead.

      • "A happy medium is struck by ParticleNet, a model that applies a GNN to neighborhoods of K=16 neighbors and achieves very good results.

      "Given these three working points (unconnected, fully-connected, and sparsely connected), we therefore suggest that including graph structure benefits a model's predictive power, but that most node-pair connections are not relevant to the prediction task.

      [THIS] "The attention mechanism addresses exactly this hypothesis.

      "A multilayer perceptron (MLP), applied to pairs of nodes, learns which neighboring nodes carry relevant features and up-weight them in the message passing aggregation.

      "The catch-22 is that nodes must be connected somehow in order to apply the weighted aggregation.

      "The question of how to form edges we refer to as the Topology Problem. ..."

    #attention #FullyConnectedNetworks #GNN #GraphNeuralNetworks #GraphTheory #KnowledgeGraphs #MachineLearning #ML #NaturalLanguageProcessing #networks #NeuralNetworks #NLP #NN #transformers

    Return to Persagen.com  |  @persagen@mastodon.social