# GraphFSA: A Finite State Automaton Framework for Algorithmic Learning on Graphs
**Authors**: Florian Grötschla¹, Joël Mathys, Christoffer Raun, Roger Wattenhofer (¹Equal contribution)
> ETH Zurich
Abstract
Many graph algorithms can be viewed as sets of rules that are iteratively applied, with the number of iterations dependent on the size and complexity of the input graph. Existing machine learning architectures often struggle to represent these algorithmic decisions as discrete state transitions. Therefore, we propose a novel framework: GraphFSA (Graph Finite State Automaton). GraphFSA is designed to learn a finite state automaton that runs on each node of a given graph. We test GraphFSA on cellular automata problems, showcasing its abilities in a straightforward algorithmic setting. For a comprehensive empirical evaluation of our framework, we create a diverse range of synthetic problems. As our main application, we then focus on learning more elaborate graph algorithms. Our findings suggest that GraphFSA exhibits strong generalization and extrapolation abilities, presenting an alternative approach to represent these algorithms.
1 Introduction
While machine learning has made tremendous progress, machines still have trouble generalizing concepts and extrapolating to unseen inputs. Large language models can write spectacular poems about traffic lights, but they still fail at multiplying two large numbers. They do not quite understand the multiplication algorithm since they do not have a good representation of algorithms. We want to teach machines some level of "algorithmic thinking." Given some unknown process, can the machine distill what is going on and then apply the same algorithm in another situation? This paper concentrates on one of the simplest processes: finite state automata (FSA). An FSA is a basic automaton that jumps from one state to another according to a recipe. FSAs are the simplest interesting version of an algorithm. However, if we assemble many FSAs in a network, the result is remarkably powerful in terms of computation. Indeed, the simple Game of Life is already Turing-complete.
Building on the work of Grattarola et al. [6] and Marr and Hütt [17], this paper presents GraphFSA (Graph Finite State Automaton), a novel framework designed to learn finite state automata on graphs. GraphFSA extracts interpretable solutions and provides a framework to extrapolate to bigger graphs, effectively addressing the inherent challenges associated with conventional methods. To better understand the capabilities of GraphFSA, we evaluate the framework on various cellular automata problems. In this controllable setting, we can test the model's abilities to extrapolate and verify that GraphFSA learns the correct rules. In addition, we introduce a novel dataset generator to create problem instances with known ground truth for the evaluation of GraphFSA models. Subsequently, we consider more elaborate graph algorithms to evaluate GraphFSA performance on more complex graph problems. Our paper makes the following key contributions:
- We introduce GraphFSA, an execution framework designed for algorithmic learning on graphs, addressing the limitations of existing models in representing algorithmic decisions as discrete state transitions and extracting explainable solutions.
- We present GRAB: The **GR**aph **A**utomaton **B**enchmark, a versatile dataset generator to facilitate the systematic benchmarking of GraphFSA models across different graph sizes and distributions. The code is available at https://github.com/ETH-DISCO/graph-fsa.
- We present Diff-FSA, one specific approach to train models for the GraphFSA framework and use GRAB to compare its performance to a variety of baselines, showcasing that our model offers strong generalization capabilities.
- We provide a discussion and visualization of GraphFSA's functionalities and model variations. We provide insights into its explainability, limitations, and potential avenues for future work, positioning it as an alternative in learning graph algorithms.
2 Related work
Finite state automaton.
A Finite State Automaton (FSA), also known as a Finite State Machine (FSM), is a computational model used to describe the behavior of systems that operate on a finite number of states. It consists of a set of states, a set of transitions between these states triggered by input symbols from a finite alphabet, and an initial state. FSAs are widely employed in various fields, including computer science, linguistics, and electrical engineering, for tasks such as pattern recognition [16], parsing, and protocol specification due to their simplicity and versatility in modeling sequential processes. Additionally, there is a growing interest in combining FSAs with neural networks to enhance performance and generalization, a notion supported by studies by de Balle Pigem [1] on learning FSAs. Additional research by Mordvintsev [18] shows how a differentiable FSA can be learned.
Figure 1: Illustration of the GraphFSA framework: Each node has its own state, represented by its color. Furthermore, each node runs the same Finite State Automaton, determining its next state depending on the neighborhood information. On the left, an example graph is shown with the aggregation next to it. In this example, the aggregator can distinguish if there is a node of a certain color/state in the neighborhood. In general, the aggregator can be any function that maps a state multiset to a finite set of values. State transitions are determined by the FSA depicted in the middle, which chooses the next state based on the states appearing in the neighborhood and the old state. An example of the transition taken by the red node on the bottom right of the graph is shown. Note that the green state is final and does not have any outgoing transitions.
Cellular automaton.
Cellular Automata (CA) are discrete computational models introduced by John von Neumann (von Neumann and Burks [29]). They consist of a grid of cells, where each cell is in a discrete state and evolves following transition rules based on the states of neighboring cells. In our context, learning these transition functions from data and modeling the appropriate domain is of particular interest. Notably, work by Wulff and Hertz [31] trains a Neural Network (NN) to learn 1D and 2D CAs, while further advancements are made with Neural Cellular Automata (NCA) introduced by Mordvintsev et al. [19], and the Graph Cellular Automata (GCA) proposed by Marr and Hütt [17], operating on graph structures with diverse neighborhood transition rules. Additionally, Grattarola et al. [6] introduce Graph Neural Cellular Automata (GNCA), focusing on learning a GCA with continuous state spaces. GNCA mainly differs from our work in that it uses continuous instead of discrete state spaces and only learns a single transition step at a time, while we can extract rules from observations over multiple steps. Lastly, the study by Johnson et al. [11] investigates relationships defined by paths on a graph accepted by a finite-state automaton. Notably, this execution mechanism differs significantly from our methodology, serving the distinct purpose of adding edges on accepted paths. Unlike the synchronous execution of multiple finite-state automata in GraphFSA, here, the single automaton functions more like an agent traversing the graph. Consequently, the tasks the authors tackle, such as identifying additional edges in grid-world environments and abstract syntax trees (ASTs), differ from ours. It is crucial to highlight that this model only augments edges and does not predict node values.
Graph neural networks.
Graph Neural Networks (GNNs) were introduced by Scarselli et al. [23] and have gained significant attention. Various GNN architectures, such as Graph Convolutional Networks [14] and Graph Attention Networks [28], have been developed. A crucial area of GNN research is understanding their theoretical capabilities, which is linked to the Weisfeiler-Lehman (WL) color refinement algorithm that bounds the expressiveness of conventional GNNs [30, 32, 21]. The WL algorithm proceeds in rounds where node colors are relabeled based on their own color and the colors of nodes in the neighborhood. To achieve optimal expressiveness, message-passing GNNs must run for an adequate number of rounds to match the computational power of WL. The necessity of increasing the number of rounds has been emphasized for algorithmic graph problems [15]. However, failing to adhere to the minimum rounds required may cause a GNN to fall short of solving or approximating specific problems, as shown by Sato et al. [22].
Furthermore, enhancing the interpretability and visualization of GNNs has become an essential part of GNN research. The development of GNNExplainer [33] marked a significant stride in generating explanations for Graph Neural Networks. Similarly, Hohman et al. [8] delved into visual analytics in deep learning, providing an interrogative survey for future developments in this domain. Another approach towards explainability, proposed by Huang et al. [9], is understanding specific examples. Moreover, the GraphChef architecture introduced by Müller et al. [20] offers a comprehensive understanding of GNN decisions by replacing layers in the message-passing framework with decision trees. GraphChef uses the gumbelized softmax from Johnson et al. [12] to learn a discrete state space, enabling the extraction of decision trees.
Algorithmic learning.
Algorithm learning aims to learn an underlying algorithmic principle that allows generalization to unseen and larger instances. As such, one of the primary goals is to achieve extrapolation. Various models have been investigated for extrapolation, including RNNs capable of generalizing to inputs of varying lengths [5]. Specifically, in our work, we want to learn graph algorithms. For extrapolation, we want our learned models to scale to graphs that are more complex and larger than those used during training. Graph Neural Networks (GNNs) have been extensively applied to solve these algorithmic challenges [24, 27, 13]. Recent studies by Tang et al. [25], Veličković et al. [26], and Ibarz et al. [10] have emphasized improving extrapolation to larger graphs in algorithmic reasoning problems. However, most of the work focuses only on slightly larger graphs than those in the training set. Scaling to much larger graphs, as studied by Grötschla et al. [7], remains challenging.
3 The GraphFSA framework
We present GraphFSA, a computational framework designed for executing Finite State Automata (FSA) on graph-structured data. Drawing inspiration from Graph Cellular Automata and Graph Neural Networks, GraphFSA defines an FSA at each node within the graph. This FSA remains the same across all nodes and encompasses a predetermined set of states and transition values. While all nodes abide by the rules of the same automaton, the nodes are usually in different states. As is customary for FSAs, a transition value and the current state jointly determine the subsequent state transition. In our approach, transition values are computed through the aggregation of neighboring node states. As the FSA can only handle a finite number of possible aggregations, we impose corresponding limitations on the aggregation function. For the execution of the framework, nodes are first initialized with a state that represents their input features. The FSA is then executed synchronously on all nodes for a fixed number of steps. Each step involves determining transition values and executing transitions for each node to change their state. Upon completion, each node reaches an output state, facilitating node classification into a predetermined set of categorical values. A visual representation of the GraphFSA model is depicted in Figure 1.
3.1 Formal definition
More formally, the GraphFSA $\mathcal{F}$ consists of a tuple $(\mathcal{M},\mathcal{Z},\mathcal{A},\mathcal{T})$. $\mathcal{F}$ is applied to a graph $G=(V,E)$ and consists of a set of states $\mathcal{M}$, a finite set of transition values $\mathcal{Z}$, an aggregation $\mathcal{A}$, and a transition function $\mathcal{T}$. At time $t$, each node $v\in V$ is in state $s_{v,t}\in\mathcal{M}$. In its most general form, the aggregation $\mathcal{A}$ maps the multiset of neighboring states to an aggregated value $a\in\mathcal{Z}$ of a finite domain.
$$
\displaystyle a_{v,t}=\mathcal{A}(\{\!\{s_{u,t}\mid u\in N(v)\}\!\}) \tag{1}
$$
Here $\{\!\{\}\!\}$ denotes the multiset and $N(v)$ the neighbors of $v$ in $G$. At each timestep $t$, the transition function $\mathcal{T}:\mathcal{M}\times\mathcal{Z}\to\mathcal{M}$ takes the state of a node $s_{v,t}$ and its corresponding aggregated value $a_{v,t}$ and computes the state for the next timestep.
$$
\displaystyle s_{v,t+1}=\mathcal{T}(s_{v,t},a_{v,t}) \tag{2}
$$
For notational convenience, we also define the transition matrix $T$ of size $\left|\mathcal{M}\right|\times\left|\mathcal{Z}\right|$ where $T_{m,a}$ stores $\mathcal{T}(m,a)$. Moreover, we introduce the notion of a state vector $s_{v}^{t}$ for each node $v\in V$ at time $t$, which is a one-hot encoding of size $\left|\mathcal{M}\right|$ of the node's current state.
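To make the execution concrete, the following minimal Python sketch implements one synchronous step corresponding to Equations (1) and (2). The adjacency-list representation and function signatures are illustrative assumptions, not part of the framework definition.

```python
# Minimal sketch of one synchronous GraphFSA step (Equations 1 and 2),
# assuming states are hashable values and the graph is an adjacency list.

def graphfsa_step(adj, states, aggregate, transition):
    """Apply one synchronous automaton step to every node.

    adj        -- dict mapping each node to a list of its neighbors
    states     -- dict mapping each node to its current state in M
    aggregate  -- function A: list of neighbor states -> value in Z
    transition -- function T: (state, aggregated value) -> next state
    """
    next_states = {}
    for v, neighbors in adj.items():
        a_v = aggregate([states[u] for u in neighbors])  # Equation (1)
        next_states[v] = transition(states[v], a_v)      # Equation (2)
    return next_states
```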
Aggregation functions.
The transition value $a$ for node $v$ at time $t$ is directly determined by aggregating the multi-set of states from all neighboring nodes at time $t$ . The aggregation $\mathcal{A}$ specifies how the aggregated value is computed from this neighborhood information. Note that this formulation of the aggregation $\mathcal{A}$ allows for a general framework in which many different design choices can be made for a concrete class of GraphFSAs. Throughout this work, we will focus on the counting aggregation and briefly touch upon the positional aggregation.
Counting aggregation: The aggregation aims to count the occurrence of each state in the immediate neighborhood. However, we want the domain $\mathcal{Z}$ to remain finite. Note that due to the general graph topology of $G$, the naive count could lead to $\mathcal{Z}$ growing with $n$, the number of nodes. Instead, we take inspiration from the distributed computing literature, specifically the Stone Age Computing Model by Emek and Wattenhofer [3]. Here, the aggregation is performed according to the one-two-many principle, where each node can only distinguish whether a certain state appears once, twice, or more often in its immediate neighborhood. Formally, we can generalize this principle using a bounding parameter $b$, which defines up to what threshold we can exactly count the neighbor states. The simplest mode uses $b=1$, i.e., we can only distinguish whether a state is present in the neighborhood or not. Note that this is equivalent to forcing the aggregation to use a set instead of a multiset. For the general bounding parameter $b$, we introduce the threshold function $\sigma$, which counts the occurrences of a state in a multiset, capped at $b$.
$$
\displaystyle\sigma(m,S)=\min(b,\left|\{\!\{s\mid s=m,s\in S\}\!\}\right|) \tag{3}
$$
Therefore, the aggregated value $a_{v,t}$ will be an $\left|\mathcal{M}\right|$-dimensional vector of the following form, where the $m$-th dimension corresponds to the number of occurrences of state $m$.
$$
\displaystyle(a_{v,t})_{m}=\sigma(m,\{\!\{s_{u,t}\mid u\in N(v)\}\!\}) \tag{4}
$$
Note that there is a tradeoff between expressiveness and model size when choosing a higher bounding parameter $b$. For a model with $|\mathcal{M}|$ states, there are ${(b+1)}^{|\mathcal{M}|}$ possible transition values for each state $m\in\mathcal{M}$.
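A short sketch of the counting aggregation under these definitions; representing states as integers $0,\dots,|\mathcal{M}|-1$ is an assumption for illustration.

```python
from collections import Counter

def counting_aggregation(neighbor_states, num_states, b):
    """Counting aggregation (Equations 3 and 4): count the occurrences
    of each state in the neighborhood, capped at the bounding parameter b."""
    counts = Counter(neighbor_states)
    return tuple(min(b, counts[m]) for m in range(num_states))

# With b = 1 the aggregation only records presence/absence of each state:
# counting_aggregation([2, 0, 0], num_states=4, b=1) -> (1, 0, 1, 0)
```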
Positional aggregation: This aggregation function takes into account the spatial relationship between nodes in the graph. If the underlying graph has a direction associated with each neighbor, e.g., in a grid, we assign each neighbor to a distinct slot within a transition value vector $a_{v,t}$. This generalizes to graphs of maximum degree $d$, where each of the $d$ entries in $a_{v,t}$ corresponds to the state of a neighboring node.
Average threshold-based aggregation: An alternative aggregation operator utilizes the average value, which allows us to detect whether a specific threshold $t$ of neighbors is in a particular state. We compute the value to threshold by dividing the state sum by the neighborhood size of a node $v$. This, for example, allows us to detect whether the majority (e.g., more than 0.5) of neighbors are in a specific state.
Starting and final states.
We use starting states $S\subseteq\mathcal{M}$ of the FSA to encode the discrete set of inputs to the graph problem. These could be the starting states of a cellular automaton or the input to an algorithm (e.g., for BFS, the source node starts in a different state from the other nodes). Final states $F\subseteq\mathcal{M}$ are used to represent the output of a problem. In node prediction tasks, we choose one final state per class. In contrast to other states, it is not possible to transition away from a final state, meaning that once such a state is reached, it will never change.
Figure 2: Original topology (left), nodes that GraphFSA can distinguish with 2+ aggregation (i.e., a state in the neighborhood is counted as appearing 0, 1, or 2 or more times) (middle), and 1-WL color classes (right). More specifically, for GraphFSA, nodes cannot differentiate if they have 2 or 3 neighbors of a kind. In contrast, 1-WL can further distinguish the two nodes.
Scalability.
Scalability can be analyzed along two primary dimensions: the size of the graphs on which the cellular automata (CA) are executed, and the number of states that the learned CA can possess. Our approach demonstrates good scalability for larger graphs as it operates in linear time relative to the number of nodes and edges. However, scalability concerning the number of states is influenced by the chosen aggregation method. Specifically, with the counting aggregation, the number of possible aggregations becomes exponential in the number of states. It is, therefore, crucial to choose aggregations that lead to a smaller expansion when dealing with more states.
Figure 3: Partial Visualization of a learned GraphFSA model for the Distance problem. The root node starts in state $s_{1}$ (left), whereas all other nodes start in state $s_{0}$ (right). The final states $f_{0},f_{1}$ represent even and odd distances to the root, respectively. Aggregation values are presented as $[f_{0},f_{1},s_{0},s_{1}]$ where we apply the counting aggregation with bounding parameter $b=1$ . We can verify on the right that the nodes wait until they observe a final state in their neighborhood and then transition to the other final state. A full visualization of the same automaton can be found in the Appendix.
3.2 Expressiveness
While the execution of GraphFSA may bear resemblance to one round of the Weisfeiler-Lehman test, this does not hold in general. In the example in Figure 1, they are the same because the chosen GraphFSA instantiation is powerful enough to distinguish all occurrences of states appearing in the neighborhoods. More specifically, nodes can distinguish whether they have zero or at least one neighbor in a certain state ("1+ aggregation"), which is sufficient for the given graph. In general, the aggregation function $\mathcal{A}$ distinguishes GraphFSA from the standard WL-refinement notion as it restricts the observation of the neighborhood compared to the usual multiset observations (as these cannot be bounded in size). Consequently, GraphFSA is strictly less expressive than 1-WL, resulting in a tradeoff between using a more expressive aggregator and maintaining the simplicity and explainability of the resulting model, particularly as the state space grows exponentially. Figure 2 presents an example comparing GraphFSA's execution with a 2+ aggregator to WL refinement. Here, we can observe that GraphFSA is less expressive than 1-WL. However, for a large enough $b$, GraphFSA can match 1-WL on restricted graph classes of bounded degree. In conclusion, while the finite number of possible aggregations restricts GraphFSA, it also keeps the mechanism simple. Our evaluation demonstrates that this can improve generalization performance for a learned GraphFSA.
3.3 Visualization and interpretability
The GraphFSA framework offers inherent interpretability, facilitating visualization akin to a Finite State Automaton (FSA). This visualization approach allows us to easily explore the learned state transitions and the influence of neighboring nodes on state evolution. We outline two primary visualization techniques for GraphFSA:
Complete FSA visualization.
This method provides a comprehensive depiction of the entire FSA representation within the GraphFSA model, encompassing all states and transitions. Each node in the graph corresponds to a distinct state, while directed edges represent transitions between states based on transition values. An example is visualized in Figure 5.
Partial FSA visualization.
Tailored for targeted analysis of the GraphFSA modelâs behavior, this approach proves especially useful when dealing with large state spaces or focusing on specific behaviors. This method visualizes a selected starting state by evaluating the model across multiple problem instances and tracking all transitions that occur. Then, the visualization is constructed by exclusively displaying the states and transitions that have been used from the specified starting state.
Figure 3 illustrates the partial visualization of a learned automaton. The transitions in the diagram are determined by aggregating the information of the neighboring states. The structure of the transition value is $[f_{0},f_{1},s_{0},s_{1}]$. For instance, the transition value $[0,0,1,0]$ indicates that the node in the graph has one or more neighbors in state $s_{0}$ while having no neighbors in other states.
Figure 4: We study learning cellular automata as the simplest form of algorithmic behavior, which can lead to complex and intricate behavior patterns when assembled in a network. The left shows Conway's Game of Life, which we can learn on general topologies such as the hexagonal grid due to the graph representation. A WireWorld automaton is depicted on the right, which can be used to simulate transistors and electrical circuits. It shows the electron head (blue) and tail (red) transitioning through different wires (yellow).
3.4 GRAB: The GraphFSA dataset generator
Accompanying the GraphFSA execution framework, which outlines the general interface and underlying principles shared amongst instances of GraphFSA, we provide GRAB, a flexible dataset generator to evaluate learned state automata. Note that the introduced framework specifies the model structure but allows for different training methodologies that can be used to train specific model instances. The discrete nature of state transitions makes training these models a challenging task. This work proposes one possible way to train such models using input and output pairs. For alternative approaches, we deem it crucial to provide a systematic tool that can be used to develop new training strategies and further provide a principled way of thoroughly assessing the effectiveness across training methods. Therefore, GRAB provides an evaluation framework for learning state automata on graphs. GRAB generates the ground truth for a synthetic FSA, which can be used for learning and testing on various graphs. In particular, GRAB provides out-of-the-box support for several graph sizes and supports testing on multiple graph distributions.
In the dataset construction process, we first define the characteristics of the ground truth GraphFSA, specifying the number of states and configuring the number of initial and final states. Next, we generate a random Finite State Automaton (FSA) by selecting states for each transition value, adhering to the constraints of the final state set (final states cannot transition to other states). We then sample graphs from predefined distributions, offering testing on a diverse set of graph types such as trees, grids, fully connected graphs, regular graphs, and Erdős–Rényi graphs. The starting states for each node within these generated graphs are randomly initialized from the set $\mathcal{S}$. Finally, we apply the GraphFSA model to these input graphs to compute the outputs according to the defined state transitions. For the dataset generation, we emphasize evaluating generalization, which we believe to be one of the essential abilities of the GraphFSA framework. Therefore, we establish an extrapolation dataset to assess the model's capability beyond the graph sizes encountered during training. This dataset contains considerably larger graphs, representing a substantial deviation from the training data. However, if the correct automaton can be extracted during training, good generalization should be achievable despite the shift in the testing distribution. Input/output pairs for these larger graphs are produced by applying the same generated FSA to obtain the resulting states.
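The following sketch illustrates how such a random ground-truth automaton and starting states could be sampled; the dictionary representation of $\mathcal{T}$ and all names are illustrative assumptions, not the actual GRAB implementation.

```python
import random

def random_fsa(num_states, final_states, transition_values):
    """Sample a random ground-truth transition function: transitions are
    drawn uniformly, except that final states are absorbing and may
    never transition away."""
    T = {}
    for m in range(num_states):
        for a in transition_values:
            T[(m, a)] = m if m in final_states else random.randrange(num_states)
    return T

def random_initialization(nodes, start_states):
    """Randomly initialize every node with a state from the start set S."""
    return {v: random.choice(list(start_states)) for v in nodes}
```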
4 Empirical evaluation
The GraphFSA framework is designed to extract state automaton representations with discrete states and transitions to learn to solve the task at hand through an algorithmic representation. Our evaluation consists of two key components to showcase this. In the first part, we focus on classic cellular automaton problems. These automata serve as a foundational component of our study as they represent the simplest possible version of an algorithm. However, despite their simplicity, they can become increasingly powerful and exhibit complex behaviour when assembled together in a network. We successfully learn the underlying automaton rules, demonstrating the ability and advantage of the GraphFSA framework to capture simple algorithmic behaviour.
In the second part of our empirical study, we evaluate our models using our proposed graph automaton benchmark and a set of more challenging graph algorithms. GRAB enables a comprehensive investigation of GraphFSA's performance across a broad range of problems. We are particularly interested in the generalisation ability of the GraphFSA framework. Therefore, we perform extrapolation experiments where we train on smaller graphs and subsequently test performance on larger instances. Finally, we use the GraphFSA framework to learn a selection of elementary graph algorithms, demonstrating the framework's capability and potential for algorithm learning.
4.1 Training
We take inspiration from prior research on training differentiable finite state machines [18] and propose an adaption to train Diff-FSA, which follows the GraphFSA framework, on general graphs. To maintain differentiability, we relax the requirement for choosing a single next state for a transition in $\mathcal{T}$ during training and instead model it as a probability distribution over possible next states, resulting in a probabilistic automaton. We thus compute $P(X=m \mid \mathcal{M}\times\mathcal{Z})$ for $m\in\mathcal{M}$ and parameterize Diff-FSA with a matrix $T^{\prime}$ of size $|\mathcal{M}|\times|\mathcal{Z}|\times|\mathcal{M}|$ that holds the probabilities for all possible transitions and states. This means that the transition matrix $T^{\prime}$ contains the probabilities of transitioning from a state $m_{1}$ to a state $m_{2}$ given a specific transition value. To train the model with a given step number $t$, we execute Diff-FSA for $t$ rounds and compute the loss based on the ground-truth output states of the graph automaton we want to learn when executing the same number of steps $t$. For example, in the case of node classification problems, we aim to predict each node's final state $f\in F$. This makes it possible to backpropagate through the transition matrix $T^{\prime}$. In addition, we apply an L2 loss to achieve the correct mapping between input and output states. Once the training of the probabilistic model is complete, we extract the decision matrix $T$ from $T^{\prime}$ by discretizing the distribution $P$ over the next states to the most likely one.
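The following PyTorch sketch illustrates this probabilistic relaxation. It assumes the per-node distribution over aggregated values has already been computed (doing so differentiably depends on the chosen aggregation); the class and all names are illustrative, not the exact Diff-FSA implementation.

```python
import torch
import torch.nn.functional as F

class SoftFSA(torch.nn.Module):
    """Sketch of a probabilistic transition step over discrete states."""

    def __init__(self, num_states, num_agg_values):
        super().__init__()
        # Unnormalized transition logits; a softmax yields T' of shape
        # |M| x |Z| x |M|, holding P(next state | state, aggregated value).
        self.logits = torch.nn.Parameter(
            torch.randn(num_states, num_agg_values, num_states))

    def step(self, state_probs, agg_probs):
        # state_probs: (num_nodes, |M|), agg_probs: (num_nodes, |Z|)
        t_prime = F.softmax(self.logits, dim=-1)
        # Expected next-state distribution, marginalizing over the current
        # state m and the aggregated value a.
        return torch.einsum('nm,na,mak->nk', state_probs, agg_probs, t_prime)

    def extract_fsa(self):
        # Discretize: pick the most likely next state for every (m, a),
        # recovering the transition matrix T of the GraphFSA.
        return self.logits.argmax(dim=-1)
```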
Furthermore, we are interested in the model's ability to produce consistent and correct outputs, even as the number of iterations changes. We refer to this as iteration stability. The extrapolation capabilities of the GraphFSA model are achieved through the ability to adjust the number of iterations, representing the number of steps executed by the FSA. This flexibility is essential, especially for graphs with larger diameters, where information needs to propagate to all nodes. We can therefore train an FSA with a number of iterations chosen large enough to produce a correct output, such that further increasing the number of iterations still leads to a correct solution. We develop two approaches:
Random iteration offset.
During training, we introduce random adjustments to the number of iterations with a small offset factor. This randomization ensures that the model can effectively handle the problem with varying numbers of iterations.
Final state loss.
One approach to incentivize final states not to switch to other states is to incorporate an additional loss term that penalizes leaving a final state. We consider the learned transition matrix probabilities and penalize the model if the probability of switching from a final state $f\in F$ to any other state $m\in\mathcal{M}, m\neq f$ is greater than 0, using an L1 loss. This approach encourages the model to stay in the final state, resulting in more stable predictions. Figure 3 provides an example of a state automaton trained with this additional loss.
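A possible realization of this penalty, assuming the probabilistic transition tensor from the sketch above; the function name is hypothetical.

```python
def final_state_loss(t_prime, final_states):
    """L1 penalty on any probability mass that leaves a final state.

    t_prime      -- tensor of shape (|M|, |Z|, |M|) with rows summing to 1
    final_states -- indices of the final states in M
    """
    loss = 0.0
    for f in final_states:
        # Since each row of T' sums to 1, the probability of leaving f
        # under any aggregated value is 1 minus the self-transition mass.
        stay = t_prime[f, :, f]
        loss = loss + (1.0 - stay).abs().sum()
    return loss
```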
Baseline models.
The architecture that aligns most closely with our tasks is the Graph Neural Cellular Automata (GNCA) [6]. While GNCAs are designed to learn rules from single-step supervision, we apply several rounds when training with multiple steps. Besides GNCAs, we use recurrent message-passing GNNs. In contrast to GNCA, we follow the typical encode-process-decode paradigm [27], which means that the GNN keeps intermediate hidden embeddings that can be decoded to make predictions. We use the architecture proposed by Grötschla et al. [7], and observe that employing a GRU in the node update yields the best performance and generalization capabilities, in line with their results. This convolution is therefore used in all results in the paper. As the goal of a GraphFSA is to extract rules from the local neighborhood and update the node state purely based on this information, more advanced graph-learning architectures that go beyond this simple paradigm are not applicable in this setting. Our analysis mainly focuses on demonstrating the value of discrete state spaces, especially when generalizing to longer rollouts and larger graphs.
4.2 Classical cellular automata problems
As a first step towards learning algorithms, we focus on cellular automata problems. We consider a variety of different settings, such as 1D and 2D automata; these include well-known instances such as WireWorld or the Game of Life, which despite their simple rules are known to be Turing complete. For a more detailed explanation of the datasets, we refer to the Appendix.
1D-Cellular Automata
The 1D-Cellular Automata dataset consists of one-dimensional cellular automata systems, each defined by one of the $2^{2^{3}}=256$ possible rules. Each rule corresponds to a distinct mapping of a cell's state and the states of its two neighbors to a new state.
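For illustration, one step of such an elementary automaton under the standard Wolfram rule numbering might look as follows; the periodic boundary handling is an assumption for brevity.

```python
def elementary_ca_step(cells, rule):
    """One step of an elementary (1D, radius-1) cellular automaton.

    cells -- list of 0/1 cell states
    rule  -- Wolfram rule number in 0..255 (= 2^(2^3) - 1)
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as 3-bit value
        out.append((rule >> index) & 1)              # look up the rule bit
    return out
```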
Game Of Life.
The GameOfLife dataset follows the rules defined by Conway's Game of Life [4]. A dead cell becomes alive if it has exactly 3 live neighbors. A live cell with fewer than 2 or more than 3 live neighbors dies; otherwise, it persists. Furthermore, we consider a variation where cells are hexagonal instead of the traditional square grid, which we refer to as the Hexagonal Game of Life. A visual representation is depicted in Figure 4.
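Written over the count of live neighbors, this update fits the GraphFSA framework with the counting aggregation (a bounding parameter of $b=4$ suffices to separate "exactly 3" from "more than 3" live neighbors). A minimal sketch of the rule:

```python
def game_of_life_transition(alive, live_neighbors):
    """Conway's Game of Life as a two-state transition rule."""
    if alive:
        # A live cell survives with 2 or 3 live neighbors, otherwise dies.
        return live_neighbors in (2, 3)
    # A dead cell becomes alive with exactly 3 live neighbors.
    return live_neighbors == 3
```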
Figure 5: Complete Visualization of a learned GraphFSA model for the Distance problem. The root node starts in state $s_{1}$, whereas all other nodes start in state $s_{0}$. The final states $f_{0},f_{1}$ represent even and odd distances to the root, respectively. Aggregation values are presented as $[f_{0},f_{1},s_{0},s_{1}]$ where we apply the counting aggregation with bounding parameter $b=1$.
WireWorld.
WireWorld [2] is an automaton designed to simulate the flow of an electron through a circuit. Cells can take one of four states: empty, electron head, electron tail, and conductor. Figure 4 shows an initial grid starting state and the evolution of WireWorld over multiple time steps.
4.2.1 Results
For Cellular Automata problems, we generate training data for each problem by initializing a 4x4 input grid or a path of length 4 (for 1D problems) with random initial states and applying the problem's rules for a fixed number of steps to produce the corresponding output states. Our models are trained on these input-output pairs, using the same number of iterations as during dataset creation to ensure that we can extract the rule. Additionally, we test on an extrapolation dataset consisting of larger 10x10 grids. This dataset is used to evaluate the model's ability to adapt to varying rule iterations with four different iteration numbers.
1D Cellular Automata.
On this dataset, Diff-FSA uses the positional aggregation technique. Notably, our model can successfully learn the rules for one-step training data ($t=1$). For the complete results, we refer to the Appendix.
2D Cellular Automata.
For 2D Cellular Automata (CA) problems, Diff-FSA employs the counting aggregation. For WireWorld, we use a state space of four and a bounding parameter of three. We train all models to learn the one-step transition rules. During training, all models and baselines report high accuracy. We then investigate the generalization capabilities and especially the iteration stability of the learned models. We run the trained models on larger 10x10 grids for more time steps $t$ than during training. The results are depicted in Figure 6. Note that the recurrent GNN and GNCA both deteriorate outside the training distribution. However, Diff-FSA exhibits good iteration stability across the whole range of iterations. For the Game of Life variations, we provide a detailed comparison in the Appendix.
Figure 6: Mean accuracies for WireWorld on a regular grid when trained on 1 step and applied for $t$ steps during inference (for 10 seeds each). All models report high accuracy for $t=1$. However, performance deteriorates during inference when more steps are executed, except for Diff-FSA.
4.3 Evaluation with GRAB
Table 1: Evaluation of GraphFSA on synthetic data provided by GRAB. We report the accuracy and standard deviation over 10 runs. The underlying ground truth consists of an FSA using 4 states. We can test the in-distribution validation accuracy to see how well a model can fit the data. Moreover, we test extrapolation to larger graphs to verify that the rules for underlying automata were successfully learned. Our Diff-FSA models generally perform well across all scenarios. Note that the recurrent GNN performs well but lacks the interpretation and visualization of the learned mechanics as discrete state automata.
| Model | Val Acc | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- | --- |
| GNCA | 0.38 $±$ 0.00 | 0.39 $±$ 0.00 | 0.40 $±$ 0.00 | 0.40 $±$ 0.00 | 0.39 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 0.91 $±$ 0.10 | 0.85 $±$ 0.13 | 0.82 $±$ 0.16 | 0.81 $±$ 0.16 |
| Diff-FSA (4 states) | 0.99 $±$ 0.02 | 0.97 $±$ 0.02 | 0.96 $±$ 0.02 | 0.95 $±$ 0.03 | 0.94 $±$ 0.03 |
| Diff-FSA (5 states) | 1.00 $±$ 0.01 | 0.98 $±$ 0.01 | 0.96 $±$ 0.02 | 0.95 $±$ 0.02 | 0.95 $±$ 0.02 |
| Diff-FSA (6 states) | 0.99 $±$ 0.01 | 0.98 $±$ 0.01 | 0.95 $±$ 0.02 | 0.94 $±$ 0.02 | 0.94 $±$ 0.02 |
We use GRAB to create diverse datasets to benchmark the different models. These datasets cover different graph types and sizes; a synthetic state automaton generates the ground truth, which allows us to precisely control the setup by adjusting the number of states of the ground-truth automaton. For each dataset configuration, we train 10 different model instances. Extended results with more runs can be found in the Appendix.
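To make the generation process concrete, the sketch below samples a random transition table over a fixed number of states and rolls it out synchronously on a tree. The function names, the clamped per-state neighbor counts, and the bounding value are illustrative assumptions rather than GRAB's exact implementation.

```python
import itertools
import random
import networkx as nx

def random_fsa(num_states=4, bound=2, seed=0):
    """Sample a random transition table mapping (own state, clamped
    per-state neighbour counts) to a next state. Illustrative only."""
    rng = random.Random(seed)
    keys = itertools.product(
        range(num_states),
        itertools.product(range(bound + 1), repeat=num_states))
    return {key: rng.randrange(num_states) for key in keys}

def fsa_step(graph, states, table, num_states=4, bound=2):
    """One synchronous update of the automaton on every node."""
    new_states = {}
    for v in graph.nodes:
        counts = [0] * num_states
        for u in graph.neighbors(v):
            counts[states[u]] += 1
        clamped = tuple(min(c, bound) for c in counts)
        new_states[v] = table[(states[v], clamped)]
    return new_states

# Roll out a random 4-state ground-truth automaton on a small tree.
graph = nx.balanced_tree(2, 3)
table = random_fsa()
rng = random.Random(1)
states = {v: rng.randrange(4) for v in graph.nodes}
for _ in range(5):
    states = fsa_step(graph, states, table)
```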
Table 1 shows the results for a dataset generated by GRAB. The dataset consists of tree graphs with a ground-truth FSA of 4 states. Generally, we observe that all Diff-FSA instances of the GraphFSA framework perform well, even if the number of states does not match the ground truth exactly. The GNCA model performs poorly in this setting, as it struggles to learn the rules when they are applied for multiple steps. The recurrent GNN baseline performs well on in-distribution graphs but struggles to generalize when the graph sizes are increased. Note that the recurrent GNN operates on continuous states, in contrast to the discrete states used by Diff-FSA. Moreover, the experiments were conducted with a varying number of states for Diff-FSA: while the ground-truth solution is modeled with 4 states, we want to investigate whether increasing this number for the learned automaton leads to different results. As we can observe, the accuracies are consistent across all sizes, indicating that additional states can be used for training, which is especially useful when the underlying ground-truth automaton is not known beforehand.
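To illustrate how discrete transitions can nonetheless be trained with gradients, the following minimal sketch relaxes a transition table to probability distributions during training and recovers a hard automaton by taking the argmax afterwards. The tensor layout is an assumption for illustration, not the exact Diff-FSA parameterization.

```python
import torch

NUM_STATES = 4
NUM_PATTERNS = 81  # e.g. (bound+1)**NUM_STATES clamped neighbour-count patterns

# Learnable next-state logits per (current state, observed pattern).
# This layout is a simplifying assumption, not the actual Diff-FSA code.
logits = torch.nn.Parameter(torch.randn(NUM_STATES, NUM_PATTERNS, NUM_STATES))

soft_table = logits.softmax(dim=-1)      # differentiable transitions for training
hard_table = soft_table.argmax(dim=-1)   # discrete FSA extracted for inspection
print(hard_table.shape)                  # torch.Size([4, 81])
```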
4.4 Graph algorithms
Table 2: Evaluation of learning graph algorithms on the Distance dataset. We report the accuracy and standard deviation over 10 runs. All models are trained on graphs of size at most 10 and then tested for extrapolation on larger graph sizes $n$ . Our Diff-FSA outperforms other discrete transition-based models while matching the recurrent GNN baseline.
| Model | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- |
| GNCA | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 | 0.51 $±$ 0.00 | 0.50 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
| Diff-FSA | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
To showcase the potential of the GraphFSA framework for algorithm learning, we further evaluate it on more challenging graph algorithm problems. We use the same graph algorithms as Grötschla et al. [7], which, among other setups, include a simplified distance computation and path finding. Detailed descriptions of these datasets can be found in the Appendix. Similar to the data generated by GRAB, our training process uses a random number of iterations, with the sole condition that the number of steps is sufficiently large to ensure complete information propagation across the graph. For problems involving small training graphs, we train the Diff-FSA model with approximately ten iterations and apply the additive final state loss to ensure iteration stability and establish distinct starting and final states.
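A minimal sketch of the propagation condition, assuming the sampled iteration count is lower-bounded by the graph diameter (the `slack` parameter and function name are hypothetical):

```python
import random
import networkx as nx

def sample_num_steps(graph, slack=5, rng=None):
    """Sample a training iteration count that is at least the graph
    diameter, so information can propagate across the whole graph."""
    rng = rng or random.Random(0)
    d = nx.diameter(graph)
    return rng.randint(d, d + slack)

print(sample_num_steps(nx.path_graph(8)))  # at least 7 for a path of 8 nodes
```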
The main aim is to validate that our proposed model is capable of learning more challenging algorithmic problems. It consistently performs better than GNCA. Compared to the recurrent GNN, it does not perform as well across all selected problems, but it can learn correct solutions, as seen for the Distance task in Table 2. However, recall that Diff-FSA operates on discrete states, which makes the learning problem considerably harder. On the other hand, we can extract the learned solution and analyse the learned mechanics. A visualization of such a learned model is depicted in Figure 3 and Figure 5. This has the advantage over the recurrent baseline that the learned mechanics can be interpreted, and the algorithmic behavior can be explained and verified.
5 Limitations and future work
The main advantage of the GraphFSA framework is its use of discrete states. This allows us to interpret the learned mechanics through the lens of a state automaton. Moreover, it can perform well in scenarios where the underlying rules of a problem can be modeled with discrete transitions, while requiring that inputs and outputs can be represented as discrete states. This is the case for several algorithmic settings but limits the model's applicability to arbitrary graph learning tasks. To broaden the applicability of the GraphFSA framework, future work could investigate methods to map continuous input values to discrete states, potentially with a trainable module. Another aspect to investigate is the study of further finite aggregations for GraphFSA, as these can heavily influence a model's expressivity or align its execution with a specific task. Moreover, Diff-FSA represents only one possible approach to train models within the GraphFSA framework. Training models that yield discrete transitions remains challenging, and better training schemes could further improve performance and effectiveness.
6 Conclusion
We present GraphFSA, an execution framework that extends finite state automata by leveraging discrete state spaces on graphs. Our research demonstrates the potential of GraphFSA for algorithm learning on graphs due to its ability to simulate discrete decision-making processes. Additionally, we introduce GRAB, a benchmark designed to test and compare methods compatible with the GraphFSA framework across a variety of graph distributions and sizes. Our evaluation shows that Diff-FSA can effectively learn rules for classical cellular automata problems, such as the Game of Life, producing structured and interpretable representations in the form of finite state automata. While this approach is intentionally restrictive, it simplifies complexity and aligns the model with the task at hand. We further validate our methodology on a range of synthetic state automaton problems and complex algorithmic datasets. Notably, the discrete state space within GraphFSA enables Diff-FSA to exhibit robust generalization capabilities.
References
- de Balle Pigem [2013] B. de Balle Pigem. Learning finite-state machines: statistical and algorithmic aspects. TDX (Tesis Doctorals en Xarxa), July 2013. URL http://www.tesisenred.net/handle/10803/129070.
- Dewdney [1990] A. K. Dewdney. Computer recreations. Scientific American, 262(1):146–149, 1990. ISSN 00368733, 19467087. URL http://www.jstor.org/stable/24996654.
- Emek and Wattenhofer [2013] Y. Emek and R. Wattenhofer. Stone age distributed computing. In Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing, PODC '13, pages 137–146, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450320658. 10.1145/2484239.2484244. URL https://doi.org/10.1145/2484239.2484244.
- Gardner [1970] M. Gardner. Mathematical games. Scientific American, 223(4):120–123, 1970. ISSN 00368733, 19467087. URL http://www.jstor.org/stable/24927642.
- Gers and Schmidhuber [2001] F. Gers and E. Schmidhuber. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12(6):1333–1340, 2001. 10.1109/72.963769.
- Grattarola et al. [2024] D. Grattarola, L. Livi, and C. Alippi. Learning graph cellular automata. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS '21, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9781713845393.
- Grötschla et al. [2022] F. Grötschla, J. Mathys, and R. Wattenhofer. Learning graph algorithms with recurrent graph neural networks. 2022. URL https://arxiv.org/abs/2212.04934.
- Hohman et al. [2019] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau. Visual analytics in deep learning: An interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics, 25(8):2674–2693, 2019. 10.1109/TVCG.2018.2843369.
- Huang et al. [2023] Q. Huang, M. Yamada, Y. Tian, D. Singh, and Y. Chang. GraphLIME: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7):6968–6972, 2023. 10.1109/TKDE.2022.3187455.
- Ibarz et al. [2022] B. Ibarz, V. Kurin, G. Papamakarios, K. Nikiforou, M. Bennani, R. Csordás, A. J. Dudzik, M. Bošnjak, A. Vitvitskyi, Y. Rubanova, A. Deac, B. Bevilacqua, Y. Ganin, C. Blundell, and P. Veličković. A generalist neural algorithmic learner. In B. Rieck and R. Pascanu, editors, Proceedings of the First Learning on Graphs Conference, volume 198 of Proceedings of Machine Learning Research, pages 2:1–2:23. PMLR, 09–12 Dec 2022. URL https://proceedings.mlr.press/v198/ibarz22a.html.
- Johnson et al. [2020a] D. Johnson, H. Larochelle, and D. Tarlow. Learning graph structure with a finite-state automaton layer. Advances in Neural Information Processing Systems, 33:3082–3093, 2020a.
- Johnson et al. [2020b] D. D. Johnson, H. Larochelle, and D. Tarlow. Learning graph structure with a finite-state automaton layer. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020b. Curran Associates Inc. ISBN 9781713829546.
- Joshi et al. [2021] C. K. Joshi, Q. Cappart, L.-M. Rousseau, and T. Laurent. Learning TSP Requires Rethinking Generalization. In L. D. Michel, editor, 27th International Conference on Principles and Practice of Constraint Programming (CP 2021), volume 210 of Leibniz International Proceedings in Informatics (LIPIcs), pages 33:1–33:21, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISBN 978-3-95977-211-2. 10.4230/LIPIcs.CP.2021.33. URL https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CP.2021.33.
- Kipf and Welling [2017] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
- Loukas [2020] A. Loukas. What graph neural networks cannot learn: depth vs width. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=B1l2bp4YwS.
- Lucchesi and Kowaltowski [1993] C. L. Lucchesi and T. Kowaltowski. Applications of finite automata representing large vocabularies. Softw. Pract. Exper., 23(1):15–30, Jan. 1993. ISSN 0038-0644. 10.1002/spe.4380230103. URL https://doi.org/10.1002/spe.4380230103.
- Marr and Hütt [2009] C. Marr and M.-T. Hütt. Outer-totalistic cellular automata on graphs. Physics Letters A, 373(5):546–549, Jan. 2009. 10.1016/j.physleta.2008.12.013. URL https://doi.org/10.1016%2Fj.physleta.2008.12.013.
- Mordvintsev [2022] A. Mordvintsev. Differentiable finite state machines, 2022. URL https://google-research.github.io/self-organising-systems/2022/diff-fsm/.
- Mordvintsev et al. [2020] A. Mordvintsev, E. Randazzo, E. Niklasson, and M. Levin. Growing neural cellular automata. Distill, 2020. URL https://distill.pub/2020/growing-ca/.
- Müller et al. [2023] P. Müller, L. Faber, K. Martinkus, and R. Wattenhofer. GraphChef: Learning the recipe of your dataset. In ICML 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH), 2023. URL https://openreview.net/forum?id=ZgYZH5PFEg.
- Papp and Wattenhofer [2022] P. A. Papp and R. Wattenhofer. A theoretical comparison of graph neural network extensions. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 17323–17345. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/papp22a.html.
- Sato et al. [2019] R. Sato, M. Yamada, and H. Kashima. Learning to sample hard instances for graph algorithms. In W. S. Lee and T. Suzuki, editors, Proceedings of The Eleventh Asian Conference on Machine Learning, volume 101 of Proceedings of Machine Learning Research, pages 503–518. PMLR, 17–19 Nov 2019. URL https://proceedings.mlr.press/v101/sato19a.html.
- Scarselli et al. [2009] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009. 10.1109/TNN.2008.2005605.
- Selsam et al. [2019] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill. Learning a SAT solver from single-bit supervision. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=HJMC_iA5tm.
- Tang et al. [2020] H. Tang, Z. Huang, J. Gu, B.-L. Lu, and H. Su. Towards scale-invariant graph-related problem solving by iterative homogeneous GNNs. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 15811–15822. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/b64a70760bb75e3ecfd1ad86d8f10c88-Paper.pdf.
- Veličković et al. [2022] P. Veličković, A. P. Badia, D. Budden, R. Pascanu, A. Banino, M. Dashevskiy, R. Hadsell, and C. Blundell. The CLRS algorithmic reasoning benchmark. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 22084–22102. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/velickovic22a.html.
- Veličković and Blundell [2021] P. Veličković and C. Blundell. Neural algorithmic reasoning. Patterns, 2(7):100273, 2021. ISSN 2666-3899. 10.1016/j.patter.2021.100273. URL https://www.sciencedirect.com/science/article/pii/S2666389921000994.
- Veličković et al. [2018] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
- von Neumann and Burks [1967] J. von Neumann and A. W. Burks. Theory of self reproducing automata. 1967. URL https://api.semanticscholar.org/CorpusID:62696120.
- Weisfeiler and Leman [1968] B. Weisfeiler and A. Leman. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Series, 2(9):12–16, 1968.
- Wulff and Hertz [1992] N. H. Wulff and J. A. Hertz. Learning cellular automaton dynamics with neural networks. In Proceedings of the 5th International Conference on Neural Information Processing Systems, NIPS'92, pages 631–638, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc. ISBN 1558602747.
- Xu et al. [2019] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
- Ying et al. [2019] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. Gnnexplainer: Generating explanations for graph neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/d80b7040b773199015de6d3b4293c8ff-Paper.pdf.
Appendix A Extended dataset descriptions
A.1 Cellular Automata
As a first step towards learning algorithms, we focus on cellular automata problems. We consider a variety of settings, covering both 1D and 2D automata. These include well-known instances such as Wireworld and the Game of Life, which, despite its simple rules, is known to be Turing complete.
1D-Cellular Automata
The 1D-Cellular Automata dataset consists of one-dimensional cellular automata systems, each defined by one of the 256 possible rules. Each rule corresponds to a distinct mapping from a cell's state and the states of its two neighbors to a new state. Figure 7 provides a graphical illustration of one such automaton. Note that we need to distinguish between the left and right neighbor in order to capture all rules in GraphFSA.
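For concreteness, a single synchronous update of such an elementary automaton can be sketched as follows; the fixed-0 boundary handling is an illustrative assumption:

```python
def eca_step(cells, rule=30):
    """One synchronous update of an elementary (1D) cellular automaton.
    The new state of cell i is the rule bit indexed by the 3-bit pattern
    (left, self, right); boundary cells see a fixed 0 neighbour here."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        pattern = (left << 2) | (cells[i] << 1) | right
        out.append((rule >> pattern) & 1)
    return out

print(eca_step([0, 0, 0, 1, 0, 0, 0]))  # -> [0, 0, 1, 1, 1, 0, 0]
```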
Figure 7: Visualization of 1D CA rule 30. The top row shows the possible neighborhood combinations (left neighbor state, current cell state, right neighbor state); the bottom row shows the resulting new state of the center cell.
Game Of Life
The GameOfLife dataset captures the essence of Conway's Game of Life, which progresses from an initial state according to a set of simple rules.
The progression is dictated by rules applied to each cell in the grid, considering its neighbors. Using the Moore neighborhood, which includes all eight surrounding cells, these rules are as follows:
1. Birth: A cell that is dead becomes alive if it has exactly three living neighbors.
2. Survival: A cell that is alive remains alive if it has two or three living neighbors.
3. Death:
   - Loneliness: A living cell with fewer than two living neighbors dies, mimicking underpopulation.
   - Overpopulation: A living cell with more than three living neighbors dies.
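These rules reduce to a small function of a cell's state and its number of living Moore neighbors; a minimal sketch:

```python
def next_cell_state(alive, live_neighbours):
    """Conway's Game of Life update for one cell (Moore neighbourhood)."""
    if alive:
        return live_neighbours in (2, 3)  # survival; otherwise death
    return live_neighbours == 3           # birth

# A dead cell with exactly three living neighbours is born:
assert next_cell_state(False, 3)
# A living cell with four living neighbours dies of overpopulation:
assert not next_cell_state(True, 4)
```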
Toroidal vs. non-toroidal
For this dataset, we consider both toroidal and non-toroidal variations:
1. Toroidal: The board's edges wrap around, creating a continuous, closed surface. Cells on one edge have neighbors on the opposite edge.
2. Non-Toroidal: In the standard variation, cells on the board's edge only consider neighbors within the boundary.
Our dataset consists of input/output pairs where we randomly initialize the grid and then apply the Game of Life rules for a fixed number of steps. We represent this dataset through grid graphs and use the Moore neighborhood.
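The only difference between the two variants is how neighbor coordinates at the boundary are resolved; a minimal sketch, with wrap-around implemented via modular indexing:

```python
def moore_neighbours(r, c, height, width, toroidal=True):
    """Coordinates of the Moore neighbours of cell (r, c) on a grid."""
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            nr, nc = r + dr, c + dc
            if toroidal:
                out.append((nr % height, nc % width))  # edges wrap around
            elif 0 <= nr < height and 0 <= nc < width:
                out.append((nr, nc))                   # edges are cut off
    return out

print(len(moore_neighbours(0, 0, 5, 5, toroidal=True)))   # 8
print(len(moore_neighbours(0, 0, 5, 5, toroidal=False)))  # 3 (corner cell)
```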
Hexagonal Game Of Life
The Hexagonal Game of Life introduces a variation in which cells are hexagonal instead of squares on the traditional grid. This change in cell structure yields a different set of neighboring cells, which can lead to distinct patterns and evolutions. A visual representation can be found in Figure 4.
WireWorld
The WireWorld dataset revolves around the cellular automaton WireWorld, where cells take one of four states: empty, electron head, electron tail, and conductor. WireWorld is especially renowned for its capability to implement digital circuitry. In the dataset, we observe the evolution of a given cellular configuration over a specified number of iterations.
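The standard WireWorld transition rules can be written per cell as follows; the integer state encoding is our own choice:

```python
EMPTY, HEAD, TAIL, CONDUCTOR = 0, 1, 2, 3

def wireworld_cell(state, head_neighbours):
    """WireWorld transition for a single cell, given the number of
    electron heads among its Moore neighbours."""
    if state == HEAD:
        return TAIL                                # a head becomes a tail
    if state == TAIL:
        return CONDUCTOR                           # a tail reverts to conductor
    if state == CONDUCTOR and head_neighbours in (1, 2):
        return HEAD                                # current flows along the wire
    return state                                   # other conductors and empty cells stay put
```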
A.2 Graph algorithms
Taking inspiration from the datasets utilized by Grötschla et al. [7], we create datasets for various classical graph problems. To ensure versatility and scalability, we generate new graphs for each problem instance and compute the corresponding ground truth during dataset creation. This approach enables us to construct datasets that not only encompass graphs of specific sizes but also facilitate evaluation on larger extrapolation datasets. The following graph problems probe different capabilities of our model.
A.2.1 Distance
The distance problem involves determining whether each node in the graph has an even or odd distance from the root node. To formulate this problem, we define input values for each problem instance, representing each node's state in the graph. Among these nodes, one is designated as the root, while the others are marked as non-root inputs. The output is a binary value (0 or 1) for each node, given by $\text{distance} \bmod 2$, where "distance" is the length of the shortest path between the root and the node.
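A breadth-first search from the root suffices to generate this ground truth; a minimal sketch using networkx:

```python
import networkx as nx

def distance_parity_labels(graph, root):
    """Ground truth for the Distance task: per-node 0/1 label, the
    shortest-path distance from the root taken modulo two."""
    dist = nx.single_source_shortest_path_length(graph, root)
    return {v: d % 2 for v, d in dist.items()}

g = nx.path_graph(5)                 # 0 - 1 - 2 - 3 - 4
print(distance_parity_labels(g, 0))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}
```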
A.2.2 RootValue
In the root value problem, we want to propagate a value from the root throughout a path graph. One node in the graph is assigned a root label and a binary value (0,1). The objective is to propagate this binary value from the root across the entire graph.
A.2.3 PathFinding
The PathFinding problem determines whether a given node lies on the path between two marked nodes within a tree. The dataset comprises different trees, and two nodes are explicitly marked as relevant nodes within each tree. The objective is to predict whether a specific node in the tree, which is not one of the labeled nodes, lies on the path connecting these two marked nodes.
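The labels can be computed with two breadth-first searches, using the fact that in a tree a node $v$ lies on the unique path between marked nodes $a$ and $b$ if and only if $d(a,v) + d(v,b) = d(a,b)$; a minimal sketch:

```python
import networkx as nx

def on_path_labels(tree, a, b):
    """Ground truth for PathFinding: 1 iff a node lies on the a-b path."""
    da = nx.single_source_shortest_path_length(tree, a)
    db = nx.single_source_shortest_path_length(tree, b)
    return {v: int(da[v] + db[v] == da[b]) for v in tree.nodes}

t = nx.path_graph(5)
print(on_path_labels(t, 1, 3))  # {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}
```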
A.2.4 PrefixSum
The PrefixSum dataset involves paths represented as sequences of nodes, each with an initial binary feature (0 or 1). The task is to predict, for each node, the sum modulo two of all initial features to its right in the path.
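A single right-to-left pass produces these labels; whether a node's own feature is included is a convention, assumed exclusive in this sketch:

```python
def prefixsum_labels(bits):
    """For each node of a path, the parity of all features strictly to
    its right (the inclusive/exclusive convention is an assumption)."""
    out, acc = [], 0
    for b in reversed(bits):
        out.append(acc)        # parity of everything to the right so far
        acc = (acc + b) % 2
    return out[::-1]

print(prefixsum_labels([1, 0, 1, 1]))  # [0, 0, 1, 0]
```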
Appendix B Additional model evaluation
B.1 1D cellular automata
We evaluate the different baselines on 1D cellular automata, considering a larger graph and multiple timesteps $t$ during evaluation. We report the results in Table 3 for models trained on $t=1$ and in Table 4 for models trained on $t=2$.
Table 3: 1D Cellular Automata evaluation for the different baselines. We consider a path of size $10$ for the extrapolation data; $t$ indicates the number of times the CA rule has been applied. We report accuracy; all models were trained for $t=1$.
| GNCA | 0.75 $±$ 0.00 | 0.66 $±$ 0.05 | 0.71 $±$ 0.08 | 0.79 $±$ 0.17 | 0.78 $±$ 0.17 | 0.77 $±$ 0.18 | 0.77 $±$ 0.18 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Recurrent GNN | 0.75 $±$ 0.00 | 0.50 $±$ 0.08 | 0.47 $±$ 0.17 | 0.47 $±$ 0.29 | 0.47 $±$ 0.30 | 0.47 $±$ 0.30 | 0.47 $±$ 0.30 |
| Diff-FSA | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
Table 4: 1D Cellular Automata evaluation for the different baselines. We consider a path of size $10$ for the extrapolation data; $t$ indicates the number of times the CA rule has been applied. We report accuracy; all models were trained for $t=2$.
| GNCA | 0.49 $±$ 0.11 | 0.73 $±$ 0.03 | 0.70 $±$ 0.13 | 0.86 $±$ 0.16 | 0.86 $±$ 0.16 | 0.86 $±$ 0.16 | 0.86 $±$ 0.16 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Recurrent GNN | 0.64 $±$ 0.15 | 0.79 $±$ 0.00 | 0.58 $±$ 0.13 | 0.66 $±$ 0.26 | 0.67 $±$ 0.29 | 0.67 $±$ 0.30 | 0.68 $±$ 0.30 |
| Diff-FSA | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
B.2 2D cellular automata
<details>
<summary>x9.png Details</summary>

Line chart of accuracy (y-axis, 0.5 to 1.0) against time $t$ (x-axis, 0 to 100) for Recurrent GNN (blue), Diff-FSA (orange), and GNCA (green). Diff-FSA stays at 1.0 for all $t$. The recurrent GNN dips slightly at first, then climbs steadily from about 0.75 to roughly 0.93 by $t=100$. GNCA starts near 0.55 and then oscillates in a repeating pattern between roughly 0.85 and 1.0; shaded regions indicate variance across seeds.
</details>
Figure 8: Mean accuracies for Game of Life on a regular grid when trained on 2 steps and applied for $t$ steps during inference (for 10 seeds each). All models report high accuracy for the training setup. However, performance deteriorates during inference when more steps are executed, except for Diff-FSA.
<details>
<summary>x10.png Details</summary>

Line chart of accuracy (y-axis, 0.5 to 1.0) against time $t$ (x-axis, 0 to 100) for Recurrent GNN (blue), Diff-FSA (orange), and GNCA (green). Diff-FSA remains at 1.0 for all $t$. The recurrent GNN rises from about 0.73 to roughly 0.93 but never reaches Diff-FSA. GNCA starts near 0.5, jumps to 1.0, and then oscillates between about 0.8 and 1.0 with the widest variance band; shaded regions indicate variance across seeds.
</details>
Figure 9: Mean accuracies for Game of Life on a hexagonal grid when trained on 2 steps and applied for $t$ steps during inference (for 10 seeds each). All models report high accuracy for the training setup. However, performance deteriorates during inference when more steps are executed, except for Diff-FSA.
Game of Life results.
Specifically, we use a state space of two for the Game of Life and a bounding parameter of five. We let all models execute 2 steps in order to learn the appropriate transition rules after two iterations. Further, we investigate the generalization capabilities, and especially the iteration stability, of the learned models. For this, we consider all models that achieve perfect training accuracy and let them run on larger grids for more timesteps $t$ than during training. The results are depicted in Figure 8 for the regular Game of Life and in Figure 9 for the hexagonal variant. Note that the recurrent GNN deteriorates outside the training distribution. Similarly, the GNCA baseline struggles to generalize, as it has not learned the correct intermediate transitions for odd numbers of steps. Diff-FSA, on the other hand, exhibits good iteration stability across the whole range of iterations.
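The evaluation protocol behind these figures can be summarized by the following sketch, where `model_step` is the learned one-step update and `target_step` applies the true rule; both names are placeholders:

```python
def iteration_stability(model_step, target_step, init_states, max_t=100):
    """Apply a learned one-step update and the true rule side by side
    for t = 1..max_t steps, recording per-node accuracy after each step."""
    pred, truth, curve = dict(init_states), dict(init_states), []
    for t in range(1, max_t + 1):
        pred = model_step(pred)      # learned update (placeholder)
        truth = target_step(truth)   # ground-truth update (placeholder)
        acc = sum(pred[v] == truth[v] for v in truth) / len(truth)
        curve.append((t, acc))
    return curve
```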
B.3 Dataset generator
We perform further evaluations on more datasets generated by our dataset generator GRAB. The ground-truth automaton consists of 4 states, and we test Diff-FSA models with 4, 5, and 6 states as well as the baselines over 10 runs. We report the achieved accuracies for the different experiments in Table 5.
Table 5: Evaluation of GraphFSA on synthetic data provided by GRAB. We report the accuracy and standard deviation over 10 runs. The underlying ground truth consists of an FSA using 4 states. We can test the in-distribution validation accuracy to see how well a model can fit the data. Moreover, we test extrapolation to larger graphs to verify that the rules of the underlying automata were successfully learned. Our Diff-FSA models generally perform well across all scenarios. Note that the recurrent GNN performs well but does not allow the learned mechanics to be interpreted and visualized as a discrete state automaton.
| Model | Val Acc | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- | --- |
| Experiment 1 | | | | | |
| GNCA | 0.38 $±$ 0.00 | 0.39 $±$ 0.00 | 0.40 $±$ 0.00 | 0.40 $±$ 0.00 | 0.39 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 0.91 $±$ 0.10 | 0.85 $±$ 0.13 | 0.82 $±$ 0.16 | 0.81 $±$ 0.16 |
| Diff-FSA (4 states) | 0.99 $±$ 0.02 | 0.97 $±$ 0.02 | 0.96 $±$ 0.02 | 0.95 $±$ 0.03 | 0.94 $±$ 0.03 |
| Diff-FSA (5 states) | 1.00 $±$ 0.01 | 0.98 $±$ 0.01 | 0.96 $±$ 0.02 | 0.95 $±$ 0.02 | 0.95 $±$ 0.02 |
| Diff-FSA (6 states) | 0.99 $±$ 0.01 | 0.98 $±$ 0.01 | 0.95 $±$ 0.02 | 0.94 $±$ 0.02 | 0.94 $±$ 0.02 |
| Experiment 2 | | | | | |
| GNCA | 0.66 $±$ 0.00 | 0.67 $±$ 0.00 | 0.66 $±$ 0.00 | 0.65 $±$ 0.00 | 0.65 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 0.84 $±$ 0.02 | 0.82 $±$ 0.02 | 0.81 $±$ 0.02 | 0.80 $±$ 0.02 |
| Diff-FSA (4 states) | 1.00 $±$ 0.00 | 0.94 $±$ 0.01 | 0.93 $±$ 0.02 | 0.92 $±$ 0.02 | 0.91 $±$ 0.02 |
| Diff-FSA (5 states) | 1.00 $±$ 0.00 | 0.93 $±$ 0.02 | 0.92 $±$ 0.02 | 0.91 $±$ 0.02 | 0.90 $±$ 0.02 |
| Diff-FSA (6 states) | 1.00 $±$ 0.00 | 0.94 $±$ 0.02 | 0.93 $±$ 0.02 | 0.92 $±$ 0.02 | 0.91 $±$ 0.02 |
| Experiment 3 | | | | | |
| GNCA | 0.92 $±$ 0.00 | 0.95 $±$ 0.00 | 0.93 $±$ 0.00 | 0.93 $±$ 0.00 | 0.93 $±$ 0.00 |
| Recurrent GNN | 0.98 $±$ 0.03 | 0.98 $±$ 0.02 | 0.96 $±$ 0.03 | 0.95 $±$ 0.03 | 0.95 $±$ 0.03 |
| Diff-FSA (4 states) | 0.92 $±$ 0.00 | 0.95 $±$ 0.00 | 0.93 $±$ 0.00 | 0.93 $±$ 0.00 | 0.93 $±$ 0.00 |
| Diff-FSA (5 states) | 0.91 $±$ 0.02 | 0.94 $±$ 0.03 | 0.92 $±$ 0.01 | 0.93 $±$ 0.01 | 0.93 $±$ 0.01 |
| Diff-FSA (6 states) | 0.91 $±$ 0.02 | 0.94 $±$ 0.03 | 0.92 $±$ 0.01 | 0.93 $±$ 0.01 | 0.93 $±$ 0.01 |
| Experiment 4 | | | | | |
| GNCA | 0.56 $±$ 0.00 | 0.64 $±$ 0.00 | 0.63 $±$ 0.00 | 0.64 $±$ 0.00 | 0.63 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 0.98 $±$ 0.05 | 0.97 $±$ 0.05 | 0.97 $±$ 0.05 | 0.97 $±$ 0.05 |
| Diff-FSA (4 states) | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
| Diff-FSA (5 states) | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
| Diff-FSA (6 states) | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 |
| Experiment 5 | | | | | |
| GNCA | 0.54 $±$ 0.00 | 0.54 $±$ 0.00 | 0.55 $±$ 0.00 | 0.54 $±$ 0.00 | 0.55 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 0.97 $±$ 0.03 | 0.94 $±$ 0.03 | 0.93 $±$ 0.03 | 0.93 $±$ 0.04 |
| Diff-FSA (4 states) | 0.98 $±$ 0.00 | 0.96 $±$ 0.00 | 0.95 $±$ 0.00 | 0.93 $±$ 0.00 | 0.93 $±$ 0.00 |
| Diff-FSA (5 states) | 0.98 $±$ 0.00 | 0.97 $±$ 0.00 | 0.95 $±$ 0.00 | 0.94 $±$ 0.00 | 0.94 $±$ 0.00 |
| Diff-FSA (6 states) | 0.98 $±$ 0.00 | 0.96 $±$ 0.01 | 0.95 $±$ 0.01 | 0.94 $±$ 0.01 | 0.94 $±$ 0.00 |
B.4 Algorithms
We evaluate the GraphFSA framework, in particular our Diff-FSA, together with the baselines on the more challenging algorithmic datasets Distance, PrefixSum, PathFinding, and RootValue. We report the results in Tables 6, 7, and 8.
Table 6: Evaluation of learning graph algorithms on the RootValue dataset. We report the accuracy and standard deviation over 10 runs. All models are trained on graphs of size at most 10 and then tested for extrapolation on larger graph sizes $n$ .
| Model | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- |
| GNCA | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 |
| Recurrent GNN | 1.00 $±$ 0.00 | 1.00 $±$ 0.00 | 0.92 $±$ 0.16 | 0.91 $±$ 0.16 |
| Diff-FSA | 1.00 $±$ 0.02 | 0.99 $±$ 0.02 | 0.99 $±$ 0.03 | 0.99 $±$ 0.04 |
Table 7: Evaluation of learning graph algorithms on the PathFinding dataset. We report the accuracy and standard deviation over 10 runs. All models are trained on graphs of size at most 10 and then tested for extrapolation on larger graph sizes $n$ .
| Model | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- |
| GNCA | 0.58 $±$ 0.00 | 0.73 $±$ 0.00 | 0.83 $±$ 0.00 | 0.89 $±$ 0.00 |
| Recurrent GNN | 0.97 $±$ 0.02 | 0.90 $±$ 0.03 | 0.88 $±$ 0.04 | 0.89 $±$ 0.09 |
| Diff-FSA | 0.85 $±$ 0.00 | 0.85 $±$ 0.00 | 0.87 $±$ 0.00 | 0.91 $±$ 0.00 |
Table 8: Evaluation of learning graph algorithms on the PrefixSum dataset. We report the accuracy and standard deviation over 10 runs. All models are trained on graphs of size at most 10 and then tested for extrapolation on larger graph sizes $n$ .
| Model | n=10 | n=20 | n=50 | n=100 |
| --- | --- | --- | --- | --- |
| GNCA | 0.49 $±$ 0.00 | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 | 0.50 $±$ 0.00 |
| Recurrent GNN | 0.97 $±$ 0.04 | 0.95 $±$ 0.06 | 0.90 $±$ 0.13 | 0.84 $±$ 0.14 |
| Diff-FSA | 0.74 $±$ 0.14 | 0.67 $±$ 0.17 | 0.63 $±$ 0.20 | 0.61 $±$ 0.20 |