# PUZZLES: A Benchmark for Neural Algorithmic Reasoning
**Authors**: ETH Zürich
Abstract
Algorithmic reasoning is a fundamental cognitive ability that plays a pivotal role in problem-solving and decision-making processes. Reinforcement Learning (RL) has demonstrated remarkable proficiency in tasks such as motor control, handling perceptual input, and managing stochastic environments. These advancements have been enabled in part by the availability of benchmarks. In this work we introduce PUZZLES, a benchmark based on Simon Tatham’s Portable Puzzle Collection, aimed at fostering progress in algorithmic and logical reasoning in RL. PUZZLES contains 40 diverse logic puzzles of adjustable sizes and varying levels of complexity; many puzzles also feature a diverse set of additional configuration parameters. The 40 puzzles provide detailed information on the strengths and generalization capabilities of RL agents. Furthermore, we evaluate various RL algorithms on PUZZLES, providing baseline comparisons and demonstrating the potential for future research. All the software, including the environment, is available at https://github.com/ETH-DISCO/rlp.
Human intelligence relies heavily on logical and algorithmic reasoning as integral components for solving complex tasks. While Machine Learning (ML) has achieved remarkable success in addressing many real-world challenges, logical and algorithmic reasoning remains an open research question [1, 2, 3, 4, 5, 6, 7]. This research question is supported by the availability of benchmarks, which allow for a standardized and broad evaluation framework to measure and encourage progress [8, 9, 10].
Reinforcement Learning (RL) has made remarkable progress in various domains, showcasing its capabilities in tasks such as game playing [11, 12, 13, 14, 15], robotics [16, 17, 18, 19] and control systems [20, 21, 22]. Various benchmarks have been proposed to enable progress in these areas [23, 24, 25, 26, 27, 28, 29]. More recently, advances have also been made in the direction of logical and algorithmic reasoning within RL [30, 31, 32]. Popular examples also include the games of Chess, Shogi, and Go [33, 34]. Given the importance of logical and algorithmic reasoning, we propose a benchmark to guide future developments in RL and, more broadly, machine learning.
Logic puzzles have long been a playful challenge for humans, and they are an ideal testing ground for evaluating the algorithmic and logical reasoning capabilities of RL agents. A diverse range of puzzles, similar to the Atari benchmark [24], favors methods that are broadly applicable. Unlike tasks with a fixed input size, logic puzzles can be solved iteratively once an algorithmic solution is found. This allows us to measure how well a solution attempt can adapt and generalize to larger inputs. Furthermore, in contrast to games such as Chess and Go, logic puzzles have a known solution, making reward design easier and enabling tracking progress and guidance with intermediate rewards.
<details>
<summary>x1.png Details</summary>

### Visual Description
## Grid of Puzzle Game Visualizations
### Overview
The image presents a grid of 24 distinct puzzle game visualizations. Each visualization appears to represent the initial state or a key element of a specific puzzle. The puzzles are arranged in a 6x4 grid. The image does not contain numerical data in the traditional sense, but rather visual representations of puzzle states.
### Components/Axes
The image is organized as a grid. There are no explicit axes or legends in the traditional chart sense. Each cell in the grid contains a unique puzzle visualization. The puzzles are labeled with names below each image: Black Box, Bridges, Cube, Dominoes, Fifteen, Filling, Flip, Flood, Galaxies, Guess, Inertia, Keen, Lightup, Loopy, Magnets, Map, Mosaic, Net, Netslide, Palisade, Pattern, Pegs, Range, Rectangles, Same Game, Signpost, Singles, Sixteen, Slant, Solo, Tents, Towers, Tracks, Twiddle, Undead, Unequal, Untangle.
### Detailed Analysis or Content Details
Here's a description of each puzzle visualization, moving left to right, top to bottom. Note that these are descriptions of the visual elements, not solutions to the puzzles.
1. **Black Box:** A 3x3 grid with numbered cells (1-9) and lines connecting some of them.
2. **Bridges:** A grid of nodes with lines connecting them, representing bridges.
3. **Cube:** A 3D cube with colored faces.
4. **Dominoes:** A grid with numbered cells and dominoes placed on some of them. Numbers range from 1-6.
5. **Fifteen:** A 4x4 grid with numbered tiles (1-15) and one empty space.
6. **Filling:** A grid with cells of varying shades of gray.
7. **Flip:** A grid with cells colored black and white.
8. **Flood:** A grid with cells colored in various shades, suggesting a flooding pattern.
9. **Galaxies:** A grid of circles, some filled, some empty, representing galaxies.
10. **Guess:** A grid of colored circles. Colors include red, green, blue, yellow, and purple.
11. **Inertia:** A grid with colored shapes and lines.
12. **Keen:** A grid with numbered cells (0-15) and colored regions.
13. **Lightup:** A grid with numbered cells and bulbs.
14. **Loopy:** A grid with numbered cells and lines forming loops.
15. **Magnets:** A grid with numbered cells and plus/minus signs. Numbers range from -2 to 2.
16. **Map:** A grid with cells colored in shades of brown and green, resembling a map.
17. **Mosaic:** A grid with cells colored in various shades of blue and gray.
18. **Net:** A grid with cells colored black and white, forming a net-like pattern.
19. **Netslide:** A grid with colored cells and arrows.
20. **Palisade:** A grid with numbered cells (1-3) and lines forming palisades.
21. **Pattern:** A grid with numbered cells (0-4) and colored regions.
22. **Pegs:** A grid with pegs placed in some cells.
23. **Range:** A grid with numbered cells (1-13) and lines.
24. **Rectangles:** A grid with cells colored black and white, forming rectangles.
25. **Same Game:** A grid with colored squares. Colors include red, green, blue, and yellow.
26. **Signpost:** A grid with numbered cells and arrows. Equations are present: e=1, a+1, a+2, d+4, k=16.
27. **Singles:** A grid with numbered cells (1-6) and colored regions.
28. **Sixteen:** A 4x4 grid with numbered tiles (1-16).
29. **Slant:** A grid with numbered cells (1-2) and lines.
30. **Solo:** A grid with numbered cells (0-4) and colored regions.
31. **Tents:** A grid with numbered cells and tent-like structures.
32. **Towers:** A grid with numbered cells and tower-like structures.
33. **Tracks:** A grid with numbered cells (1-8) and lines forming tracks.
34. **Twiddle:** A grid with numbered cells (1-7) and lines.
35. **Undead:** A grid with numbered cells (1-9) and colored regions.
36. **Unequal:** A grid with numbered cells and colored regions.
37. **Untangle:** A grid with lines and nodes.
### Key Observations
The image showcases a diverse collection of puzzle types. The visual complexity varies significantly between puzzles. Some puzzles rely on numerical clues (e.g., Fifteen, Dominoes), while others are more visually oriented (e.g., Flood, Galaxies). The color palettes used are also diverse, with each puzzle employing a unique set of colors.
### Interpretation
The image serves as a catalog or overview of various logic and spatial reasoning puzzles. It doesn't present data in the traditional sense, but rather visual representations of puzzle states. The arrangement in a grid suggests a classification or categorization of puzzle types. The image is likely intended for someone interested in exploring different puzzle games or for a resource listing available puzzles. The variety of puzzles suggests a broad range of cognitive skills are engaged, from numerical reasoning to pattern recognition and spatial awareness. The inclusion of puzzle names provides a clear identification of each game. The image is a visual index, not a data analysis.
</details>
Figure 1: All puzzle classes of Simon Tatham’s Portable Puzzle Collection.
In this paper, we introduce PUZZLES, a comprehensive RL benchmark specifically designed to evaluate the logical and algorithmic reasoning and problem-solving abilities of RL agents. Simon Tatham’s Puzzle Collection [35], curated by the renowned computer programmer and puzzle enthusiast Simon Tatham, serves as the foundation of PUZZLES. This collection includes a set of 40 logic puzzles, shown in Figure 1, each of which presents distinct challenges with various dimensions of adjustable complexity. They range from well-known puzzles, such as Solo or Mines (commonly known as Sudoku and Minesweeper, respectively), to lesser-known puzzles such as Cube or Slant. PUZZLES includes all 40 puzzles in a standardized environment, each playable with a visual or discrete input and a discrete action space.
Contributions.
We propose PUZZLES, an RL environment based on Simon Tatham’s Puzzle Collection, comprising a collection of 40 diverse logic puzzles. To ensure compatibility, we have extended the original C source code to adhere to the standards of the Pygame library. Subsequently, we have integrated PUZZLES into the Gymnasium framework API, providing a straightforward, standardized, and widely-used interface for RL applications. PUZZLES allows the user to arbitrarily scale the size and difficulty of logic puzzles, providing detailed information on the strengths and generalization capabilities of RL agents. Furthermore, we have evaluated various RL algorithms on PUZZLES, providing baseline comparisons and demonstrating the potential for future research.
1 Related Work
RL benchmarks.
Various benchmarks have been proposed in RL. Bellemare et al. [24] introduced the influential Atari-2600 benchmark, on which Mnih et al. [11] trained RL agents to play the games directly from pixel inputs. This benchmark demonstrated the potential of RL in complex, high-dimensional environments. PUZZLES allows the use of a similar approach where only pixel inputs are provided to the agent. Todorov et al. [23] presented MuJoCo, which provides a diverse set of continuous control tasks based on a physics engine for robotic systems. Another control benchmark is the DeepMind Control Suite by Duan et al. [26], featuring continuous action spaces and complex control problems. The work by Côté et al. [28] emphasized the importance of natural language understanding in RL and proposed a benchmark for evaluating RL methods in text-based domains. Lanctot et al. [29] introduced OpenSpiel, encompassing a wide range of games, enabling researchers to evaluate and compare RL algorithms’ performance in game-playing scenarios. These benchmarks and frameworks have contributed significantly to the development and evaluation of RL algorithms. OpenAI Gym by Brockman et al. [25], and its successor Gymnasium by the Farama Foundation [36], helped by providing a standardized interface for many benchmarks. As such, Gym and Gymnasium have played an important role in facilitating reproducibility and benchmarking in reinforcement learning research. Therefore, we provide PUZZLES as a Gymnasium environment to enable ease of use.
Logical and algorithmic reasoning within RL.
Notable research in RL on logical reasoning includes automated theorem proving using deep RL [16] or RL-based logic synthesis [37]. Dasgupta et al. [38] find that RL agents can perform a certain degree of causal reasoning in a meta-reinforcement learning setting. The work by Jiang and Luo [30] introduces Neural Logic RL, which improves interpretability and generalization of learned policies. Eppe et al. [39] provide steps to advance problem-solving as part of hierarchical RL. Fawzi et al. [31] and Mankowitz et al. [32] demonstrate that RL can be used to discover novel and more efficient algorithms for well-known problems such as matrix multiplication and sorting. Neural algorithmic reasoning has also been used as a method to improve low-data performance in classical RL control environments [40, 41]. Logical reasoning might be required to compete in certain types of games such as chess, shogi and Go [33, 34, 42, 13], Poker [43, 44, 45, 46] or board games [47, 48, 49, 50]. However, these are usually multi-agent games, with some also featuring imperfect information and stochasticity.
Reasoning benchmarks.
Various benchmarks have been introduced to assess different types of reasoning capabilities, although only in the realm of classical ML. IsarStep, proposed by Li et al. [8], is specifically designed to evaluate the high-level mathematical reasoning necessary for proof-writing tasks. Another significant benchmark in the field of reasoning is the CLRS Algorithmic Reasoning Benchmark, introduced by Veličković et al. [9]. This benchmark emphasizes the importance of algorithmic reasoning in machine learning research. It consists of 30 different types of algorithms sourced from the renowned textbook “Introduction to Algorithms” by Cormen et al. [51]. The CLRS benchmark serves as a means to evaluate models’ understanding and proficiency in learning various algorithms. In the domain of large language models (LLMs), BIG-bench has been introduced by Srivastava et al. [10]. BIG-bench incorporates tasks that assess the reasoning capabilities of LLMs, including logical reasoning.
Despite these valuable contributions, a suitable and unified benchmark for evaluating logical and algorithmic reasoning abilities in single-agent perfect-information RL has yet to be established. Recognizing this gap, we propose PUZZLES as a relevant and necessary benchmark with the potential to drive advancements and provide a standardized evaluation platform for RL methods that enable agents to acquire algorithmic and logical reasoning abilities.
2 The PUZZLES Environment
In the following section we give an overview of the PUZZLES environment. The puzzles are available to play online at https://www.chiark.greenend.org.uk/~sgtatham/puzzles/; excellent standalone apps for Android and iOS exist as well. The environment is written in both Python and C. For a detailed explanation of all features of the environment as well as their implementation, please see Appendices B and C.
[Figure 2 diagram labels: Gymnasium RL Code; puzzle_env.py; puzzle.py; pygame.c; Puzzle C Sources; Pygame Library; puzzle Module; rlp Package; language legend: Python, C]
Figure 2: Code and library landscape around the PUZZLES Environment, made up of the rlp Package and the puzzle Module. The figure shows how the puzzle Module presented in this paper fits within Tatham’s Puzzle Collection code, the Pygame package, and a user’s Gymnasium reinforcement learning code. The different parts are also categorized as Python language and C language.
2.1 Environment Overview
Within the PUZZLES environment, we encapsulate the tasks presented by each logic puzzle by defining consistent state, action, and observation spaces. It is also important to note that the large majority of the logic puzzles are designed so that they can be solved without requiring any guesswork. By default, we provide two observation spaces: one is a representation of the discrete internal game state of the puzzle, and the other is a visual representation of the game interface. These observation spaces can easily be wrapped in order to enable PUZZLES to be used with more advanced neural architectures such as graph neural networks (GNNs) or Transformers. All puzzles provide a discrete action space which only differs in cardinality. To accommodate the inherent difficulty and the need for proper algorithmic reasoning in solving these puzzles, the environment allows users to implement their own reward structures, facilitating the training of successful RL agents. All puzzles are played in a two-dimensional play area with deterministic state transitions, where a transition only occurs after a valid user input. Most of the puzzles in PUZZLES do not have an upper bound on the number of steps; an episode ends only when the puzzle is solved. An agent with a bad policy is therefore likely never to reach a terminal state. For this reason, we provide the option for early episode termination based on state repetitions. As we show in Section 3.4, this is an effective method to facilitate learning.
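The idea of ending an episode early when states repeat can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyLineEnv` is a hypothetical stand-in for a puzzle, `RepetitionLimit` is not part of the rlp package, and the repetition criterion (more than `max_repeats` visits to the same observation) is one plausible choice rather than the rule PUZZLES implements. The classes only mimic the five-value `step` return used by Gymnasium and do not import it.

```python
from collections import Counter


class ToyLineEnv:
    """Hypothetical puzzle stand-in: move a cursor along a line to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        # action 0 = left, 1 = right; moving off the board leaves the state unchanged
        delta = 1 if action == 1 else -1
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        terminated = self.pos == self.length - 1
        reward = 1.0 if terminated else 0.0  # sparse reward on completion only
        return self.pos, reward, terminated, False, {}


class RepetitionLimit:
    """Truncate once any observation was visited more than `max_repeats` times.

    Observations must be hashable (e.g. ints or tuples) to be counted.
    """

    def __init__(self, env, max_repeats=3):
        self.env = env
        self.max_repeats = max_repeats
        self.visits = Counter()

    def reset(self):
        obs, info = self.env.reset()
        self.visits = Counter({obs: 1})
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.visits[obs] += 1
        if not terminated and self.visits[obs] > self.max_repeats:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env, policy, max_steps=10_000):
    """Return (number of steps taken, whether the puzzle was solved)."""
    obs, _ = env.reset()
    for t in range(1, max_steps + 1):
        obs, _, terminated, truncated, _ = env.step(policy(obs))
        if terminated or truncated:
            return t, terminated
    return max_steps, False
```

With this wrapper, a policy that keeps pushing against the left wall is cut off after a handful of steps instead of running until the global step limit, while a policy that walks straight to the goal is unaffected.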
2.2 Difficulty Progression and Generalization
The PUZZLES environment places a strong emphasis on giving users control over the difficulty exhibited by the environment. For each puzzle, the problem size and difficulty can be adjusted individually. The difficulty affects the complexity of strategies that an agent needs to learn to solve a puzzle. As an example, Sudoku has tangible difficulty options: harder difficulties may require the use of new strategies such as forcing chains to find a solution, whereas easy difficulties only need the single position strategy. Forcing chains work by following linked cells to evaluate possible candidates, usually starting with a two-candidate cell. The single position strategy involves identifying cells which have only a single possible value.
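To make the single position strategy concrete, the following sketch applies it to a small 4x4 Sudoku: it repeatedly fills every empty cell that has exactly one candidate value left. The function names and the example grid are illustrative assumptions and are unrelated to the solver used in Tatham's collection.

```python
def candidates(grid, r, c, box=2):
    """Values not yet used in the row, column, or box of cell (r, c)."""
    n = box * box
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = box * (r // box), box * (c // box)
    used |= {grid[i][j] for i in range(br, br + box) for j in range(bc, bc + box)}
    return set(range(1, n + 1)) - used


def fill_single_candidates(grid, box=2):
    """Fill cells with exactly one candidate until no more progress is made.

    Empty cells are encoded as 0. Returns a new grid; harder puzzles may
    remain partially unfilled because this strategy alone is not sufficient.
    """
    grid = [row[:] for row in grid]
    n = box * box
    progress = True
    while progress:
        progress = False
        for r in range(n):
            for c in range(n):
                if grid[r][c] == 0:
                    cand = candidates(grid, r, c, box)
                    if len(cand) == 1:
                        grid[r][c] = cand.pop()
                        progress = True
    return grid


puzzle = [
    [1, 0, 3, 4],
    [3, 4, 0, 2],
    [0, 1, 4, 0],
    [4, 0, 2, 1],
]
solved = fill_single_candidates(puzzle)
```

On an easy instance such as this one, the loop fills every cell. On a harder instance it would stall with zeros remaining, which is where strategies such as forcing chains become necessary.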
The scalability of the puzzles in our environment offers a unique opportunity to design increasingly complex puzzle configurations, presenting a challenging landscape for RL agents to navigate. This dynamic nature of the benchmark serves two important purposes. Firstly, the scalability of the puzzles facilitates the evaluation of an agent’s generalization capabilities. In the PUZZLES environment, it is possible to train an agent in an easy puzzle setting and subsequently evaluate its performance in progressively harder puzzle configurations. For most puzzles, the cardinality of the action space is independent of puzzle size. It is therefore also possible to train an agent only on small instances of a puzzle and then evaluate it on larger sizes. This approach allows us to assess whether an agent has learned the correct underlying algorithm and generalizes to out-of-distribution scenarios. Secondly, it enables the benchmark to remain adaptable to the continuous advancements in RL methodologies. As RL algorithms evolve and become more capable, the puzzle configurations can be adjusted accordingly to maintain the desired level of difficulty. This ensures that the benchmark continues to effectively assess the capabilities of the latest RL methods.
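The size-generalization protocol described above can be sketched on a toy task. The one-dimensional "walk to the last cell" task, the two policies, and the function names are hypothetical and serve only to show the evaluation pattern: the action space (left or right) does not depend on the instance size, so a policy can be run unchanged on larger instances.

```python
def solve_steps(policy, length, max_steps=100):
    """Steps the policy needs to reach the last cell, or None if it fails."""
    pos = 0
    for t in range(1, max_steps + 1):
        delta = 1 if policy(pos) == 1 else -1  # 0 = left, 1 = right
        pos = min(max(pos + delta, 0), length - 1)
        if pos == length - 1:
            return t
    return None


def evaluate_across_sizes(policy, sizes, max_steps=100):
    """Run the same policy on instances of increasing size."""
    return {n: solve_steps(policy, n, max_steps) for n in sizes}


def algorithmic_policy(pos):
    """Implements the underlying rule: always move right."""
    return 1


def memorizing_policy(pos):
    """Only correct for positions seen on length-3 instances."""
    return 1 if pos < 2 else 0


general = evaluate_across_sizes(algorithmic_policy, [3, 5, 9])
memorized = evaluate_across_sizes(memorizing_policy, [3, 5, 9])
```

Both policies solve the size they were built for, but only the one that encodes the underlying rule keeps solving larger instances, which is the distinction this kind of out-of-distribution evaluation is meant to expose.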
3 Empirical Evaluation
We evaluate the baseline performance of numerous commonly used RL algorithms on our PUZZLES environment. Additionally, we analyze the impact of certain design decisions of the environment and the training setup. Our metric of interest is the average number of steps required by a policy to successfully complete a puzzle, where lower is better. We use the term successful episode to denote the successful completion of a single puzzle instance. We also report the success rate, i.e., the percentage of puzzle instances that were completed successfully.
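As a minimal sketch of how these two metrics relate, the helper below computes them from a list of evaluation episodes. The `(steps, solved)` record format and the function name are assumptions for illustration, not the evaluation code used for the paper.

```python
from statistics import mean


def summarize(episodes):
    """Return (average length of successful episodes, success rate).

    `episodes` is a list of (steps, solved) pairs. Unsuccessful episodes
    count towards the success rate but not towards the average length.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    solved_lengths = [steps for steps, solved in episodes if solved]
    success_rate = len(solved_lengths) / len(episodes)
    avg_length = mean(solved_lengths) if solved_lengths else None
    return avg_length, success_rate


avg_length, success_rate = summarize(
    [(12, True), (8, True), (10, True), (10_000, False)]
)
```

Because truncated episodes are excluded from the average, the two numbers should be read together: a low average length with a low success rate describes a policy that solves only the easy instances quickly.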
To provide an understanding of the puzzle’s complexity and to contextualize the agents’ performance, we include an upper-bound estimate of the optimal number of steps required to solve the puzzle correctly. This estimate is a combination of both the steps required to solve the puzzle using an optimal strategy, and an upper bound on the environment steps required to achieve this solution, such as moving the cursor to the correct position. The upper bound is denoted as Optimal. Please refer to the appendix table of puzzle parameters (tab:parameters) for details on how this upper bound is calculated for each puzzle.
We run experiments with all the RL algorithms listed in Appendix Table 8, which also provides a summary of each algorithm. We include both popular traditional algorithms such as PPO, as well as algorithms designed more specifically for the kinds of tasks presented in PUZZLES. Where possible, we used the implementations available in the RL library Stable Baselines 3 [52], using the default hyperparameters. For MuZero and DreamerV3, we used the code available at [53] and [54], respectively. In total, our experiments required approximately 10,000 GPU hours.
All selected algorithms are compatible with the discrete action space required by our environment. This requirement rules out certain other common RL algorithms that, in their standard form, assume continuous actions, such as Soft Actor-Critic (SAC) [55] or Twin Delayed Deep Deterministic Policy Gradients (TD3) [56].
3.1 Baseline Experiments
For the general baseline experiments, we trained all agents on all puzzles and evaluated their performance. Due to the challenging nature of our puzzles, we selected an easy difficulty and a small puzzle size where possible. Every agent was trained on the discrete internal state observation using five different random seeds. We trained all agents by providing rewards only at the end of each episode upon successful completion or failure. For computational reasons, we truncated all episodes during training and testing at 10,000 steps. For such a truncation, the reward was kept at 0. We evaluate the effect of this episode truncation in Section 3.4. We provide all experimental parameters, including the exact parameters supplied for each puzzle, in Section E.3.
<details>
<summary>x2.png Details</summary>

### Visual Description
## Bar Chart: Average Episode Length by Algorithm
### Overview
The image presents a bar chart comparing the average episode length achieved by different reinforcement learning algorithms. Each bar represents an algorithm, and the height of the bar indicates the average episode length. Error bars are included on top of each bar, representing the variability or standard deviation of the results.
### Components/Axes
* **X-axis:** Algorithm Name (A2C, DQN, DreamerV3, MuZero, PPO, QRDDQN, RecurrentPPO, TRPO, Optimal)
* **Y-axis:** Average Episode Length (Scale from 0 to 4000, increments of 1000)
* **Bars:** Represent the average episode length for each algorithm. The bars are light blue.
* **Error Bars:** Black vertical lines extending above and below each bar, indicating the variability of the results.
### Detailed Analysis
The chart displays the following approximate values (read from the bar heights and error bar endpoints):
* **A2C:** Average Episode Length ≈ 2600. Error bar extends from approximately 1700 to 3500.
* **DQN:** Average Episode Length ≈ 1800. Error bar extends from approximately 1000 to 2600.
* **DreamerV3:** Average Episode Length ≈ 1500. Error bar extends from approximately 800 to 2200.
* **MuZero:** Average Episode Length ≈ 1300. Error bar extends from approximately 600 to 2000.
* **PPO:** Average Episode Length ≈ 2800. Error bar extends from approximately 1800 to 3800.
* **QRDDQN:** Average Episode Length ≈ 2500. Error bar extends from approximately 1500 to 3500.
* **RecurrentPPO:** Average Episode Length ≈ 2300. Error bar extends from approximately 1300 to 3300.
* **TRPO:** Average Episode Length ≈ 1900. Error bar extends from approximately 1000 to 2800.
* **Optimal:** Average Episode Length ≈ 300. Error bar extends from approximately 0 to 600.
The bars for A2C, PPO, QRDDQN, and RecurrentPPO are relatively tall, indicating higher average episode lengths. DreamerV3 and MuZero have lower average episode lengths. The "Optimal" algorithm has a significantly lower average episode length than all other algorithms. The error bars show considerable variability in the results for all algorithms.
### Key Observations
* The "Optimal" algorithm achieves a significantly shorter average episode length compared to all other algorithms.
* A2C and PPO exhibit the highest average episode lengths.
* There is substantial variability in the results for each algorithm, as indicated by the large error bars.
* The algorithms cluster into two groups: those with average episode lengths around 1300-2000 and those with average episode lengths around 2300-2800.
### Interpretation
The chart suggests that the evaluated reinforcement learning algorithms differ in how many steps they need to complete an episode. Because the metric counts the steps a policy needs to solve a puzzle, a shorter bar indicates a stronger policy. The Optimal bar is far shorter than that of any learned policy, indicating a considerable gap between all evaluated algorithms and the optimal upper bound. The large error bars suggest that the performance of each algorithm varies strongly across puzzles, instances, and random seeds. The chart provides a comparative overview of the algorithms' performance, but it does not reveal the specific mechanisms driving these results.
</details>
Figure 3: Average episode length of successful episodes for all evaluated algorithms on all puzzles in the easiest setting (lower is better). Some puzzles, namely Loopy, Pearl, Pegs, Solo, and Unruly, were intractable for all algorithms and were therefore excluded from this aggregation. The standard deviation is computed with respect to the performance over all evaluated instances for all trained seeds, aggregated over the total number of puzzles. Optimal refers to the upper bound of the performance of an optimal policy; it therefore does not include a standard deviation. We see that DreamerV3 performs the best with an average episode length of 1334. However, this is still worse than the optimal upper bound at an average of 217 steps.
To track an agent’s progress, we use episode lengths, i.e., how many actions an agent needs to solve a puzzle. A lower number of actions indicates a stronger policy that is closer to the optimal solution. To obtain the final evaluation, we run each policy on 1000 random episodes of the respective puzzle, again with a maximum of 10,000 steps per episode. All experiments were conducted on NVIDIA 3090 GPUs. The training time for a single agent with 2 million PPO steps varied depending on the puzzle and ranged from approximately 1.75 to 3 hours. Training DreamerV3 and MuZero was more demanding, with training times ranging from approximately 10 to 20 hours.
Figure 3 shows the average successful episode length for all algorithms. It can be seen that DreamerV3 performs best while PPO also achieves good performance, closely followed by TRPO and MuZero. This is especially interesting since PPO and TRPO follow much simpler training routines than DreamerV3 and MuZero. It seems that the implicit world models learned by DreamerV3 struggle to appropriately capture some puzzles. The high variance of MuZero may indicate some instability during training or the need for puzzle-specific hyperparameter tuning. Upon closer inspection of the detailed results, presented in Appendix Tables 9 and 10, DreamerV3 manages to solve 62.7% of all puzzle instances. In 14 out of the 40 puzzles, it has found a policy that solves the puzzles within the Optimal upper bound. PPO and TRPO managed to solve an average of 61.6% and 70.8% of the puzzle instances, respectively; however, only 8 and 11 of the puzzles, respectively, have been consistently solved within the Optimal upper bound. The algorithms A2C, RecurrentPPO, DQN and QRDQN perform worse than a pure random policy. Overall, it seems that some of the environments in PUZZLES are quite challenging and well suited to show the difference in performance between algorithms. It is also important to note that all the logic puzzles are designed so that they can be solved without requiring any guesswork.
3.2 Difficulty
We further evaluate the performance of a subset of the puzzles on the easiest preset difficulty level for humans. We selected all puzzles where a random policy was able to solve them with a probability of at least 10%, which are Netslide, Same Game and Untangle. By using this selection, we estimate that the reward density should be relatively high, ideally allowing the agent to learn a good policy. Again, we train all algorithms listed in Table 8. We provide results for the two strongest algorithms, PPO and DreamerV3 in Table 1, with complete results available in Appendix Table 9. Note that as part of Section 3.4, we also perform ablations using DreamerV3 on more puzzles on the easiest preset difficulty level for humans.
Table 1: Comparison of how many steps agents trained with PPO and DreamerV3 need on average to solve puzzles of two difficulty levels. In brackets, the percentage of successful episodes is reported. The difficulty levels correspond to the overall easiest and the easiest-for-humans settings. We also give the upper bound of optimal steps needed for each configuration.
| Puzzle | Parameters | PPO | DreamerV3 | Optimal (upper bound) |
| --- | --- | --- | --- | --- |
| Netslide | 2x3b1 | $35.3 \pm 0.7$ (100.0%) | $12.0 \pm 0.4$ (100.0%) | 48 |
| | 3x3b1 | $4742.1 \pm 2960.1$ (9.2%) | $3586.5 \pm 676.9$ (22.4%) | 90 |
| Same Game | 2x3c3s2 | $11.5 \pm 0.1$ (100.0%) | $7.3 \pm 0.2$ (100.0%) | 42 |
| | 5x5c3s2 | $1009.3 \pm 1089.4$ (30.5%) | $527.0 \pm 162.0$ (30.2%) | 300 |
| Untangle | 4 | $34.9 \pm 10.8$ (100.0%) | $6.3 \pm 0.4$ (100.0%) | 80 |
| | 6 | $2294.7 \pm 2121.2$ (96.2%) | $1683.3 \pm 73.7$ (82.0%) | 150 |
We can see that for both PPO and DreamerV3, moving from the overall easiest setting to the easiest-for-humans setting lowers the percentage of successful episodes and sharply increases the number of steps required. DreamerV3 performs clearly stronger than PPO, requiring consistently fewer steps, but at the easiest-for-humans setting it still needs far more steps than the Optimal upper bound. Our results indicate that puzzles with relatively high reward density at human difficulty levels remain challenging. We propose to use the easiest human difficulty level as a first measure to evaluate future algorithms. The details of the easiest human difficulty setting can be found in Appendix Table 7. Once this level is mastered, difficulty can be further scaled up by increasing the size of the puzzles. Some puzzles also allow for an increase in difficulty with fixed size.
3.3 Effect of Action Masking and Observation Representation
We evaluate the effect of action masking, as well as observation type, on training performance. Firstly, we analyze whether action masking, as described in paragraph “Action Masking” in Section B.4, can positively affect training performance. Secondly, we want to see if agents are still capable of solving puzzles while relying on pixel observations. Pixel observations allow for the exact same input representation to be used for all puzzles, thus achieving a setting that is very similar to the Atari benchmark. We compare MaskablePPO to the default PPO without action masking on both types of observations. We summarize the results in Figure 4. Detailed results for masked RL agents on the pixel observations are provided in Appendix Table 11.
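The general mechanism behind action masking can be sketched as follows. The masking criterion used here (an action is masked when it leaves the state unchanged) is an assumption for illustration; the criterion used by PUZZLES is defined in Section B.4. The toy transition function and all names are hypothetical.

```python
import random


def transition(pos, action, length=5):
    """Deterministic toy transition: 0 = left, 1 = right, clipped to the board."""
    delta = 1 if action == 1 else -1
    return min(max(pos + delta, 0), length - 1)


def action_mask(state, n_actions, step_fn):
    """True for every action whose successor state differs from `state`."""
    return [step_fn(state, a) != state for a in range(n_actions)]


def sample_masked(mask, rng):
    """Sample uniformly among the actions that are not masked out."""
    valid = [a for a, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("no valid action available")
    return rng.choice(valid)


rng = random.Random(0)
mask_at_left_wall = action_mask(0, 2, transition)   # moving left is a no-op
mask_in_middle = action_mask(2, 2, transition)      # both moves change the state
action = sample_masked(mask_at_left_wall, rng)
```

Because the transitions are deterministic, such a mask can be computed by simulating each action once. A masked policy then never spends exploration steps on moves that cannot make progress.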
<details>
<summary>x3.png Details</summary>

### Visual Description
## Bar Chart: Average Episode Length Comparison
### Overview
This image presents a bar chart comparing the average episode length for four different configurations: PPO (using Internal State), PPO (using RGB Pixels), MaskablePPO (using Internal State), and MaskablePPO (using RGB Pixels). Each bar also includes an error bar representing the variability in the data.
### Components/Axes
* **X-axis:** Represents the different configurations: "PPO (Internal State)", "PPO (RGB Pixels)", "MaskablePPO (Internal State)", "MaskablePPO (RGB Pixels)".
* **Y-axis:** Labeled "Average Episode Length", with a scale ranging from 0 to 2500, incrementing by 500.
* **Bars:** Represent the average episode length for each configuration.
* **Error Bars:** Black vertical lines extending above and below each bar, indicating the variability (likely standard deviation or standard error) around the mean.
### Detailed Analysis
The chart displays the following approximate values:
* **PPO (Internal State):** The bar reaches approximately 1650 on the Y-axis. The error bar extends from roughly 800 to 2400.
* **PPO (RGB Pixels):** The bar reaches approximately 1600 on the Y-axis. The error bar extends from roughly 800 to 2400.
* **MaskablePPO (Internal State):** The bar reaches approximately 800 on the Y-axis. The error bar extends from roughly 400 to 1200.
* **MaskablePPO (RGB Pixels):** The bar reaches approximately 1050 on the Y-axis. The error bar extends from roughly 400 to 1700.
### Key Observations
* PPO configurations (both Internal State and RGB Pixels) exhibit similar average episode lengths, which are significantly higher than those of MaskablePPO configurations.
* MaskablePPO (Internal State) has the lowest average episode length.
* The error bars are relatively large for all configurations, indicating substantial variability in the episode lengths.
* The error bars for PPO configurations overlap significantly, suggesting that the difference between using Internal State and RGB Pixels for PPO might not be statistically significant.
* The error bar for MaskablePPO (RGB Pixels) is larger than that of MaskablePPO (Internal State).
### Interpretation
The data suggests that PPO needs more steps per episode on average than MaskablePPO, regardless of whether the state is represented by Internal State or RGB Pixels. Because an episode ends as soon as the puzzle is solved, a shorter average episode length indicates better performance, so action masking appears to help in both settings. The large error bars suggest considerable variation in the performance of each configuration, potentially due to differences between puzzles, the stochastic nature of the environment, or the learning algorithm. The similarity between PPO (Internal State) and PPO (RGB Pixels) suggests that the choice of state representation does not strongly affect the average episode length when using PPO. However, the difference in error bar size between the MaskablePPO configurations could indicate that the RGB Pixel representation introduces more variability in the learning process. Further statistical analysis would be needed to confirm the significance of these observations.
</details>
<details>
<summary>x4.png Details</summary>

### Visual Description
## Line Chart: Timesteps per Episode vs. Training Timesteps
### Overview
The image presents a line chart illustrating the relationship between training timesteps and the number of timesteps per episode for different reinforcement learning algorithms. The chart displays performance metrics over approximately 2 million training timesteps. The y-axis represents "Timesteps per Episode" on a logarithmic scale, while the x-axis represents "Training Timesteps". Multiple algorithms are compared, each represented by a different colored line.
### Components/Axes
* **X-axis:** "Training Timesteps" ranging from 0 to 2,000,000 (2 x 10<sup>6</sup>).
* **Y-axis:** "Timesteps per Episode" on a logarithmic scale, ranging from 10<sup>0</sup> to 10<sup>4</sup>.
* **Legend:** Located at the bottom-center of the chart, identifying the algorithms and their corresponding observation types:
* PPO (RGB Pixels) - Dark Red
* PPO (Internal State) - Orange
* MaskablePPO (RGB Pixels) - Blue
* MaskablePPO (Internal State) - Teal
* **Gridlines:** Present to aid in reading values.
### Detailed Analysis
The chart displays four distinct lines, each representing a different algorithm.
* **PPO (RGB Pixels) - Dark Red:** This line initially starts around 10<sup>2</sup> timesteps per episode and fluctuates between approximately 50 and 200 timesteps per episode for the majority of the training period. There are several spikes, reaching up to approximately 300 timesteps per episode around 0.25 x 10<sup>6</sup>, 0.75 x 10<sup>6</sup>, and 1.75 x 10<sup>6</sup> training timesteps.
* **PPO (Internal State) - Orange:** This line begins around 10<sup>2</sup> timesteps per episode and generally remains lower than the RGB Pixels version, fluctuating between approximately 20 and 100 timesteps per episode. It exhibits less volatility than the RGB Pixels version.
* **MaskablePPO (RGB Pixels) - Blue:** This line shows a dramatic increase in timesteps per episode. It starts around 10<sup>1</sup> timesteps per episode and rapidly increases to approximately 10<sup>3</sup> timesteps per episode around 0.5 x 10<sup>6</sup> training timesteps. It then plateaus around 10<sup>3</sup>-10<sup>4</sup> timesteps per episode for the remainder of the training period.
* **MaskablePPO (Internal State) - Teal:** This line remains consistently low, fluctuating between approximately 10 and 20 timesteps per episode throughout the entire training period.
### Key Observations
* **MaskablePPO (RGB Pixels)** demonstrates significantly longer episodes compared to the other algorithms, especially after 0.5 x 10<sup>6</sup> training timesteps.
* **PPO (RGB Pixels)** exhibits more variability in episode length than **PPO (Internal State)**.
* **MaskablePPO (Internal State)** consistently has the shortest episode lengths.
* The RGB Pixel versions of both PPO and MaskablePPO show more fluctuations than their Internal State counterparts.
### Interpretation
Because an episode terminates as soon as the puzzle is solved, shorter episodes indicate better performance. The MaskablePPO (Internal State) curve is consistently the lowest, which suggests that this configuration converges to the most efficient policy. The Internal State versions of both algorithms converge to shorter, more stable episodes than their RGB Pixel counterparts, which suggests that pixel observations make the task harder. Long or highly variable episodes, such as those described for the RGB Pixel curves, reflect slower or less reliable solving and not a greater ability to explore. The spikes in the PPO (RGB Pixels) line might represent periods of exploration or encounters with challenging states. The logarithmic scale on the y-axis emphasizes the large differences in episode length between the configurations.
</details>
Figure 4: (left) We demonstrate the effect of action masking in both RGB observation and internal game state. By masking moves that do not change the current state, the agent requires fewer actions to explore, and therefore, on average solves a puzzle using fewer steps. (right) Moving average episode length during training for the Flood puzzle. Lower episode length is better, as the episode gets terminated as soon as the agent has solved a puzzle. Different colors describe different algorithms, where different shades of a color indicate different random seeds. Sparse dots indicate that an agent only occasionally managed to find a policy that solves a puzzle. It can be seen that both the use of discrete internal state observations and action masking have a positive effect on the training, leading to faster convergence and a stronger overall performance.
As we can observe in Figure 4, action masking has a strongly positive effect on training performance. This benefit is observed both in the discrete internal game state observations and on the pixel observations. We hypothesize that this is due to the more efficient exploration, as actions without effect are not allowed. As a result, the reward density during training is increased, and agents are able to learn a better policy. Particularly noteworthy are the outcomes related to Pegs. They show that an agent with action masking can effectively learn a successful policy, while a random policy without action masking consistently fails to solve any instance. As expected, training RL agents on pixel observations increases the difficulty of the task at hand. The agent must first understand how the pixel observation relates to the internal state of the game before it is able to solve the puzzle. Nevertheless, in combination with action masking, the agents manage to solve a large percentage of all puzzle instances, with 10 of the puzzles consistently solved within the optimal upper bound.
Furthermore, Figure 4 shows the individual training performance on the puzzle Flood. It can be seen that RL agents using action masking and the discrete internal game state observation converge significantly faster and to better policies compared to the baselines. The agents using pixel observations and no action masking struggle to converge to any reasonable policy.
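The action masking discussed above can be illustrated with a minimal sketch. It assumes a hypothetical boolean mask that marks which actions would change the current state, and it uses only the Python standard library; it is not the MaskablePPO implementation used in the experiments. Masked actions receive probability zero, so the policy samples only from actions that have an effect.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over `logits`, restricted to actions where `mask` is True."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits, mask, rng=random):
    """Sample an action index; masked actions have probability zero."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: four actions, two of which would not change the state.
logits = [2.0, 0.5, 1.0, -1.0]
mask = [False, True, True, False]
probs = masked_softmax(logits, mask)
action = sample_action(logits, mask, random.Random(0))
```

Even though the first action has the highest logit, it is never selected because it is masked out; the probability mass is redistributed over the two valid actions.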
3.4 Effect of Episode Length and Early Termination
We evaluate whether the cutoff episode length or early termination affects the training performance of the agents. For computational reasons, we perform these experiments on a selected subset of the puzzles at human-level difficulty and only for DreamerV3 (see Section E.5 for details). As we can see in Table 2, increasing the maximum episode length during training from 10,000 to 100,000 does not improve performance. Only when episodes are terminated after visiting the exact same state more than 10 times is the agent able to solve more puzzle instances on average (31.5% vs. 25.2%). Given the sparse reward structure, terminating episodes early seems to provide a better trade-off between allowing long trajectories to successfully complete and avoiding wasting resources on unsuccessful trajectories.
Table 2: Comparison of the effect of the maximum episode length (# Steps) and early termination (ET) on final performance. For each setting, we report average success episode length with standard deviation with respect to the random seed, all averaged over all selected puzzles. In brackets, the percentage of successful episodes is reported.
| # Steps | ET | Episode length (success rate) |
| --- | --- | --- |
| $1e5$ | 10 | $2950.9 \pm 1260.2$ (31.6%) |
| $1e5$ | - | $2975.4 \pm 1503.5$ (25.2%) |
| $1e4$ | 10 | $3193.9 \pm 1044.2$ (26.1%) |
| $1e4$ | - | $2892.4 \pm 908.3$ (26.8%) |
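The early termination criterion (ET) can be sketched as a small visit counter. This is an illustrative sketch under the assumption that states are hashable; the class name `RepeatedStateTerminator` and its methods are hypothetical and not part of the PUZZLES code base. An episode is flagged for termination once the exact same state has been visited more than `max_visits` times.

```python
from collections import Counter


class RepeatedStateTerminator:
    """Signals termination once any state is visited more than `max_visits` times."""

    def __init__(self, max_visits=10):
        self.max_visits = max_visits
        self.visits = Counter()

    def reset(self):
        # Call at the start of every episode.
        self.visits.clear()

    def should_terminate(self, state):
        # States must be hashable, e.g. convert an array to a tuple or bytes first.
        self.visits[state] += 1
        return self.visits[state] > self.max_visits


et = RepeatedStateTerminator(max_visits=10)
flags = [et.should_terminate(("same", "state")) for _ in range(11)]
```

In this sketch the first ten visits to a state are allowed and the eleventh triggers termination, matching the "more than 10 times" threshold used for the ET setting.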
3.5 Generalization
PUZZLES is explicitly designed to facilitate the testing of generalization capabilities of agents with respect to different puzzle sizes or puzzle difficulties. For our experiments, we select puzzles with the highest reward density. We utilize a custom observation wrapper and a transformer-based encoder so that the agent can work with different input sizes; see Sections A.3 and A.4 for details. We call this approach PPO (Transformer).
Table 3: We test generalization capabilities of agents by evaluating them on puzzle sizes larger than their training environment. We report the average number of steps an agent needs to solve a puzzle, and the percentage of successful episodes in brackets. The difficulty levels correspond to the overall easiest and the easiest-for-humans settings. For PPO (Transformer), we selected the best checkpoint during training according to the performance in the training environment. For PPO (Transformer) †, we selected the best checkpoint during training according to the performance in the generalization environment.
| Puzzle | Parameters | Trained on | PPO (Transformer) | PPO (Transformer) † |
| --- | --- | --- | --- | --- |
| Netslide | 2x3b1 | ✓ | $244.1 \pm 313.7$ (100.0%) | $242.0 \pm 379.3$ (100.0%) |
| Netslide | 3x3b1 | ✗ | $9014.6 \pm 2410.6$ (18.6%) | $9002.8 \pm 2454.9$ (18.0%) |
| Same Game | 2x3c3s2 | ✓ | $9.3 \pm 10.9$ (99.8%) | $26.2 \pm 52.9$ (99.7%) |
| Same Game | 5x5c3s2 | ✗ | $379.0 \pm 261.6$ (9.4%) | $880.1 \pm 675.4$ (18.1%) |
| Untangle | 4 | ✓ | $38.6 \pm 58.2$ (99.8%) | $69.8 \pm 66.4$ (100.0%) |
| Untangle | 6 | ✗ | $3340.0 \pm 3101.2$ (87.3%) | $2985.8 \pm 2774.7$ (93.7%) |
The results presented in Table 3 indicate that, while it is possible to learn a policy that generalizes, it remains a challenging problem. Furthermore, selecting the best model during training according to its performance on the generalization environment yields a performance benefit in that setting. This suggests that agents may learn a policy that generalizes better during the training process, but then overfit to the environment they are training on. Generalization performance also varies substantially across different random seeds. For Netslide, the best agent is capable of solving 23.3% of the puzzles in the generalization environment, whereas the worst agent is only able to solve 11.2% of the puzzles, similar to a random policy. Our findings suggest that agents are generally capable of generalizing to more complex puzzles. However, further research is necessary to identify the appropriate inductive biases that allow for consistent generalization without a significant decline in performance.
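The two checkpoint selection strategies compared in Table 3 can be sketched as follows. The success rates below are made-up illustrative numbers, not results from the paper, and the field names `train_success` and `gen_success` are hypothetical. The sketch only shows that the two criteria can pick different checkpoints from the same training run.

```python
def best_checkpoint(checkpoints, key):
    """Return the checkpoint with the highest value for `key`."""
    return max(checkpoints, key=lambda c: c[key])


# Made-up illustrative success rates, not results from the paper.
checkpoints = [
    {"step": 1_000_000, "train_success": 0.90, "gen_success": 0.15},
    {"step": 2_000_000, "train_success": 0.97, "gen_success": 0.22},
    {"step": 3_000_000, "train_success": 1.00, "gen_success": 0.12},
]

# Selection by training-environment performance, as for PPO (Transformer).
by_train = best_checkpoint(checkpoints, "train_success")
# Selection by generalization-environment performance, as for PPO (Transformer) †.
by_gen = best_checkpoint(checkpoints, "gen_success")
```

In this toy run, the checkpoint that is best on the training environment is the latest one, while the checkpoint that generalizes best occurs earlier, which is the overfitting pattern described above.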
4 Discussion
The experimental evaluation demonstrates varying degrees of success among different algorithms. For instance, puzzles such as Tracks, Map or Flip were not solvable by any of the evaluated RL agents, or only with performance similar to a random policy. This points towards the potential of intermediate rewards, better game rule-specific action masking, or model-based approaches. To encourage exploration in the state space, a mechanism that explicitly promotes it may be beneficial. On the other hand, the fact that some algorithms managed to solve a substantial number of puzzles with presumably optimal performance demonstrates the advances in the field of RL. In light of the promising results of DreamerV3, improving agents that have certain reasoning capabilities and an implicit world model by design remains an important direction for future research.
Experimental Results.
The experimental results presented in Section 3.1 and Section 3.3 underscore the positive impact of action masking and the correct observation type on performance. While a pixel representation would lead to a uniform observation for all puzzles, it currently increases complexity too much compared to the discrete internal game state. Our findings indicate that incorporating action masking significantly improves the training efficiency of reinforcement learning algorithms. This enhancement was observed in both discrete internal game state observations and pixel observations. The improvement can be attributed to more efficient exploration, which allows agents to learn more robust and effective policies. This was especially evident in puzzles where unmasked agents had considerable difficulty, showcasing the tangible advantages of action masking for these puzzles.
Limitations.
While the PUZZLES framework provides the ability to gain comprehensive insights into the performance of various RL algorithms on logic puzzles, it is crucial to recognize certain limitations when interpreting results. The sparse rewards used in this baseline evaluation add to the complexity of the task. Moreover, all algorithms were evaluated with their default hyper-parameters. Additionally, the constraint of discrete action spaces excludes the application of certain RL algorithms.
In summary, the different challenges posed by the logic-requiring nature of these puzzles necessitate a good reward system, strong guidance of agents, and an agent design more focused on logical reasoning capabilities. It will be interesting to see how alternative architectures such as graph neural networks (GNNs) perform. GNNs are designed to align more closely with the algorithmic solution of many puzzles. While the notion that “reward is enough” [57, 58] might hold true, our results indicate that not just any form of correct reward will suffice, and that advanced architectures might be necessary to learn an optimal solution.
5 Conclusion
In this work, we have proposed PUZZLES, a benchmark that bridges the gap between algorithmic reasoning and RL. In addition to containing a rich diversity of logic puzzles, PUZZLES also offers an adjustable difficulty progression for each puzzle, making it a useful tool for benchmarking, evaluating and improving RL algorithms. Our empirical evaluation shows that while RL algorithms exhibit varying degrees of success, challenges persist, particularly in puzzles with higher complexity or those requiring nuanced logical reasoning. We are excited to share PUZZLES with the broader research community and hope that PUZZLES will foster further research for improving the algorithmic reasoning abilities of RL algorithms.
Broader Impact
This paper aims to contribute to the advancement of the field of Machine Learning (ML). Given the current challenges in ML related to algorithmic reasoning, we believe that our newly proposed benchmark will facilitate significant progress in this area, potentially elevating the capabilities of ML systems. Progress in algorithmic reasoning can contribute to the development of more transparent, explainable, and fair ML systems. This can further help address issues related to bias and discrimination in automated decision-making processes, promoting fairness and accountability.
References
- Serafini and Garcez [2016] Luciano Serafini and Artur d’Avila Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422, 2016.
- Dai et al. [2019] Wang-Zhou Dai, Qiuling Xu, Yang Yu, and Zhi-Hua Zhou. Bridging machine learning and logical reasoning by abductive learning. Advances in Neural Information Processing Systems, 32, 2019.
- Li et al. [2020] Yujia Li, Felix Gimeno, Pushmeet Kohli, and Oriol Vinyals. Strong generalization and efficiency in neural programs. arXiv preprint arXiv:2007.03629, 2020.
- Veličković and Blundell [2021] Petar Veličković and Charles Blundell. Neural algorithmic reasoning. Patterns, 2(7), 2021.
- Masry et al. [2022] Ahmed Masry, Do Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. Chartqa: A benchmark for question answering about charts with visual and logical reasoning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2263–2279, 2022.
- Jiao et al. [2022] Fangkai Jiao, Yangyang Guo, Xuemeng Song, and Liqiang Nie. Merit: Meta-path guided contrastive learning for logical reasoning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3496–3509, 2022.
- Bardin et al. [2023] Sébastien Bardin, Somesh Jha, and Vijay Ganesh. Machine learning and logical reasoning: The new frontier (dagstuhl seminar 22291). In Dagstuhl Reports, volume 12. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2023.
- Li et al. [2021] Wenda Li, Lei Yu, Yuhuai Wu, and Lawrence C Paulson. Isarstep: a benchmark for high-level mathematical reasoning. In International Conference on Learning Representations, 2021.
- Veličković et al. [2022] Petar Veličković, Adrià Puigdomènech Badia, David Budden, Razvan Pascanu, Andrea Banino, Misha Dashevskiy, Raia Hadsell, and Charles Blundell. The CLRS algorithmic reasoning benchmark. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 22084–22102. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/velickovic22a.html.
- Srivastava et al. [2022] Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615, 2022.
- Mnih et al. [2013] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with Deep Reinforcement Learning. CoRR, abs/1312.5602, 2013. URL http://arxiv.org/abs/1312.5602.
- Tang et al. [2017] Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. #Exploration: A study of count-based exploration for deep reinforcement learning. Advances in Neural Information Processing Systems, 30, 2017.
- Silver et al. [2018] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144, 2018.
- Badia et al. [2020] Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, and Charles Blundell. Agent57: Outperforming the atari human benchmark. In International conference on machine learning, pages 507–517. PMLR, 2020.
- Wurman et al. [2022] Peter R Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas J Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, et al. Outracing champion gran turismo drivers with deep reinforcement learning. Nature, 602(7896):223–228, 2022.
- Kalashnikov et al. [2018] Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on Robot Learning, pages 651–673. PMLR, 2018.
- Kiran et al. [2021] B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A Al Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6):4909–4926, 2021.
- Rudin et al. [2022] Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In Conference on Robot Learning, pages 91–100. PMLR, 2022.
- Rana et al. [2023] Krishan Rana, Ming Xu, Brendan Tidd, Michael Milford, and Niko Sünderhauf. Residual skill policies: Learning an adaptable skill-based action space for reinforcement learning for robotics. In Conference on Robot Learning, pages 2095–2104. PMLR, 2023.
- Wang and Hong [2020] Zhe Wang and Tianzhen Hong. Reinforcement learning for building controls: The opportunities and challenges. Applied Energy, 269:115036, 2020.
- Wu et al. [2022] Di Wu, Yin Lei, Maoen He, Chunjiong Zhang, and Li Ji. Deep reinforcement learning-based path control and optimization for unmanned ships. Wireless Communications and Mobile Computing, 2022:1–8, 2022.
- Brunke et al. [2022] Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5:411–444, 2022.
- Todorov et al. [2012] Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012.
- Bellemare et al. [2013] Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
- Brockman et al. [2016] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
- Duan et al. [2016] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International conference on machine learning, pages 1329–1338. PMLR, 2016.
- Tassa et al. [2018] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. Deepmind control suite. arXiv preprint arXiv:1801.00690, 2018.
- Côté et al. [2018] Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Ruo Yu Tao, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. Textworld: A learning environment for text-based games. CoRR, abs/1806.11532, 2018.
- Lanctot et al. [2019] Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, and Jonah Ryan-Davis. OpenSpiel: A framework for reinforcement learning in games. CoRR, abs/1908.09453, 2019. URL http://arxiv.org/abs/1908.09453.
- Jiang and Luo [2019] Zhengyao Jiang and Shan Luo. Neural logic reinforcement learning. In International conference on machine learning, pages 3110–3119. PMLR, 2019.
- Fawzi et al. [2022] Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J R Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930):47–53, 2022.
- Mankowitz et al. [2023] Daniel J Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature, 618(7964):257–263, 2023.
- Lai [2015] Matthew Lai. Giraffe: Using deep reinforcement learning to play chess. arXiv preprint arXiv:1509.01549, 2015.
- Silver et al. [2016] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–489, 2016. URL https://doi.org/10.1038/nature16961.
- Tatham [2004a] Simon Tatham. Simon tatham’s portable puzzle collection, 2004a. URL https://www.chiark.greenend.org.uk/~sgtatham/puzzles/. Accessed: 2023-05-16.
- Foundation [2022] Farama Foundation. Gymnasium website, 2022. URL https://gymnasium.farama.org/. Accessed: 2023-05-12.
- Wang et al. [2022] Chao Wang, Chen Chen, Dong Li, and Bin Wang. Rethinking reinforcement learning based logic synthesis. arXiv preprint arXiv:2205.07614, 2022.
- Dasgupta et al. [2019] Ishita Dasgupta, Jane Wang, Silvia Chiappa, Jovana Mitrovic, Pedro Ortega, David Raposo, Edward Hughes, Peter Battaglia, Matthew Botvinick, and Zeb Kurth-Nelson. Causal reasoning from meta-reinforcement learning. arXiv preprint arXiv:1901.08162, 2019.
- Eppe et al. [2022] Manfred Eppe, Christian Gumbsch, Matthias Kerzel, Phuong DH Nguyen, Martin V Butz, and Stefan Wermter. Intelligent problem-solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1):11–20, 2022.
- Deac et al. [2021] Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic, Pierre-Luc Bacon, Jian Tang, and Mladen Nikolic. Neural algorithmic reasoners are implicit planners. Advances in Neural Information Processing Systems, 34:15529–15542, 2021.
- He et al. [2022] Yu He, Petar Veličković, Pietro Liò, and Andreea Deac. Continuous neural algorithmic planners. In Learning on Graphs Conference, pages 54–1. PMLR, 2022.
- Silver et al. [2017] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017.
- Dahl [2001] Fredrik A Dahl. A reinforcement learning algorithm applied to simplified two-player texas hold’em poker. In European Conference on Machine Learning, pages 85–96. Springer, 2001.
- Heinrich and Silver [2016] Johannes Heinrich and David Silver. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121, 2016.
- Steinberger [2019] Eric Steinberger. Pokerrl. https://github.com/TinkeringCode/PokerRL, 2019.
- Zhao et al. [2022] Enmin Zhao, Renye Yan, Jinqiu Li, Kai Li, and Junliang Xing. Alphaholdem: High-performance artificial intelligence for heads-up no-limit poker via end-to-end reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4689–4697, 2022.
- Ghory [2004] Imran Ghory. Reinforcement learning in board games. 2004.
- Szita [2012] István Szita. Reinforcement learning in games. In Reinforcement Learning: State-of-the-art, pages 539–577. Springer, 2012.
- Xenou et al. [2019] Konstantia Xenou, Georgios Chalkiadakis, and Stergos Afantenos. Deep reinforcement learning in strategic board game environments. In Multi-Agent Systems: 16th European Conference, EUMAS 2018, Bergen, Norway, December 6–7, 2018, Revised Selected Papers 16, pages 233–248. Springer, 2019.
- Perolat et al. [2022] Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, et al. Mastering the game of stratego with model-free multiagent reinforcement learning. Science, 378(6623):990–996, 2022.
- Cormen et al. [2022] Thomas H. Cormen, Charles Eric Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 4th edition, 2022.
- Raffin et al. [2021] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html.
- Werner Duvaud [2019] Aurèle Hainaut Werner Duvaud. Muzero general: Open reimplementation of muzero. https://github.com/werner-duvaud/muzero-general, 2019.
- Hafner et al. [2023a] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. https://github.com/danijar/dreamerv3, 2023a.
- Haarnoja et al. [2018] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pages 1861–1870. PMLR, 2018.
- Fujimoto et al. [2018] Scott Fujimoto, Herke Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International conference on machine learning, pages 1587–1596. PMLR, 2018.
- Silver et al. [2021] David Silver, Satinder Singh, Doina Precup, and Richard S Sutton. Reward is enough. Artificial Intelligence, 299:103535, 2021.
- Vamplew et al. [2022] Peter Vamplew, Benjamin J Smith, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Diederik M Roijers, Conor F Hayes, Fredrik Heintz, Patrick Mannion, Pieter JK Libin, et al. Scalar reward is not enough: A response to silver, singh, precup and sutton (2021). Autonomous Agents and Multi-Agent Systems, 36(2):41, 2022.
- Community [2000] Pygame Community. Pygame github repository, 2000. URL https://github.com/pygame/pygame/. Accessed: 2023-05-12.
- Tatham [2004b] Simon Tatham. Developer documentation for simon tatham’s puzzle collection, 2004b. URL https://www.chiark.greenend.org.uk/~sgtatham/puzzles/devel/. Accessed: 2023-05-23.
- Schulman et al. [2017] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms, 2017. URL http://arxiv.org/abs/1707.06347.
- Huang et al. [2022] Shengyi Huang, Rousslan Fernand Julien Dossa, Antonin Raffin, Anssi Kanervisto, and Weixun Wang. The 37 implementation details of proximal policy optimization. In ICLR Blog Track, 2022. URL https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/.
- Mnih et al. [2016] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016. URL http://arxiv.org/abs/1602.01783.
- Schulman et al. [2015] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1889–1897, Lille, France, 07–09 Jul 2015. PMLR. URL https://proceedings.mlr.press/v37/schulman15.html.
- Dabney et al. [2017] Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos. Distributional reinforcement learning with quantile regression. CoRR, abs/1710.10044, 2017. URL http://arxiv.org/abs/1710.10044.
- Schrittwieser et al. [2020] Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Hafner et al. [2023b] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023b.
Appendix A PUZZLES Environment Usage Guide
A.1 General Usage
A Python code example for using the PUZZLES environment is provided in Listing 1. All puzzles support seeding the initialization by adding #{seed} after the parameters, where {seed} is an int. The allowed parameters are displayed in the parameters table. A full custom initialization argument has the form {parameters}#{seed}.
```python
import gymnasium as gym
import rlp

# init an agent suitable for Gymnasium environments
# (Agent is a placeholder for a user-supplied class, e.g. a trained agent or a random policy)
agent = Agent.create()

# init the environment
env = gym.make('rlp/Puzzle-v0', puzzle="bridges",
               render_mode="rgb_array", params="4x4#42")
observation, info = env.reset()

# complete an episode; stop on termination or on early truncation
terminated = truncated = False
while not (terminated or truncated):
    action = agent.choose(env)  # the agent chooses the next action
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
```
Listing 1: Code example of how to initialize an environment and have an agent complete one episode. The PUZZLES environment is designed to be compatible with the Gymnasium API. The choice of Agent is up to the user; it can be a trained agent or a random policy.
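The initialization argument format described above can be assembled and taken apart with a few lines of string handling. The following is a minimal sketch; the helper names `build_params` and `split_params` are hypothetical and not part of the rlp package.

```python
from typing import Optional, Tuple


def build_params(parameters: str, seed: Optional[int] = None) -> str:
    """Return '{parameters}#{seed}', or just '{parameters}' if no seed is given."""
    if seed is None:
        return parameters
    return f"{parameters}#{int(seed)}"


def split_params(arg: str) -> Tuple[str, Optional[int]]:
    """Split an initialization argument into (parameters, seed or None)."""
    parameters, sep, seed = arg.partition("#")
    return parameters, (int(seed) if sep else None)


print(build_params("4x4", 42))  # same string as the params argument in Listing 1
```

Omitting the seed leaves the parameter string unchanged, which corresponds to an unseeded initialization.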
A.2 Custom Reward
A Python code example for implementing a custom reward system is provided in Listing 2. To this end, the environment’s step() function provides the puzzle’s internal state inside the info Python dict.
```python
import gymnasium as gym


class PuzzleRewardWrapper(gym.Wrapper):
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Modify the reward by using members of info["puzzle_state"]
        return obs, reward, terminated, truncated, info
```
Listing 2: Code example of a custom reward implementation using Gymnasium’s Wrapper class. A user can use the game state information provided in info["puzzle_state"] to modify the rewards received by the agent after performing an action.
A.3 Custom Observation
Listing 3 provides a Python code example for implementing a custom observation structure that is compatible with an agent using a transformer encoder. Here, we provide the example for Netslide; please refer to our GitHub repository for more examples.
```python
import gymnasium as gym
import numpy as np


class NetslideTransformerWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.original_space = env.observation_space

        self.max_length = 512
        # 16 unpacked bits per uint16 element + 4 positional-embedding values
        self.embedding_dim = 16 + 4
        self.observation_space = gym.spaces.Dict(
            {'obs': gym.spaces.Box(
                low=-1, high=1,
                shape=(self.max_length, self.embedding_dim), dtype=np.float32),
             'len': gym.spaces.Box(
                low=0, high=self.max_length, shape=(1,), dtype=np.int32)}
        )

    def observation(self, obs):
        # The original observation is an OrderedDict with the keys ['barriers', 'cursor_pos', 'height',
        # 'last_move_col', 'last_move_dir', 'last_move_row', 'move_count', 'movetarget', 'tiles', 'width', 'wrapping']
        # We are only interested in 'barriers', 'tiles', 'cursor_pos', 'height' and 'width'
        barriers = obs['barriers']
        # each element of barriers is a uint16, signifying different elements
        barriers = np.unpackbits(barriers.view(np.uint8)).reshape(-1, 16)
        # add some positional embedding to the barriers
        embedded_barriers = np.concatenate(
            [barriers, self.pos_embedding(np.arange(barriers.shape[0]), obs['width'], obs['height'])], axis=1)

        tiles = obs['tiles']
        # each element of tiles is a uint16, signifying different elements
        tiles = np.unpackbits(tiles.view(np.uint8)).reshape(-1, 16)
        # add some positional embedding to the tiles
        embedded_tiles = np.concatenate(
            [tiles, self.pos_embedding(np.arange(tiles.shape[0]), obs['width'], obs['height'])], axis=1)

        # the cursor becomes a single extra token, placed last in the sequence
        cursor_pos = obs['cursor_pos']
        embedded_cursor_pos = np.concatenate(
            [np.ones((1, 16)), self.pos_embedding_cursor(cursor_pos, obs['width'], obs['height'])], axis=1)

        embedded_obs = np.concatenate([embedded_barriers, embedded_tiles, embedded_cursor_pos], axis=0)

        current_length = embedded_obs.shape[0]
        # pad with zeros to accommodate different sizes
        if current_length < self.max_length:
            embedded_obs = np.concatenate(
                [embedded_obs, np.zeros((self.max_length - current_length, self.embedding_dim))], axis=0)
        # cast to the dtypes declared in observation_space
        return {'obs': embedded_obs.astype(np.float32),
                'len': np.array([current_length], dtype=np.int32)}

    @staticmethod
    def pos_embedding(pos, width, height):
        # pos is an array of integers from 0 to width*height - 1
        # width and height are integers
        # return a 2D array with the positional embedding, using sin and cos
        x, y = pos % width, pos // width
        # x and y are integers from 0 to width-1 and height-1
        pos_embed = np.zeros((len(pos), 4))
        pos_embed[:, 0] = np.sin(2 * np.pi * x / width)
        pos_embed[:, 1] = np.cos(2 * np.pi * x / width)
        pos_embed[:, 2] = np.sin(2 * np.pi * y / height)
        pos_embed[:, 3] = np.cos(2 * np.pi * y / height)
        return pos_embed

    @staticmethod
    def pos_embedding_cursor(pos, width, height):
        # cursor pos goes from -1 to width or height, so shift everything by one
        x, y = pos
        x += 1
        y += 1
        width += 1
        height += 1
        pos_embed = np.zeros((1, 4))
        pos_embed[0, 0] = np.sin(2 * np.pi * x / width)
        pos_embed[0, 1] = np.cos(2 * np.pi * x / width)
        pos_embed[0, 2] = np.sin(2 * np.pi * y / height)
        pos_embed[0, 3] = np.cos(2 * np.pi * y / height)
        return pos_embed
```
Listing 3: Code example of a custom observation implementation using Gymnasium’s Wrapper class. A user can use all elements provided in the obs dict to create a custom observation. In this code example, the resulting observation is suitable for a transformer-based encoder.
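To make the positional embedding in Listing 3 concrete, the sketch below computes the same four sin/cos values for a single flat cell index using only the standard library. The function name `cell_embedding` is hypothetical and introduced only for illustration.

```python
import math


def cell_embedding(index: int, width: int, height: int):
    """Map a flat cell index to (sin x, cos x, sin y, cos y) on a width x height grid."""
    x, y = index % width, index // width  # column and row of the cell
    return (
        math.sin(2 * math.pi * x / width),
        math.cos(2 * math.pi * x / width),
        math.sin(2 * math.pi * y / height),
        math.cos(2 * math.pi * y / height),
    )


# The top-left cell of any grid sits at angle 0 on both axes.
print(cell_embedding(0, 4, 4))
# Cell 5 of a 4x4 grid is at column 1, row 1, a quarter turn on both axes.
print(cell_embedding(5, 4, 4))
```

Because each coordinate is divided by the grid extent, the values stay within [-1, 1] for any puzzle size, which matches the bounds declared for the observation space in Listing 3.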
A.4 Generalization Example
In Listing 4, we show how a transformer-based features extractor can be built for the PPO MultiInputPolicy of Stable Baselines3. Together with the observations from Listing 3, this feature extractor can work with variable-length inputs. This allows for easy evaluation in environments of different sizes than the environment the agent was originally trained in.
```python
import gymnasium as gym
import numpy as np
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
from stable_baselines3 import PPO
import torch
import torch.nn as nn
from torch.nn import TransformerEncoder, TransformerEncoderLayer


class TransformerFeaturesExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space, data_dim, embedding_dim, nhead, num_layers, dim_feedforward, dropout=0.1):
        super().__init__(observation_space, embedding_dim)
        self.transformer = Transformer(embedding_dim=embedding_dim,
                                       data_dim=data_dim,
                                       nhead=nhead,
                                       num_layers=num_layers,
                                       dim_feedforward=dim_feedforward,
                                       dropout=dropout)

    def forward(self, observations: dict) -> torch.Tensor:
        # observations is a dict of batched tensors; extract the 'obs' key
        obs = observations['obs']
        length = observations['len']
        # all elements of length should be the same (we can't train on different puzzle sizes at the same time)
        length = int(length[0])
        # drop the zero padding
        obs = obs[:, :length]
        # Return the embedding of the cursor token (which is last)
        return self.transformer(obs)[:, -1, :]


class Transformer(nn.Module):
    def __init__(self, embedding_dim, data_dim, nhead, num_layers, dim_feedforward, dropout=0.1):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.data_dim = data_dim

        # project each input token from data_dim to embedding_dim
        self.lin = nn.Linear(data_dim, embedding_dim)

        encoder_layers = TransformerEncoderLayer(
            d_model=self.embedding_dim,
            nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            batch_first=True
        )

        self.transformer_encoder = TransformerEncoder(encoder_layers, num_layers)

    def forward(self, x):
        # x is of shape (batch_size, seq_length, data_dim)
        x = self.lin(x)
        transformed = self.transformer_encoder(x)
        return transformed


if __name__ == "__main__":
    # args, data_dims and env are not defined in this listing and must be supplied by the user:
    # args holds the parsed hyperparameters, data_dims[args.puzzle] gives data_dim for the puzzle,
    # and env is the wrapped PUZZLES environment (e.g. using the wrapper from Listing 3).
    policy_kwargs = dict(
        features_extractor_class=TransformerFeaturesExtractor,
        features_extractor_kwargs=dict(embedding_dim=args.transformer_embedding_dim,
                                       nhead=args.transformer_nhead,
                                       num_layers=args.transformer_layers,
                                       dim_feedforward=args.transformer_ff_dim,
                                       dropout=args.transformer_dropout,
                                       data_dim=data_dims[args.puzzle])
    )

    model = PPO("MultiInputPolicy",
                env,
                policy_kwargs=policy_kwargs,
                )
```
Listing 4: Code example of a transformer-based feature extractor written in PyTorch, compatible with the PPO implementation of Stable Baselines3. This encoder design allows for variable-length inputs, enabling generalization to previously unseen puzzle sizes.
Appendix B Environment Features
B.1 Episode Definition
An episode is played with the intention of solving a given puzzle. The episode begins with a newly generated puzzle and terminates in one of two states, each of which yields a reward: either the puzzle is solved completely, or the agent has failed irreversibly. The latter state is unlikely to occur, as only a few games, for example pegs or minesweeper, are able to terminate in a failed state. Alternatively, the episode can be terminated early (see Section B.6). Starting a new episode generates a new puzzle of the same kind, with the same parameters such as size or grid type. However, if the random seed is not fixed, the puzzle is likely to have a different layout from the puzzle in the previous episode.
B.2 Observation Space
There are two kinds of observations which can be used by the agent. The first observation type is a representation of the discrete internal game state of the puzzle, consisting of a combination of arrays and scalars. This observation is provided by the underlying code of Tatham’s puzzle collection. The composition and shape of the internal game state is different for each puzzle, which, in turn, requires the agent architecture to be adapted.
The second type of observation is a representation of the pixel screen, given as an integer matrix of shape (3 × width × height). The environment deals with different aspect ratios by adding padding. The advantage of the pixel representation is a consistent representation for all puzzles, similar to the Atari RL Benchmark [11]. It could even allow for a single agent to be trained on different puzzles. On the other hand, it forces the agent to learn to solve the puzzles only based on the visual representation of the puzzles, analogous to human players. This might increase difficulty as the agent has to learn the task representation implicitly.
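As an illustration of how padding can absorb different aspect ratios, the sketch below computes the border widths needed to center a puzzle drawing on a fixed-size canvas. Centered, symmetric padding is an assumption made here for illustration; the text does not specify where the environment places the padding, and the function name `centered_padding` is hypothetical.

```python
def centered_padding(content_w: int, content_h: int, canvas_w: int, canvas_h: int):
    """Return (left, right, top, bottom) padding that centers the content on the canvas."""
    if content_w > canvas_w or content_h > canvas_h:
        raise ValueError("content does not fit on the canvas")
    pad_x = canvas_w - content_w
    pad_y = canvas_h - content_h
    left, top = pad_x // 2, pad_y // 2
    # any odd leftover pixel goes to the right / bottom border
    return left, pad_x - left, top, pad_y - top


# A wide 100x60 puzzle drawn on a square 128x128 canvas
print(centered_padding(100, 60, 128, 128))
```

With a scheme like this, every puzzle yields an observation of the same shape regardless of its native aspect ratio, which is the property that makes the pixel representation consistent across puzzles.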
B.3 Action Space
Natively, the puzzles support two types of input, mouse and keyboard. Agents in PUZZLES play the puzzles only through keyboard input. This is due to our decision to provide the discrete internal game state of the puzzle as an observation, for which mouse input would not be useful.
The action space for each puzzle is restricted to actions that can actively contribute to changing the logical state of a puzzle. This excludes “memory aides” such as markers that signify the absence of a certain connection in Bridges or adding candidate digits in cells in Sudoku. The action space also includes possibly rule-breaking actions, as long as the game can represent the effect of the action correctly.
The largest action space has a cardinality of 14, but most puzzles only have five to six valid actions which the agent can choose from. Generally, an action is in one of two categories: selector movement or game state change. Selector movement is a mechanism that allows the agent to select game objects during play. This includes for example grid cells, edges, or screen regions. The selector can be moved to the next object by four discrete directional inputs and as such represents an alternative to continuous mouse input. A game state change action ideally follows a selector movement action. The game state change action will then be applied to the selected object. The environment responds by updating the game state, for example by entering a digit or inserting a grid edge at the current selector position.
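The two action categories can be illustrated with a toy model: four directional actions move a selector over grid cells, and a single state-change action is applied to whichever cell is selected. This is a simplified sketch, not the PUZZLES implementation; the class `SelectorGrid`, the action names, and the choice to clamp the selector at the border are all assumptions.

```python
from dataclasses import dataclass, field

# Selector movement: four discrete directional inputs
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class SelectorGrid:
    width: int
    height: int
    x: int = 0
    y: int = 0
    marked: set = field(default_factory=set)

    def step(self, action: str) -> bool:
        """Apply one action and return True if the state changed."""
        if action in MOVES:
            dx, dy = MOVES[action]
            # clamp the selector to the grid (assumed behavior at the border)
            nx = min(max(self.x + dx, 0), self.width - 1)
            ny = min(max(self.y + dy, 0), self.height - 1)
            changed = (nx, ny) != (self.x, self.y)
            self.x, self.y = nx, ny
            return changed
        if action == "toggle":
            # game state change, applied to the currently selected cell
            self.marked ^= {(self.x, self.y)}
            return True
        raise ValueError(f"unknown action: {action}")


grid = SelectorGrid(width=3, height=3)
for a in ["right", "down", "toggle"]:
    grid.step(a)
print(grid.marked)  # the cell at column 1, row 1 is now marked
```

Reaching and modifying a cell takes several discrete steps here, whereas a mouse click would do it in one; this is the trade-off of replacing continuous mouse input with a selector.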
B.4 Action Masking
The fixed-size action space allows an agent to execute actions that may not result in any change in game state, for example moving the selector to the right when it is already placed at the right border. The PUZZLES environment provides an action mask that marks all actions that change the state of the game. Such an action mask can be used to improve performance of model-based and even some model-free RL approaches. The action masking provided by PUZZLES does not ensure adherence to game rules, since rule-breaking actions can most often still be represented as a change in the game state.
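A minimal sketch of how such a mask might be computed and consumed is given below for a toy grid with four movement actions and one state-change action. The action set, the function names, and the mask layout (one bool per action) are assumptions for illustration; the format of the mask provided by PUZZLES may differ.

```python
import random

ACTIONS = ["up", "down", "left", "right", "toggle"]


def action_mask(x: int, y: int, width: int, height: int):
    """Mark each action that would change the state for a selector at (x, y)."""
    return [
        y > 0,           # up is a no-op in the top row
        y < height - 1,  # down is a no-op in the bottom row
        x > 0,           # left is a no-op in the leftmost column
        x < width - 1,   # right is a no-op in the rightmost column
        True,            # toggling always changes the selected cell
    ]


def sample_masked(mask, rng: random.Random) -> int:
    """Sample uniformly among the actions allowed by the mask."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    return rng.choice(valid)


mask = action_mask(x=2, y=0, width=3, height=3)  # selector in the top-right corner
print([a for a, ok in zip(ACTIONS, mask) if ok])
print(ACTIONS[sample_masked(mask, random.Random(0))])
```

Note that the mask only filters out no-op actions; as stated above, an action that breaks the puzzle rules still changes the state and therefore remains unmasked.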
B.5 Reward Structure
In the default implementation, the agent only receives a reward for completing an episode. Rewards consist of a fixed positive value for successful completion and a fixed negative value otherwise. This reward structure encourages an agent to solve a given puzzle in the fewest steps possible. The PUZZLES environment provides the option to define intermediate rewards tailored to specific puzzles, which could help improve training progress. This could be, for example, a negative reward if the agent breaks the rules of the game, or a positive reward if the agent correctly achieves a part of the final solution.
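One way to see why a terminal-only reward favors short solutions is through discounting: if all intermediate rewards are zero, the discounted return of an episode shrinks with every additional step. The sketch below assumes reward values of +1 and -1 and a discount factor of 0.99; none of these numbers are stated in the text, and the function names are hypothetical.

```python
def terminal_reward(solved: bool, success: float = 1.0, failure: float = -1.0) -> float:
    """Fixed reward handed out once, when the episode ends (values are assumed)."""
    return success if solved else failure


def discounted_return(steps_to_finish: int, final_reward: float, gamma: float = 0.99) -> float:
    """Return from the first state when only the last step carries a reward."""
    # all intermediate rewards are zero, so only the final term remains
    return gamma ** (steps_to_finish - 1) * final_reward


fast = discounted_return(10, terminal_reward(True))
slow = discounted_return(20, terminal_reward(True))
print(fast > slow)  # True: solving in fewer steps yields a higher return
```

With an undiscounted objective (gamma equal to 1) both episodes would score the same, so the preference for short solutions in this sketch comes entirely from the discount factor.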
B.6 Early Episode Termination
Most of the puzzles in PUZZLES do not have an upper bound on the number of steps; the only natural end is reached by successfully solving the puzzle. The PUZZLES environment also provides the option for early episode termination based on state repetitions. If an agent reaches the exact same game state multiple times, the episode can be terminated in order to prevent wasteful continuation of episodes that no longer contribute to learning or are bound to fail.
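Repetition-based termination can be sketched as a counter over visited states. The example below is an illustration under assumptions, not the PUZZLES implementation: the class name `RepetitionMonitor`, the parameter `max_visits`, and the requirement that states are hashable (for example tuples or bytes) are all introduced here.

```python
from collections import Counter


class RepetitionMonitor:
    """Signal early termination once any state has been visited max_visits times."""

    def __init__(self, max_visits: int = 3):
        self.max_visits = max_visits
        self.visits = Counter()

    def reset(self) -> None:
        """Forget all visits; call this at the start of each episode."""
        self.visits.clear()

    def should_terminate(self, state) -> bool:
        """Record a visit to state and report whether the limit is reached."""
        self.visits[state] += 1
        return self.visits[state] >= self.max_visits


monitor = RepetitionMonitor(max_visits=2)
trajectory = [(0, 0), (0, 1), (0, 0)]  # the agent returns to its first state
print([monitor.should_terminate(s) for s in trajectory])
```

In a Gymnasium-style loop, a positive signal from such a monitor would naturally map to the truncated flag rather than terminated, since the puzzle itself has not reached an end state.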
Appendix C PUZZLES Implementation Details
In the following, a brief overview of the code implementation of PUZZLES is given. The environment is written in both Python and C, in order to interface with Gymnasium [36] as the RL toolkit and the C source code of the original puzzle collection. The original puzzle collection source code is available under the MIT License. The source code and license are available at https://www.chiark.greenend.org.uk/~sgtatham/puzzles/. Figure 2 in the main text presents an overview of the environment and how it fits with external libraries. The modular design in both PUZZLES and the Puzzle Collection’s original code allows users to build and integrate new puzzles into the environment.
Environment Class
The reinforcement learning environment is implemented in the Python class PuzzleEnv in the rlp package. It is designed to be compatible with the Gymnasium-style API for RL environments to facilitate easy adoption. As such, it provides the two important functions needed for progressing an environment, reset() and step().
Upon initializing a PuzzleEnv, a 2D surface displaying the environment is created. This surface and all changes to it are handled by the Pygame [59] graphics library. PUZZLES uses various functions provided in the library, such as shape drawing, or partial surface saving and loading.
The reset() function changes the environment state to the beginning of a new episode, usually by generating a new puzzle with the given parameters. An agent solving the puzzle is also reset to a new state. reset() also returns two variables, observation and info, where observation is a Python dict containing a NumPy 3D array called pixels of size (3 × surface_width × surface_height). This NumPy array contains the RGB pixel data of the Pygame surface, as explained in Section B.2. The info dict contains a dict called puzzle_state, representing a copy of the current internal data structures containing the logical game state, allowing the user to create custom rewards.
The step() function increments the time in the environment by one step, while performing an action chosen from the action space. Upon returning, step() provides the user with five variables, listed in Table 4.
Table 4: Return values of the environment’s step() function. This information can then be used by an RL framework to train an agent.
| Variable | Description |
| --- | --- |
| observation | 3D NumPy array containing RGB pixel data |
| reward | The cumulative reward gained throughout all steps of the episode |
| terminated | A bool stating whether an episode was completed by the agent |
| truncated | A bool stating whether an episode was ended early, for example by reaching the maximum allowed steps for an episode |
| info | A dict containing a copy of the internal game state |
Intermediate Rewards
The environment encourages the use of Gymnasium’s Wrapper interface to implement custom reward structures for a given puzzle. Such custom reward structures can provide an easier game setting, compared to the sparse reward only provided when finishing a puzzle.
Puzzle Module
The PuzzleEnv object creates an instance of the class Puzzle. A Puzzle is essentially the glue between all Pygame surface tasks and the C back-end that contains the puzzle logic. To this end, it initializes a Pygame window, on which shapes and text are drawn. The Puzzle instance also loads the previously compiled shared library containing the C back-end code for the relevant puzzle.
The PuzzleEnv also converts and forwards keyboard inputs (which are for example given by an RL agent’s action) into the format the C back-end understands.
Compiled C Code
The C part of the environment sits on top of the highly-optimized original puzzle collection source code as a custom front-end, as detailed in the collection’s developer documentation [60]. Similar to other front-end types, it represents the bridge between the graphics library that is used to display the puzzles and the game logic back-end. Specifically, this is done using Python API calls to Pygame’s drawing facilities.
Appendix D Puzzle Descriptions
We provide short descriptions of each puzzle from www.chiark.greenend.org.uk/~sgtatham/puzzles/. For detailed instructions for each puzzle, please visit the docs available at www.chiark.greenend.org.uk/~sgtatham/puzzles/doc/index.html.
<details>
<summary>extracted/5699650/img/puzzles/blackbox.png Details</summary>

### Visual Description
## Diagram: Grid with Markers
### Overview
The image depicts a grid, likely representing a game board or coordinate system. The grid is 8x8, with alphanumeric labels along the top and left sides. There are three black circular markers placed at specific locations within the grid. The grid is filled with a light gray color.
### Components/Axes
* **Horizontal Axis:** Labeled with numbers 1 through 5, then 'H', then 2, and finally 1 and 4.
* **Vertical Axis:** Labeled with numbers 2, 3, 3, 4, and 'R', 'H'.
* **Markers:** Three black circular markers are present.
* **Grid:** 8x8 grid of squares.
### Detailed Analysis or Content Details
The markers are located at the following coordinates:
1. **Marker 1:** Row 'H', Column 1.
2. **Marker 2:** Row 3, Column 4.
3. **Marker 3:** Row 'H', Column 5.
The grid is labeled as follows:
* Top Row: 5, 1, 5, H, 2
* Left Column: 2, R, H, 3, 3, 4
* Bottom Row: 1, 4
### Key Observations
The markers are not evenly distributed across the grid. Two markers are located on the same row ('H'), while the third is on row 3. The horizontal positions of the markers are 1, 4, and 5.
### Interpretation
This diagram likely represents a game board, potentially for a game like Battleship or a similar coordinate-based game. The letters and numbers serve as coordinates for identifying locations on the board. The black markers likely represent placed pieces or hits. The arrangement of the markers suggests a strategic placement, with two pieces clustered on the 'H' row and one further to the right. Without knowing the rules of the game, it's difficult to determine the significance of this arrangement. The 'R' label on the vertical axis is unusual and may have a specific meaning within the game's context. The 'H' label appears twice, once on the horizontal axis and twice on the vertical axis. This could indicate a special row or column.
</details>
Figure 5: Black Box: Find the hidden balls in the box by bouncing laser beams off them.
<details>
<summary>extracted/5699650/img/puzzles/bridges.png Details</summary>

### Visual Description
## Diagram: Network/Graph Representation
### Overview
The image depicts a network or graph composed of interconnected nodes. Each node is represented by a circle containing a numerical value. The nodes are connected by lines, visually representing the relationships or connections between them. The background is a uniform gray.
### Components/Axes
The diagram consists of:
* **Nodes:** Circles containing the numbers 1, 2, 3, 4, 5, and 6.
* **Edges:** Lines connecting the nodes.
* **Background:** A solid gray color.
There are no explicit axes or legends. The numerical values within the nodes appear to be node identifiers or weights.
### Detailed Analysis or Content Details
The diagram can be described as follows, tracing the connections and node values:
1. Starting from the top-left: Node '1' is connected to Node '4'.
2. Node '4' is connected to Node '4'.
3. Node '4' is connected to Node '4'.
4. Node '4' is connected to Node '6'.
5. Node '6' is connected to Node '2'.
6. Node '2' is connected to Node '1'.
7. Node '1' is connected to Node '4'.
8. Node '4' is connected to Node '4'.
9. Node '4' is connected to Node '2'.
10. Node '2' is connected to Node '2'.
11. Node '2' is connected to Node '4'.
12. Node '4' is connected to Node '5'.
13. Node '5' is connected to Node '4'.
14. Node '4' is connected to Node '2'.
15. Node '2' is connected to Node '1'.
16. Node '1' is connected to Node '3'.
17. Node '3' is connected to Node '2'.
18. Node '2' is connected to Node '4'.
19. Node '4' is connected to Node '3'.
The network forms a roughly rectangular shape with internal connections. The number '4' appears most frequently as a node value and is involved in a large number of connections.
### Key Observations
* The node with the value '4' is highly connected, acting as a central hub within the network.
* The values 1, 2, 3, 5, and 6 appear less frequently and are connected to '4' in various ways.
* There is no apparent directionality to the edges; they appear to represent undirected relationships.
* The network is relatively dense, with many nodes having multiple connections.
### Interpretation
The diagram likely represents a network of relationships or dependencies. The numerical values within the nodes could represent various attributes, such as weight, capacity, or importance. The high connectivity of node '4' suggests it plays a critical role in the network, potentially acting as a central processing unit, a key resource, or a major influencer. The diagram could be a simplified model of a communication network, a transportation system, or a social network. Without additional context, it is difficult to determine the specific meaning of the network. The arrangement of the nodes and edges suggests a structured system, but the exact nature of that structure remains unclear. The diagram is a visual representation of connections and could be used to analyze network properties such as centrality, connectivity, and path length.
</details>
Figure 6: Bridges: Connect all the islands with a network of bridges.
<details>
<summary>extracted/5699650/img/puzzles/cube.png Details</summary>

### Visual Description
## Diagram: Block Arrangement
### Overview
The image depicts a 4x4 grid of squares. Some squares are filled with a solid blue color, while others are left empty (white). A three-dimensional, open-topped cube shape is positioned within the grid, partially overlapping some of the squares. The cube appears to be "cut out" of the grid, creating a visual impression of depth.
### Components/Axes
There are no explicit axes or labels. The grid itself forms the primary structure. The components are:
* **Grid:** A 4x4 arrangement of squares.
* **Filled Squares:** Squares colored solid blue.
* **Empty Squares:** Squares left white.
* **Cube:** A three-dimensional, open-topped cube shape.
### Detailed Analysis or Content Details
The blue filled squares form an "L" shape. The L shape occupies the following grid positions (row, column, starting from 1):
* (1, 1)
* (1, 2)
* (1, 3)
* (2, 3)
* (3, 3)
* (3, 2)
* (3, 1)
The cube is positioned in the bottom-right quadrant of the grid. It occupies the following grid positions:
* (3, 4)
* (4, 4)
* (4, 3)
* (3, 3) - partially overlapping with the blue square.
The cube's edges are defined by darker shading, suggesting a light source from the top-left.
### Key Observations
The blue "L" shape and the cube create a visual contrast. The cube appears to be "removing" or "carving out" space from the grid, as it overlaps with the blue filled square. The arrangement is symmetrical in a way, with the L shape and the cube balancing each other visually.
### Interpretation
This diagram likely represents a spatial reasoning puzzle or a visual demonstration of geometric concepts. The arrangement could illustrate:
* **Subtraction of volume:** The cube represents a volume being removed from a larger structure (the grid).
* **Shape interaction:** The relationship between the "L" shape and the cube demonstrates how different shapes can interact and overlap in space.
* **Perspective and depth:** The shading on the cube creates a sense of three-dimensionality, illustrating how perspective can be used to represent depth on a two-dimensional surface.
The diagram does not contain any numerical data or specific measurements. It is a purely visual representation of spatial relationships. The arrangement is simple, but it effectively conveys concepts related to geometry, volume, and perspective. It could be used as a teaching aid or as part of a more complex puzzle.
</details>
Figure 7: Cube: Pick up all the blue squares by rolling the cube over them.
<details>
<summary>extracted/5699650/img/puzzles/dominosa.png Details</summary>

### Visual Description
## Grid: Numerical Arrangement
### Overview
The image presents a grid of numerical values arranged in a rectangular format. There are no axes, legends, or explicit labels beyond the numbers themselves. The grid appears to be 8 rows by 7 columns, with each cell containing a single digit.
### Components/Axes
There are no axes or legends. The components are solely the numerical values within the grid.
### Detailed Analysis or Content Details
The grid contains the following numerical values, row by row:
* Row 1: 5, 5, 5, 2, 1, 4, 6
* Row 2: 2, 1, 0, 0, 0, 4, 3
* Row 3: 4, 6, 1, 1, 0, 3, 3
* Row 4: 3, 5, 4, 4, 4, 4, 2
* Row 5: 6, 3, 6, 0, 2, 2, 6
* Row 6: 3, 1, 5, 3, 1, 5, 6
* Row 7: 2, 2, 6, 2, 0, 5, 0
* Row 8: (Incomplete row) 5, 5, 5, 2, 1, 4, 6
### Key Observations
The grid appears to be a partial or incomplete arrangement of numbers. The last row is identical to the first row. There is no immediately obvious pattern or trend in the distribution of numbers. The values range from 0 to 6.
### Interpretation
The image presents a set of numerical data arranged in a grid format. Without additional context, it is difficult to determine the meaning or purpose of this arrangement. It could represent a portion of a larger dataset, a puzzle, or a visual representation of some underlying process. The repetition of the first row as the last row suggests a cyclical or repeating pattern, but this is speculative without further information. The data does not suggest any clear relationships or correlations between the numbers. It is simply a collection of digits arranged in a grid.
</details>
Figure 8: Dominosa: Tile the rectangle with a full set of dominoes.
<details>
<summary>extracted/5699650/img/puzzles/fifteen.png Details</summary>

### Visual Description
## Diagram: Numbered Tile Arrangement
### Overview
The image depicts a grid of numbered tiles, resembling a sliding puzzle or a partially completed arrangement. The grid is not fully populated, with a missing tile in the top-left corner. The tiles are arranged in a 4x4 grid, except for the missing tile, making it a 3x4 arrangement.
### Components/Axes
The diagram consists of 15 numbered tiles, labeled 1 through 15. Tile 1 is missing. The tiles are arranged in a rectangular grid. There are no explicit axes or scales.
### Detailed Analysis or Content Details
The tiles are numbered as follows:
* Row 1: 2, 3, 4 (Tile 1 is missing)
* Row 2: 5, 6, 7, 8
* Row 3: 13, 14, 9, 10
* Row 4: 15, 11, 12
The tiles are arranged in a grid-like structure. The tile numbers do not appear to follow a specific order or pattern beyond sequential numbering.
### Key Observations
The tile numbered '1' is missing from the top-left corner of the grid. The arrangement suggests a puzzle where the goal might be to arrange the tiles in numerical order.
### Interpretation
The image represents a classic sliding puzzle scenario. The missing tile indicates an incomplete state, and the arrangement of the remaining tiles suggests a challenge to restore the numerical order. The puzzle likely involves sliding tiles into the empty space to achieve the desired arrangement. The image does not provide any information about the puzzle's difficulty or the number of moves required to solve it. It is a static representation of a dynamic problem.
</details>
Figure 9: Fifteen: Slide the tiles around to arrange them into order.
<details>
<summary>extracted/5699650/img/puzzles/filling.png Details</summary>

### Visual Description
## Sudoku Puzzle: Partially Filled Grid
### Overview
The image presents a partially filled Sudoku puzzle grid. The grid is a 9x9 matrix, with some cells pre-filled with numbers from 1 to 9. The goal of a Sudoku puzzle is to fill the remaining cells with numbers such that each row, each column, and each of the nine 3x3 subgrids contains all of the digits from 1 to 9.
### Components/Axes
The image consists of a 9x9 grid. The grid is divided into nine 3x3 subgrids. The cells contain numerical values, or are empty (represented by a light gray color). The pre-filled numbers are black.
### Detailed Analysis or Content Details
Here's a reconstruction of the known values within the grid, row by row:
* **Row 1:** Empty, Empty, 3, 1, Empty, 5, 1, 2, Empty
* **Row 2:** 4, Empty, Empty, 6, 1, Empty, Empty, Empty, Empty
* **Row 3:** 4, Empty, Empty, 6, Empty, 7, Empty, Empty, 1
* **Row 4:** 1, Empty, Empty, 6, 1, 7, Empty, 1, 5
* **Row 5:** Empty, 4, 4, 4, 3, 3, 5, 5, Empty
* **Row 6:** 1, 4, 1, 3, 1, 5, 5, Empty, Empty
* **Row 7:** Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty
* **Row 8:** Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty
* **Row 9:** Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty, Empty
### Key Observations
The puzzle is in an early stage of completion. Several rows and columns have only a few numbers filled in. The numbers 1, 3, 4, 5, 6, and 7 are present in the grid. The puzzle appears to be a standard Sudoku puzzle with no apparent modifications to the rules.
### Interpretation
The image represents a logic-based puzzle. The arrangement of the pre-filled numbers provides initial constraints for solving the puzzle. The puzzle requires deductive reasoning to determine the correct placement of the remaining numbers, adhering to the Sudoku rules. The current state suggests a moderate level of difficulty, as there are several empty cells and limited initial clues. The puzzle is designed to test logical thinking and problem-solving skills.
</details>
Figure 10: Filling: Mark every square with the area of its containing region.
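The rule in the Filling caption can be checked mechanically. The following is a minimal sketch, assuming a completed grid stored as a list of lists of positive integers and assuming that a region is a 4-connected group of equal numbers; `is_solved_filling` is a hypothetical helper and not part of the PUZZLES code base.

```python
from collections import deque


def is_solved_filling(grid):
    """Return True if every 4-connected region of equal numbers
    contains exactly as many cells as the number written in it."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if seen[r][c]:
                continue
            # Breadth-first search over the region containing (r, c).
            value, size = grid[r][c], 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size != value:
                return False
    return True
```

For instance, `is_solved_filling([[2, 2, 1], [3, 3, 3]])` accepts the grid, while a lone 2 surrounded by other numbers is rejected.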
<details>
<summary>extracted/5699650/img/puzzles/flip.png Details</summary>

### Visual Description
## Diagram: Grid of Cells with Symbols
### Overview
The image presents a 5x5 grid of cells. Each cell contains a symbol resembling a plus sign with rounded ends, and is filled with one of three shades of gray: white, light gray, or dark gray. Additionally, some cells contain a rotated diamond shape, also in varying shades of gray. The arrangement appears to be patterned, but lacks explicit labels or axes.
### Components/Axes
There are no explicit axes or labels. The components are:
* **Grid:** A 5x5 arrangement of cells.
* **Plus Sign Symbol:** Present in every cell.
* **Diamond Shape:** Present in some cells, rotated approximately 45 degrees.
* **Color/Shading:** Three shades of gray: white, light gray, and dark gray.
### Detailed Analysis or Content Details
The grid can be described cell by cell, starting from the top-left:
* **Row 1:** White, Dark Gray, Dark Gray, Light Gray, Light Gray
* **Row 2:** Light Gray, White, Light Gray, Dark Gray with Diamond (Light Gray), Light Gray
* **Row 3:** White, Light Gray, White, Dark Gray with Diamond (Dark Gray), Dark Gray with Diamond (Light Gray)
* **Row 4:** Light Gray, White, Light Gray with Diamond (Light Gray), White, Light Gray
* **Row 5:** Dark Gray, White, Dark Gray, White, White
The diamond shapes are present in the following cells:
* Row 2, Column 4: Light Gray Diamond
* Row 3, Column 4: Dark Gray Diamond
* Row 3, Column 5: Light Gray Diamond
* Row 4, Column 3: Light Gray Diamond
### Key Observations
* The pattern is not strictly symmetrical, but exhibits a degree of repetition.
* The dark gray cells tend to cluster towards the left and bottom of the grid.
* The diamond shapes are not randomly distributed; they appear in a diagonal pattern.
* The presence of the diamond shape seems to correlate with the cell's shading.
### Interpretation
This diagram appears to be a visual representation of a matrix or a state space. The different shades of gray could represent different values or categories, and the diamond shapes might indicate a specific condition or event. The arrangement suggests a possible relationship between the cell's state (gray level) and the presence or absence of the diamond.
Without further context, it's difficult to determine the exact meaning of the diagram. It could represent:
* **A cellular automaton:** Where the state of each cell evolves based on the states of its neighbors.
* **A game board:** Where the different shades represent different terrain types or player positions.
* **A data visualization:** Where the grid represents a dataset, and the shades represent different values.
* **A logical map:** Where the cells represent states and the diamonds represent transitions.
The diagonal pattern of the diamonds suggests a directional influence or a flow of information. The clustering of dark gray cells could indicate areas of high concentration or activity.
The diagram is abstract and requires additional information to fully understand its purpose and meaning. It is a visual pattern, but lacks explicit data or labels to provide a definitive interpretation.
</details>
Figure 11: Flip: Flip groups of squares to light them all up at once.
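A single Flip move can be sketched as a toggle over a small neighbourhood. The code below assumes that selecting a square flips that square and its orthogonal neighbours (a plus-shaped group); the caption only says "groups of squares", so this neighbourhood, the boolean grid encoding, and the names `flip` and `all_lit` are illustrative assumptions.

```python
def flip(grid, row, col):
    """Return a new grid with (row, col) and its orthogonal
    neighbours toggled. Assumed plus-shaped flip group."""
    height, width = len(grid), len(grid[0])
    new_grid = [list(line) for line in grid]
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < height and 0 <= c < width:
            new_grid[r][c] = not new_grid[r][c]
    return new_grid


def all_lit(grid):
    """Return True when every square is lit (True)."""
    return all(all(line) for line in grid)
```

Because a toggle is its own inverse, applying the same move twice restores the original grid, so the order of moves does not matter and only the parity of each move counts.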
<details>
<summary>extracted/5699650/img/puzzles/flood.png Details</summary>

### Visual Description
## Diagram: Block Arrangement
### Overview
The image depicts a 2D arrangement of colored blocks against a blue background. It appears to be a visual puzzle or a representation of a spatial configuration. There are no axes, legends, or numerical data present. The image consists solely of geometric shapes with varying colors.
### Components/Axes
There are no axes or legends. The components are rectangular blocks of the following colors:
* Blue (background)
* Orange
* Red
* Green
* Purple
* Yellow
### Detailed Analysis or Content Details
The arrangement of blocks can be described as follows:
* **Top Row:** Two orange blocks on the right, one green block on the far right.
* **Second Row:** One orange block, followed by two red blocks.
* **Third Row:** One orange block, one purple block, one red block, one green block.
* **Fourth Row:** One green block, one green block, one orange block.
* **Bottom Row:** One green block on the left, one yellow block in the center.
The blocks are of varying sizes, ranging from single-unit squares to larger rectangular shapes. The arrangement does not appear to follow a strict grid pattern, with some blocks overlapping or being positioned adjacent to each other.
### Key Observations
The arrangement is asymmetrical. The blocks are not evenly distributed across the space. The color orange appears multiple times, while purple and yellow appear only once. There is no apparent pattern or order to the arrangement.
### Interpretation
The image does not present any quantifiable data or trends. It is a purely visual representation of a block arrangement. It could represent a puzzle, a game board, or a simplified model of a spatial layout. Without additional context, it is difficult to determine the purpose or meaning of the arrangement. The image is a static representation and does not convey any dynamic information or relationships. It is a visual stimulus without inherent meaning beyond its geometric composition.
</details>
Figure 12: Flood: Turn the grid the same colour in as few flood fills as possible.
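One flood-fill move can be written in a few lines. This is a minimal sketch that assumes the fill always starts from the top-left cell and spreads through 4-connected cells of the same colour; the integer colour encoding and the function name `flood` are hypothetical.

```python
def flood(grid, new_colour):
    """Recolour the 4-connected region containing the top-left cell
    and return the result as a new grid."""
    height, width = len(grid), len(grid[0])
    old_colour = grid[0][0]
    result = [list(row) for row in grid]
    if old_colour == new_colour:
        return result  # nothing to do
    result[0][0] = new_colour
    stack = [(0, 0)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and result[nr][nc] == old_colour):
                result[nr][nc] = new_colour
                stack.append((nr, nc))
    return result
```

Each move merges the top-left region with every neighbouring region of the chosen colour, which is why the number of moves, rather than solvability, is the quantity to minimise.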
<details>
<summary>extracted/5699650/img/puzzles/galaxies.png Details</summary>

### Visual Description
## Diagram: Abstract Grid Arrangement
### Overview
The image presents a square grid composed of smaller rectangular and square regions, some filled with gray shading, others left white. Within these regions are several circles. The arrangement appears to be a puzzle or a visual representation of a spatial problem. There are no explicit labels or axes.
### Components/Axes
The image consists of:
* A square grid, approximately 7x7 cells.
* Rectangular and square regions of varying sizes within the grid.
* Gray shaded regions.
* White regions.
* Circles placed within some of the white regions.
* A black border around the entire grid.
### Detailed Analysis or Content Details
The grid is divided into several regions. The circles are positioned as follows:
* Row 1: One circle in column 5.
* Row 2: One circle in column 3, one in column 6.
* Row 3: One circle in column 2, one in column 4.
* Row 4: One circle in column 1, one in column 5.
* Row 5: One circle in column 3.
* Row 6: One circle in column 4.
* Row 7: One circle in column 2, one in column 6.
The gray shaded regions occupy approximately 40% of the grid area. They are irregularly shaped and do not follow a consistent pattern. The white regions are also irregularly shaped, forming the spaces where the circles are located.
### Key Observations
* The circles are not uniformly distributed across the grid.
* There is no apparent numerical data or scale.
* The arrangement of gray and white regions is complex and non-repetitive.
* The circles are always contained within the white regions.
### Interpretation
The image likely represents a spatial reasoning puzzle or a visual problem. The arrangement of the gray and white regions, along with the placement of the circles, could be part of a larger pattern or a set of constraints. Without additional context, it is difficult to determine the specific objective or meaning of the diagram. It could be a simplified representation of a maze, a network, or a resource allocation problem. The lack of labels or numerical data suggests that the focus is on visual relationships and spatial understanding rather than quantitative analysis. The image is a static representation, and does not suggest any dynamic process or change over time. It is a visual problem that requires spatial reasoning to solve.
</details>
Figure 13: Galaxies: Divide the grid into rotationally symmetric regions each centred on a dot.
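The symmetry condition in the Galaxies caption can be tested per region. The sketch below assumes cells are `(row, col)` pairs and that the dot is given in doubled coordinates, so that the centre of cell `(r, c)` is `(2r + 1, 2c + 1)` and dots on edges or corners have even components; both conventions and the name `is_symmetric_region` are illustrative choices. It checks only 180-degree rotational symmetry, not connectivity.

```python
def is_symmetric_region(cells, dot):
    """Return True if the set of cells maps onto itself under a
    180-degree rotation about the dot (doubled coordinates)."""
    cells = set(cells)
    dot_y, dot_x = dot
    # Rotating the centre (2r + 1, 2c + 1) about the dot gives the
    # centre of cell (dot_y - 1 - r, dot_x - 1 - c).
    return all((dot_y - 1 - r, dot_x - 1 - c) in cells for r, c in cells)
```

A horizontal domino `{(0, 0), (0, 1)}` with a dot on its shared edge, at doubled coordinates `(1, 2)`, passes the check; adding the cell `(1, 0)` breaks the symmetry.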
<details>
<summary>extracted/5699650/img/puzzles/guess.png Details</summary>

### Visual Description
## Heatmap: Color-Coded Matrix
### Overview
The image presents a heatmap-style matrix of colored circles arranged in a rectangular grid. The matrix appears to represent a relationship between two categorical variables, with color indicating the intensity or value of the relationship. There are no explicit axis labels or numerical scales. A legend is provided at the bottom of the image, indicating the color-to-category mapping.
### Components/Axes
The image consists of:
* **Matrix:** A grid of circles, arranged in approximately 6 rows and 10 columns.
* **Legend:** Located at the bottom of the image, consisting of five colored circles (yellow, green, red, purple, and gray) with corresponding labels.
* **Background:** A uniform gray background.
The matrix does not have explicit axis labels. The rows and columns likely represent categories, but their names are not provided.
### Detailed Analysis or Content Details
The legend maps colors to categories as follows:
* Yellow
* Green
* Red
* Purple
* Gray
The matrix itself contains the following color distribution (approximated based on visual inspection):
* **Row 1:** Red, Yellow, Red, Blue, Orange, Red, Red, Yellow, Yellow, Gray
* **Row 2:** Yellow, Blue, Yellow, Green, Red, Purple, Gray, Gray, Gray, Gray
* **Row 3:** Green, Green, Blue, Yellow, Green, Red, Gray, Gray, Gray, Gray
* **Row 4:** Blue, Orange, Blue, Blue, Orange, Gray, Gray, Gray, Gray, Gray
* **Row 5:** Orange, Yellow, Orange, Gray, Gray, Gray, Gray, Gray, Gray, Gray
* **Row 6:** Purple, Green, Purple, Gray, Gray, Gray, Gray, Gray, Gray, Gray
The rightmost column is entirely gray. The bottom row consists of the legend colors: Yellow, Green, Red, Purple.
### Key Observations
* The matrix is not fully populated, with a significant portion of the cells filled with gray.
* The distribution of colors is uneven, with red and yellow appearing more frequently in the first two rows.
* The last column is entirely gray, suggesting a lack of association or a neutral value for that category.
* The bottom row is the legend, and it is positioned horizontally.
### Interpretation
The image likely represents a contingency table or a correlation matrix, where the color of each cell indicates the strength or presence of a relationship between two categorical variables. The gray color likely represents a missing value, a zero value, or a lack of association.
Without knowing the labels for the rows and columns, it is difficult to draw specific conclusions. However, the pattern suggests that certain combinations of categories are more common or have a stronger relationship than others. For example, the frequent appearance of red and yellow in the first two rows suggests a potential association between those categories and the categories represented by the first two columns.
The heatmap could be used to visualize data from a survey, experiment, or observational study. The colors could represent the frequency of responses, the strength of a correlation, or the probability of an event occurring. The image is a visual representation of data, but lacks the context needed for a full understanding.
</details>
Figure 14: Guess: Guess the hidden combination of colours.
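Guessing a hidden colour combination relies on feedback after each guess. The sketch below assumes the usual Mastermind-style feedback (a count of pegs with the right colour in the right place, and a count with the right colour in the wrong place); the function name `score_guess` and the string encoding of colours are hypothetical.

```python
from collections import Counter


def score_guess(secret, guess):
    """Return (exact, partial): pegs matching in colour and position,
    and pegs matching in colour only."""
    exact = sum(s == g for s, g in zip(secret, guess))
    # Multiset intersection counts colours shared regardless of position.
    shared = sum((Counter(secret) & Counter(guess)).values())
    return exact, shared - exact
```

An agent can use this feedback to prune the set of candidate combinations, keeping only those that would have produced the same scores for all previous guesses.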
<details>
<summary>extracted/5699650/img/puzzles/inertia.png Details</summary>

### Visual Description
## Diagram: Game Board State
### Overview
The image depicts a game board, likely for a strategy game, showing the arrangement of pieces in a grid. The board is an 8x8 grid with various pieces occupying different squares. There is no explicit labeling of the board or pieces, so the exact game is unknown.
### Components/Axes
The diagram consists of:
* An 8x8 grid.
* Three distinct piece types:
* Light Blue Diamond
* Black Diamond
* Grey Circle
* One highlighted piece:
* Green Circle
### Detailed Analysis or Content Details
The board is filled with pieces in a seemingly random arrangement. Let's describe the piece distribution row by row, starting from the top:
* **Row 1:** Grey Circle, Light Blue Diamond, Light Blue Diamond, Light Blue Diamond, Light Blue Diamond, Grey Circle, Black Diamond, Black Diamond
* **Row 2:** Black Diamond, Empty, Light Blue Diamond, Empty, Empty, Grey Circle, Empty, Green Circle
* **Row 3:** Empty, Grey Circle, Empty, Light Blue Diamond, Empty, Empty, Empty, Empty
* **Row 4:** Empty, Empty, Black Diamond, Light Blue Diamond, Black Diamond, Empty, Grey Circle, Empty
* **Row 5:** Grey Circle, Empty, Empty, Empty, Empty, Black Diamond, Empty, Grey Circle
* **Row 6:** Grey Circle, Black Diamond, Black Diamond, Black Diamond, Empty, Empty, Black Diamond, Black Diamond
* **Row 7:** Black Diamond, Grey Circle, Empty, Light Blue Diamond, Empty, Empty, Empty, Empty
* **Row 8:** Black Diamond, Black Diamond, Black Diamond, Black Diamond, Grey Circle, Grey Circle, Black Diamond, Empty
Counting the rows listed above gives 17 Black Diamond pieces, 8 Light Blue Diamond pieces, and 11 Grey Circle pieces. One Green Circle piece is present.
### Key Observations
* The Green Circle piece is located in the top-right quadrant of the board.
* The Black Diamond pieces are the most numerous, concentrated in the bottom half of the board.
* The Light Blue Diamond pieces are clustered in the top rows.
* The Grey Circle pieces are distributed more evenly across the board.
* There are several empty spaces on the board.
### Interpretation
Without knowing the rules of the game, it's difficult to interpret the significance of this board state. However, we can make some observations:
* The distribution of pieces suggests a dynamic game state, potentially mid-game.
* The concentration of Black Diamond pieces could indicate a strategic advantage for one player.
* The Green Circle piece might represent a key piece or a player's current turn.
* The empty spaces offer opportunities for movement and strategic placement of pieces.
The arrangement of pieces suggests a game that involves strategic placement and potentially capturing or maneuvering pieces on the board. The lack of any explicit labels or rules makes it impossible to determine the game's objective or the specific roles of each piece type. The board state appears to be a snapshot of a game in progress, with potential for further development and strategic play.
</details>
Figure 15: Inertia: Collect all the gems without running into any of the mines.
<details>
<summary>extracted/5699650/img/puzzles/keen.png Details</summary>

### Visual Description
## Grid: Arithmetic Puzzle
### Overview
The image presents a 5x5 grid filled with numbers and arithmetic expressions. The grid appears to be a puzzle where the goal is likely to fill in the missing values based on the given operations and numbers. The numbers and operations are color-coded: blue and green.
### Components/Axes
The grid has 5 rows and 5 columns. The cells contain either a single number or an arithmetic expression in the form "X operation Y = result". The operations used are addition (+), subtraction (-), and multiplication (x).
### Detailed Analysis or Content Details
Here's a breakdown of the grid's content, row by row:
* **Row 1:**
* "5+" (Blue)
* "15x = 35" (Blue)
* "7+ = 25" (Blue)
* "10+" (Blue)
* **Row 2:**
* "2- = 13" (Blue)
* "13" (Blue)
* "35" (Blue)
* "25" (Blue)
* **Row 3:**
* "2-" (Green)
* "2+ = 2" (Green)
* "4" (Green)
* "3- = 1" (Green)
* **Row 4:**
* "40x = 5" (Green)
* "5" (Green)
* "2x = 2" (Green)
* "4" (Green)
* **Row 5:**
* "2" (Green)
* "4" (Green)
* "1" (Green)
* "2- = 3" (Green)
* "5" (Green)
### Key Observations
* The grid is divided into two color-coded sets of equations/numbers: blue and green.
* The blue entries are incomplete equations, requiring a missing operand to be determined.
* The green entries are either single numbers or complete equations.
* Some equations have already been solved (e.g., "15x = 35").
* The puzzle seems to involve solving for missing values in arithmetic expressions.
### Interpretation
The image presents an arithmetic puzzle. The puzzle likely requires the user to solve for the missing numbers in the incomplete equations. The color coding may indicate different levels of difficulty or different sets of equations to solve. The puzzle tests basic arithmetic skills (addition, subtraction, multiplication) and problem-solving abilities. The arrangement of the numbers and equations suggests a potential pattern or relationship between the values, which could be used to deduce the missing numbers. The puzzle is designed to be solved logically, using the given information and arithmetic rules.
</details>
Figure 16: Keen: Complete the latin square in accordance with the arithmetic clues.
<details>
<summary>extracted/5699650/img/puzzles/lightup.png Details</summary>

### Visual Description
## Grid: Cellular Automata Configuration
### Overview
The image depicts a 7x7 grid representing a configuration of a cellular automaton. The grid cells are colored in yellow, gray, and black, and contain either a number (0, 1, or 3), a white circle, or a black square. This appears to be a snapshot of a state in a simulation.
### Components/Axes
The image does not have explicit axes or a legend in the traditional sense. However, the grid itself can be considered a two-dimensional coordinate system. The elements within the grid are:
* **Yellow Cells:** Represent the background or default state.
* **Gray Cells:** Represent an intermediate state.
* **Black Cells:** Represent a distinct state.
* **White Circles:** Represent a specific type of cell, possibly with a certain value or property.
* **Black Squares:** Represent another specific type of cell, possibly with a different value or property.
* **Numbers (0, 1, 3):** Represent numerical values associated with certain cells.
### Detailed Analysis or Content Details
The grid can be described cell by cell, row by row (starting from the top-left):
* **Row 1:** White Circle, 3, White Circle, Yellow, Yellow, Yellow, Yellow
* **Row 2:** Yellow, White Circle, Black, 0, Yellow, Yellow, White Circle
* **Row 3:** Yellow, 0, Yellow, Black Square, Yellow, Yellow, Yellow
* **Row 4:** White Circle, Black, Yellow, Yellow, 1, Yellow, Yellow
* **Row 5:** Yellow, Yellow, Yellow, 1, Black Square, Yellow, Yellow
* **Row 6:** Black Square, 0, 1, Black Square, White Circle, Black Square, Yellow
* **Row 7:** Yellow, Black Square, Black Square, 0, Black Square, Black Square, Yellow
The numerical values are:
* 3 appears once.
* 0 appears four times.
* 1 appears three times.
In the rows listed above, the white circles appear 6 times.
The black squares appear 9 times.
### Key Observations
* The distribution of elements is not uniform. Black cells and squares are concentrated in the lower-left portion of the grid.
* The presence of numerical values suggests that these cells might represent some kind of state or counter.
* The combination of different elements (numbers, circles, squares, colors) indicates a complex system with multiple states and properties.
### Interpretation
This image likely represents a snapshot of a cellular automaton, a discrete model studied in computer science and mathematics. The different colors and symbols represent different cell states, and the numbers might represent some kind of counter or value associated with those states. The arrangement of these states could be the result of a set of rules governing how cells change their state based on the states of their neighbors.
The concentration of black cells and squares in the lower-left corner could indicate a pattern or a stable configuration. The presence of numbers suggests that the automaton might be tracking some kind of quantity or property over time. Without knowing the specific rules governing the automaton, it is difficult to determine the exact meaning of this configuration. However, it is clear that the image represents a complex system with a rich set of states and interactions. The image is a static representation of a dynamic system.
</details>
Figure 17: Light Up: Place bulbs to light up all the squares.
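How bulbs light squares can be sketched directly. The code below assumes that a bulb lights its own square and every square in the four orthogonal directions until a black square or the grid edge is reached; the set-based encoding and the name `lit_cells` are illustrative. It does not check the numeric clues or whether two bulbs shine on each other.

```python
def lit_cells(walls, bulbs, height, width):
    """Return the set of squares lit by the given bulbs.

    walls and bulbs are sets of (row, col) pairs.
    """
    lit = set()
    for r, c in bulbs:
        lit.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Light travels until it hits a wall or leaves the grid.
            while (0 <= nr < height and 0 <= nc < width
                   and (nr, nc) not in walls):
                lit.add((nr, nc))
                nr, nc = nr + dr, nc + dc
    return lit
```

A placement covers the grid when the lit set equals the set of all non-wall squares.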
<details>
<summary>extracted/5699650/img/puzzles/loopy.png Details</summary>

### Visual Description
## Diagram: Maze with Numerical Labels
### Overview
The image depicts a maze-like structure drawn on a grid. The maze is formed by thick black lines, and certain grid cells contain numerical labels. The remaining grid cells are highlighted in yellow. The maze appears to have a single entry point (labeled '0') and a possible exit point (labeled '1').
### Components/Axes
The diagram consists of:
* A grid of approximately 7x7 cells.
* Thick black lines defining the maze pathways.
* Numerical labels within some grid cells: 0, 1, 2, 3.
* Yellow highlighting of cells not part of the maze pathways.
### Detailed Analysis or Content Details
The numerical labels and their positions within the grid are as follows:
* **0:** Located in the top-left corner of the maze.
* **2:** Appears multiple times, positioned at (1,0), (1,2), (1,3), (1,4), (3,1), (3,2), (4,2), (5,2), (5,3).
* **3:** Located at (2,0), (4,0), (6,1).
* **1:** Located at (6,4) and (6,5).
The maze pathways connect these labeled cells. The yellow highlighted cells represent the areas blocked off by the maze walls.
### Key Observations
* The maze has a clear starting point (0) and a potential ending point (1).
* The number '2' is the most frequently occurring label within the maze.
* The numbers seem to be arbitrarily assigned to cells within the maze, and do not appear to represent coordinates or distances.
* The maze is not symmetrical.
### Interpretation
The diagram likely represents a puzzle or a pathfinding problem. The numbers within the cells could represent costs, weights, or identifiers associated with each location in the maze. The task could be to find the shortest path from '0' to '1', potentially considering the numerical values as part of the path cost. Without further context, the meaning of the numbers remains ambiguous. The maze itself is relatively simple, suggesting it might be designed for introductory problem-solving exercises. The yellow highlighting serves to visually separate the navigable path from the blocked areas. The diagram does not provide any information about the rules or constraints of the maze, so it is difficult to determine the intended solution.
</details>
Figure 18: Loopy: Draw a single closed loop, given clues about number of adjacent edges.
<details>
<summary>extracted/5699650/img/puzzles/magnets.png Details</summary>

### Visual Description
## Grid Puzzle: Numerical and Symbolic Arrangement
### Overview
The image presents a 5x4 grid filled with numbers, plus and minus symbols, question marks, and green 'X' symbols. The grid appears to represent a puzzle or a mathematical problem, with the goal likely being to determine the values represented by the question marks. The grid is framed by numerical labels along the top and right edges, and along the bottom and left edges.
### Components/Axes
* **Horizontal Axis (Top):** Labeled with the numbers 2, 2, and 1.
* **Vertical Axis (Left):** Labeled with the numbers 3, 2, and 2. The top-left cell is labeled with "+".
* **Horizontal Axis (Bottom):** Labeled with the numbers 2, 2, and 1.
* **Vertical Axis (Right):** Labeled with the numbers 2, 0, and "-".
* **Symbols:**
* "+" (Red)
* "-" (Black)
* "?" (Question Mark)
* "X" (Green)
### Detailed Analysis or Content Details
The grid contains the following elements:
* **Row 1, Column 1:** "+" (Red)
* **Row 1, Column 2:** "+" (Red)
* **Row 1, Column 3:** "?" (Question Mark)
* **Row 1, Column 4:** "?" (Question Mark)
* **Row 2, Column 1:** "-" (Black)
* **Row 2, Column 2:** "-" (Black)
* **Row 2, Column 3:** "+" (Red)
* **Row 2, Column 4:** "-" (Black)
* **Row 3, Column 1:** "+" (Red)
* **Row 3, Column 2:** "-" (Black)
* **Row 3, Column 3:** "+" (Red)
* **Row 3, Column 4:** "-" (Black)
* **Row 4, Column 1:** "X" (Green)
* **Row 4, Column 2:** "X" (Green)
* **Row 4, Column 3:** "+" (Red)
* **Row 4, Column 4:** "X" (Green)
* **Row 5, Column 1:** "-" (Black)
* **Row 5, Column 2:** "+" (Red)
* **Row 5, Column 3:** "-" (Black)
* **Row 5, Column 4:** "X" (Green)
### Key Observations
The arrangement of symbols and numbers suggests a potential mathematical relationship or pattern. The question marks indicate missing values that need to be determined based on the surrounding elements and the numerical labels on the axes. The green "X" symbols appear in a diagonal pattern.
### Interpretation
The image presents a puzzle that likely requires logical deduction and mathematical reasoning to solve. The numerical labels on the axes could represent coordinates or factors in a calculation. The plus and minus signs suggest addition or subtraction operations. The question marks represent unknown values that need to be determined based on the given information. The green "X" symbols might represent correct solutions or specific conditions within the puzzle. Without further context or rules, it is difficult to determine the exact solution or the underlying logic of the puzzle. The puzzle appears to be a logic grid, where the goal is to fill in the missing values based on the constraints provided by the surrounding elements and the numerical labels.
</details>
Figure 19: Magnets: Place magnets to satisfy the clues and avoid like poles touching.
<details>
<summary>extracted/5699650/img/puzzles/map.png Details</summary>

### Visual Description
## Abstract Geometric Composition: Color-Blocked Shapes
### Overview
The image presents an abstract composition consisting of irregularly shaped, color-blocked polygons arranged to fill a square frame. There are no axes, legends, or numerical data present. The image is purely visual and does not contain factual or quantifiable information. It appears to be a design element or a visual pattern.
### Components/Axes
There are no axes or legends. The image is composed of the following colors:
* Light Green
* Dark Green
* Yellow
* Brown (various shades)
* White
The shapes are defined by black outlines.
### Detailed Analysis or Content Details
The composition is densely packed with shapes of varying sizes. The shapes interlock and overlap, creating a complex visual texture. There is no apparent hierarchy or focal point. The shapes are not uniform; some are angular, while others are more rounded. The white shapes are generally larger and more angular than the colored shapes. The brown shapes are the most numerous.
### Key Observations
The image lacks any clear data or trends. It is a purely aesthetic arrangement of colors and shapes. The use of contrasting colors (e.g., green and brown, yellow and brown) creates visual interest. The white shapes provide negative space and help to define the boundaries of the colored shapes.
### Interpretation
The image appears to be an abstract representation of a fragmented or partitioned space. The irregular shapes and varied colors could symbolize diversity, complexity, or disarray. The lack of a clear structure or hierarchy suggests a rejection of traditional order. The image could be interpreted as a visual metaphor for a chaotic system or a fragmented identity. Without additional context, it is difficult to determine the artist's intent or the specific meaning of the composition. It is likely a design element intended for aesthetic purposes rather than conveying specific information. The image does not provide any data or insights beyond its visual characteristics.
</details>
Figure 20: Map: Colour the map so that adjacent regions are never the same colour.
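The constraint in the Map caption is a graph-colouring condition. The following is a minimal sketch, assuming the map is given as an adjacency dictionary from each region to its neighbours and that every region already has a colour; the data layout and the name `is_valid_colouring` are hypothetical.

```python
def is_valid_colouring(adjacency, colours):
    """Return True if no two adjacent regions share a colour.

    adjacency maps a region to an iterable of neighbouring regions;
    colours maps every region to its colour.
    """
    return all(
        colours[region] != colours[neighbour]
        for region, neighbours in adjacency.items()
        for neighbour in neighbours
    )
```

For a planar map, four colours always suffice, so the difficulty lies in finding a consistent assignment rather than in whether one exists.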
<details>
<summary>extracted/5699650/img/puzzles/mines.png Details</summary>

### Visual Description
## Grid: Numbered Cells
### Overview
The image displays a grid of cells, most of which are empty (white). A subset of cells contain numerical values, colored either green, red, or black. The grid appears to be incomplete, with many empty cells suggesting it represents a partially filled or in-progress state of some larger structure.
### Components/Axes
There are no explicit axes or legends. The grid itself forms the primary structure. The data consists of numerical values ranging from 1 to 7, displayed in three colors: green, red, and black. The grid is approximately 9x9, though the bottom rows are incomplete.
### Detailed Analysis or Content Details
The numerical values and their positions within the grid are as follows (reading row by row, left to right):
* **Row 1:** 1 (green), 2 (green), 2 (green), 3 (red), 1 (red), 1 (red), 1 (red), 1 (red)
* **Row 2:** 2 (red), 3 (red), 1 (black), 4 (red), 5 (red), 4 (red)
* **Row 3:** 1 (black), 1 (black), 7 (black), 1 (red), 5 (red), 2 (green), 3 (green), 2 (green), 1 (green)
* **Row 4:** 1 (red), 1 (red), 1 (red), 5 (red), 1 (black), 1 (black), 1 (black), 1 (black)
* **Row 5:** (empty), (empty), 1 (red), 4 (green), 2 (green)
* **Row 6:** (empty), (empty), 1 (red), 1 (red), 2 (green), 2 (green), 1 (green)
* **Row 7:** (empty), (empty), 3 (red), 3 (red), 1 (black), 4 (red), 1 (red)
* **Row 8:** (empty), (empty), 3 (green), 2 (green), 4 (red), 1 (red), 1 (red)
The remaining cells are empty (white).
### Key Observations
* The numbers 1, 2, 3, 4, 5, and 7 are present in the grid.
* The color distribution is uneven. Red appears to be the most frequent color, followed by green and then black.
* The numbers are not arranged in any immediately obvious sequential or patterned manner.
* The grid is not fully populated, with a significant number of empty cells.
* The numbers seem to be randomly distributed, with no clear trend or correlation.
### Interpretation
The image likely represents a partially completed puzzle or game board. The numbers and colors could represent different categories or values within the game. The incomplete state suggests the game is in progress. Without knowing the rules of the game, it is difficult to determine the significance of the arrangement of numbers and colors. It could be a Sudoku-like puzzle, a Minesweeper-like game, or a custom game with unique rules. The lack of axes or a legend implies that the meaning of the numbers and colors is inherent to the game itself and not externally defined. The distribution of colors and numbers does not reveal any obvious strategy or pattern, suggesting the game may involve randomness or complex interactions.
</details>
Figure 21: Mines: Find all the mines without treading on any of them.
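The numbers shown on a Mines board can be derived from the mine positions. This sketch assumes the standard Minesweeper convention that each number counts mines among the eight surrounding squares; the name `neighbour_mine_counts` and the set encoding of mines are illustrative.

```python
def neighbour_mine_counts(mines, height, width):
    """Return a grid where each entry is the number of mines among
    the eight neighbours of that square."""
    counts = [[0] * width for _ in range(height)]
    for r, c in mines:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                # Skip the mine's own square and anything off the board.
                if (dr or dc) and 0 <= nr < height and 0 <= nc < width:
                    counts[nr][nc] += 1
    return counts
```

An agent sees these counts only for uncovered squares and has to infer the mine positions from them.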
<details>
<summary>extracted/5699650/img/puzzles/mosaic.png Details</summary>

### Visual Description
## Sudoku Puzzle: Partially Filled Grid
### Overview
The image presents a partially filled Sudoku puzzle grid. The grid is a 9x9 arrangement of cells, with some cells pre-filled with numbers from 0 to 6. The goal of a Sudoku puzzle is to fill the remaining cells with numbers from 1 to 9, ensuring that each number appears only once in each row, column, and 3x3 subgrid.
### Components/Axes
The image consists of a 9x9 grid. The grid is divided into nine 3x3 subgrids. The cells contain numerical values ranging from 0 to 6. There are no explicit axes or legends.
### Detailed Analysis or Content Details
The following numbers are pre-filled in the grid, listed by row and column (row numbering starts from 1 at the top, column numbering starts from 1 at the left):
* Row 1, Column 2: 2
* Row 1, Column 6: 3
* Row 2, Column 1: 4
* Row 2, Column 3: 2
* Row 2, Column 5: 5
* Row 3, Column 1: 6
* Row 3, Column 5: 3
* Row 4, Column 1: 5
* Row 4, Column 2: 5
* Row 5, Column 2: 4
* Row 5, Column 3: 2
* Row 5, Column 4: 4
* Row 5, Column 5: 4
* Row 6, Column 3: 0
* Row 6, Column 7: 3
* Row 7, Column 1: 6
* Row 7, Column 2: 4
* Row 8, Column 2: 4
* Row 8, Column 3: 2
* Row 8, Column 4: 3
* Row 8, Column 6: 2
* Row 9, Column 4: 4
### Key Observations
The puzzle is in an intermediate state, with a significant number of cells still empty. One cell contains the clue '0', which in this puzzle indicates that there are no black squares nearby.
### Interpretation
The image represents a logic-based puzzle. The numerical clues constrain which cells may be coloured black. Solving the puzzle requires deductive reasoning to determine the colour of each cell so that every clue is satisfied. The puzzle's difficulty depends on the initial configuration of clues and the complexity of the logical deductions required to solve it.
</details>
Figure 22: Mosaic: Fill in the grid given clues about number of nearby black squares.
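The clue rule in the caption can be illustrated with a small checker. The sketch below assumes that "nearby" means the 3x3 window centred on the clue cell, including the cell itself, which is our reading of the Mosaic rules in Simon Tatham's collection. The function names and the grid encoding are hypothetical and are not taken from the PUZZLES code base.

```python
from typing import List, Optional

Grid = List[List[bool]]             # True marks a black square
Clues = List[List[Optional[int]]]   # None marks a cell without a clue


def count_black_nearby(black: Grid, row: int, col: int) -> int:
    """Count black squares in the 3x3 window centred on (row, col).

    Assumption: the window includes the clue cell itself and is
    clipped at the grid border.
    """
    height, width = len(black), len(black[0])
    total = 0
    for r in range(max(0, row - 1), min(height, row + 2)):
        for c in range(max(0, col - 1), min(width, col + 2)):
            total += int(black[r][c])
    return total


def satisfies_clues(black: Grid, clues: Clues) -> bool:
    """Return True if every clue matches its neighbourhood count."""
    return all(
        clue is None or count_black_nearby(black, r, c) == clue
        for r, clue_row in enumerate(clues)
        for c, clue in enumerate(clue_row)
    )
```

Under this reading, a clue of 0 forces all cells in its window to be white, which is why such clues are useful starting points for deduction.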
<details>
<summary>extracted/5699650/img/puzzles/net.png Details</summary>

### Visual Description
## Diagram: Process Flow or System Architecture
### Overview
The image depicts a diagram representing a process flow or system architecture. It consists of interconnected rectangular blocks, some colored blue, some cyan, and one black, connected by lines indicating the flow of information or materials. The background is a grid pattern. There are no explicit labels or axis titles.
### Components/Axes
The diagram contains the following components:
* **Blue Blocks (x4):** Located in the top portion of the diagram.
* **Cyan Blocks (x4):** Located in the bottom portion of the diagram.
* **Black Block (x1):** Centrally located, connecting the upper and lower sections.
* **Black Lines:** Connecting the blue blocks and the black block.
* **Cyan Lines:** Connecting the cyan blocks and the black block.
* **Grid Background:** A uniform grid pattern covering the entire diagram.
### Detailed Analysis or Content Details
The diagram can be divided into two main sections: an upper section with blue blocks and a lower section with cyan blocks. The black block acts as a central connector between these two sections.
* **Upper Section:** Three blue blocks are arranged horizontally at the top. These are connected by a black line that forms a "U" shape, connecting the leftmost and rightmost blocks through a central block.
* **Lower Section:** Four cyan blocks are arranged in a rectangular pattern. These are connected by a cyan line that forms a closed loop, with a vertical line extending upwards to connect to the black block.
* **Central Connection:** The black block is positioned centrally and is connected to the upper section via a black line descending from the central blue block. It is also connected to the lower section via a vertical cyan line.
There are no numerical values or specific data points present in the diagram. The diagram is purely representational.
### Key Observations
The diagram suggests a process where information or materials flow from the blue blocks, through the black block, and then to the cyan blocks. The closed loop in the lower section suggests a feedback mechanism or a cyclical process. The black block appears to be a critical component, acting as a central processing unit or a control point.
### Interpretation
The diagram likely represents a simplified model of a system or process. The blue blocks could represent input sources, the cyan blocks could represent output destinations, and the black block could represent a processing unit. The lines indicate the flow of information or materials between these components.
The closed loop in the lower section suggests a feedback mechanism, where the output of the cyan blocks influences the input to the black block, potentially adjusting the process. The diagram's simplicity suggests it is intended to convey a high-level overview of the system, rather than a detailed technical specification.
Without additional context, it is difficult to determine the specific nature of the process or system being represented. However, the diagram provides a clear visual representation of the relationships between the different components and the flow of information or materials. The diagram is a visual metaphor, and its meaning is dependent on the context in which it is used. It could represent anything from a manufacturing process to a software architecture.
</details>
Figure 23: Net: Rotate each tile to reassemble the network.
<details>
<summary>extracted/5699650/img/puzzles/netslide.png Details</summary>

### Visual Description
## Diagram: Logic Puzzle Grid
### Overview
The image depicts a logic puzzle grid, likely a type of flow puzzle or maze. The grid is a square, bordered by a red rectangle. There are input points on the left side, output points on the right side, and internal components that appear to direct flow. The puzzle appears to involve connecting the input points to the output points using the available pathways.
### Components/Axes
The diagram consists of:
* **Grid:** A square grid composed of smaller square cells.
* **Input Points:** Three blue square components located on the left side of the grid.
* **Output Points:** Three cyan (light blue) square components located on the right side of the grid.
* **Flow Controller:** A black square component located in the center of the grid.
* **Pathways:** Black lines connecting the input, output, and flow controller components.
* **Arrows:** Grey arrows pointing inwards towards the grid, positioned around the perimeter.
### Detailed Analysis or Content Details
The grid is approximately 7x7 cells. The input points are vertically aligned on the left side. The output points are vertically aligned on the right side. The flow controller is positioned roughly in the center of the grid.
* **Input 1 (Top):** Connected to the flow controller via a vertical pathway.
* **Input 2 (Middle):** Connected to the flow controller via a vertical pathway.
* **Input 3 (Bottom):** Connected to the flow controller via a vertical pathway.
* **Flow Controller:** Connected to the top and middle output points via pathways. The bottom output point is not directly connected to the flow controller.
* **Output 1 (Top):** Connected to the flow controller.
* **Output 2 (Middle):** Connected to the flow controller.
* **Output 3 (Bottom):** Not directly connected to the flow controller.
The arrows around the perimeter suggest inputs or constraints to the puzzle.
### Key Observations
The puzzle appears to require routing the inputs to the outputs using the available pathways and the flow controller. The bottom output point is isolated, suggesting a unique solution path or constraint. The flow controller acts as a central hub for directing the flow.
### Interpretation
This diagram represents a logic puzzle where the goal is to establish a valid flow from the input points to the output points, adhering to the constraints imposed by the grid layout and the flow controller. The puzzle likely requires deductive reasoning to determine the correct pathway configuration. The isolated bottom output point suggests a more complex routing requirement. The arrows around the perimeter could represent additional constraints or input conditions that must be satisfied. The puzzle is likely designed to test spatial reasoning and problem-solving skills. The diagram does not contain any numerical data or quantitative information; it is purely a visual representation of a logical problem.
</details>
Figure 24: Netslide: Slide a row at a time to reassemble the network.
<details>
<summary>extracted/5699650/img/puzzles/palisade.png Details</summary>

### Visual Description
## Diagram: Grid with Numerical Labels
### Overview
The image depicts a grid composed of rectangular cells, with numerical labels placed within some of the cells. Yellow lines segment the grid into irregular regions. The grid appears to be a 5x5 arrangement, though not all cells are fully defined by the black border.
### Components/Axes
The diagram consists of:
* A 5x5 grid structure defined by a black border.
* Numerical labels: 1, 2, and 3.
* Yellow lines that divide the grid into several irregular regions.
* Small black dots scattered within some of the grid cells.
### Detailed Analysis or Content Details
The numerical labels and their positions within the grid are as follows:
* Top-left cell: "2"
* Second cell in the first row: "2"
* Third cell in the first row: "3"
* First cell in the second row: "3"
* Second cell in the second row: "3"
* Third cell in the second row: "2"
* First cell in the third row: "1"
* Last cell in the third row: "2"
The yellow lines are positioned as follows:
* A vertical line separating the first and second columns.
* A vertical line separating the third and fourth columns.
* A horizontal line separating the first and second rows.
* A horizontal line separating the second and third rows.
* A horizontal line separating the fourth and fifth rows.
The black dots are scattered throughout the grid, with no apparent pattern.
### Key Observations
The numbers 1, 2, and 3 are distributed across the grid. The yellow lines create a partitioning of the grid, potentially defining regions with specific properties. The dots do not appear to be associated with any specific number or region.
### Interpretation
The diagram likely represents a spatial arrangement or a puzzle-like structure. The numbers could represent values, categories, or identifiers associated with each region. The yellow lines might define boundaries or constraints within the grid. Without further context, it's difficult to determine the precise meaning or purpose of the diagram. It could be a simplified representation of a map, a game board, or a data visualization. The dots could be markers or indicators of some kind. The arrangement suggests a problem-solving or optimization task, where the goal might be to arrange or manipulate the numbers within the defined regions.
</details>
Figure 25: Palisade: Divide the grid into equal-sized areas in accordance with the clues.
<details>
<summary>extracted/5699650/img/puzzles/pattern.png Details</summary>

### Visual Description
## Heatmap: Unlabeled Data Grid
### Overview
The image presents a grayscale heatmap with numerical labels along the top and left edges. The heatmap consists of a grid of cells, each shaded with a different intensity of gray, representing a value. There is no explicit legend provided, so the mapping between grayscale intensity and numerical value is inferred from the labels.
### Components/Axes
* **X-axis (Top):** Labeled with the numbers 2, 3, 2, 4, 2, 3, 3.
* **Y-axis (Left):** Labeled with the numbers 5, 1, 4, 5, 5, 6, 2, 3, 3, 1, 1, 4.
* **Grid:** A 12x7 grid of cells, each with a grayscale value.
### Detailed Analysis
The heatmap data can be represented as a matrix, where the row index corresponds to the Y-axis label and the column index corresponds to the X-axis label. The grayscale intensity of each cell represents a value. I will describe the grayscale intensity qualitatively, using "light" (closest to white), "medium" (mid-gray), and "dark" (closest to black).
Here's a breakdown of the grayscale values, row by row, column by column:
* **Row 1 (5):** Light, Light, Medium, Dark, Medium, Light, Light
* **Row 2 (1):** Light, Dark, Light, Light, Medium, Medium, Light
* **Row 3 (4):** Medium, Light, Light, Medium, Dark, Light, Light
* **Row 4 (5):** Medium, Medium, Dark, Medium, Medium, Dark, Medium
* **Row 5 (5):** Dark, Medium, Medium, Light, Medium, Medium, Medium
* **Row 6 (6):** Light, Dark, Medium, Medium, Light, Light, Light
* **Row 7 (2):** Light, Light, Dark, Medium, Medium, Light, Light
* **Row 8 (3):** Light, Medium, Medium, Light, Dark, Medium, Medium
* **Row 9 (3):** Medium, Light, Dark, Medium, Light, Light, Light
* **Row 10 (1):** Light, Light, Light, Medium, Medium, Light, Light
* **Row 11 (1):** Light, Light, Medium, Light, Light, Light, Light
* **Row 12 (4):** Light, Medium, Dark, Dark, Medium, Light, Light
### Key Observations
* The heatmap does not exhibit a clear, simple trend. The grayscale values appear somewhat randomly distributed.
* The top-right corner of the heatmap (columns 5, 6, and 7) generally contains lighter shades of gray.
* The bottom-left corner (rows 10, 11, and 12, columns 1, 2, and 3) also tends to have lighter shades.
* There are several isolated dark cells scattered throughout the grid.
### Interpretation
The image presents a grid with numerical clues along its top and left edges. Per the figure caption, these clues give the lengths of runs of black squares, and the task is to fill in the pattern in the grid so that every clue is satisfied. The shaded cells therefore encode discrete cell states rather than continuous values, which is consistent with the limited number of grayscale categories (light, medium, dark) observed. The image alone does not reveal which cells have already been determined by the solver.
</details>
Figure 26: Pattern: Fill in the pattern in the grid, given only the lengths of runs of black squares.
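The notion of "lengths of runs of black squares" can be made concrete with a short sketch. It assumes the usual nonogram convention that each row and each column carries its own list of run lengths, read in order; the helper names below are illustrative and not part of the PUZZLES code base.

```python
from itertools import groupby
from typing import List, Sequence


def run_lengths(line: Sequence[bool]) -> List[int]:
    """Lengths of maximal runs of black (True) cells, in order."""
    return [
        sum(1 for _ in group)
        for is_black, group in groupby(line)
        if is_black
    ]


def matches_clues(
    grid: Sequence[Sequence[bool]],
    row_clues: Sequence[Sequence[int]],
    col_clues: Sequence[Sequence[int]],
) -> bool:
    """Check a fully coloured grid against its row and column clues."""
    columns = list(zip(*grid))  # transpose to iterate over columns
    if len(row_clues) != len(grid) or len(col_clues) != len(columns):
        return False
    rows_ok = all(
        run_lengths(row) == list(clue)
        for row, clue in zip(grid, row_clues)
    )
    cols_ok = all(
        run_lengths(col) == list(clue)
        for col, clue in zip(columns, col_clues)
    )
    return rows_ok and cols_ok
```

A line with no black squares has an empty run list, so an empty clue corresponds to an all-white row or column in this encoding.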
<details>
<summary>extracted/5699650/img/puzzles/pearl.png Details</summary>

### Visual Description
## Diagram: Grid-Based Pattern
### Overview
The image depicts a grid-based pattern consisting of black lines connecting black circles and white circles scattered across a gray grid. The grid appears to be approximately 8x8. There are no explicit labels or axes. The diagram appears to be a visual puzzle or a representation of a network.
### Components/Axes
The diagram consists of:
* **Grid:** A gray grid providing the background structure. The grid lines are faint but visible.
* **Black Lines:** Thick black lines connecting black circles. These lines form several distinct shapes.
* **Black Circles:** Solid black circles, appearing as nodes or endpoints of the black lines. There are 5 black circles.
* **White Circles:** Hollow white circles, scattered throughout the grid. There are 5 white circles.
### Detailed Analysis or Content Details
The black lines and circles form the following shapes:
1. **Top-Left:** A 'U' shape with a horizontal line extending from the right side of the 'U'. The 'U' is open at the bottom.
2. **Center-Left:** A short horizontal line connecting two black circles.
3. **Top-Right:** A zig-zag pattern, resembling a simplified 'Z' or 'N' shape.
4. **Bottom-Right:** A short zig-zag pattern, similar to the top-right, but shorter.
The white circles are positioned as follows:
1. **Center-Top:** Approximately in the center of the top half of the grid.
2. **Center-Left:** Slightly below the center, to the left of the grid.
3. **Bottom-Center:** Approximately in the center of the bottom half of the grid.
4. **Bottom-Right:** Near the bottom-right corner of the grid.
5. **Top-Right:** Near the top-right corner of the grid.
There is no numerical data present in the image. The arrangement of the shapes and circles is the primary information.
### Key Observations
* The black lines create closed or partially closed shapes.
* The white circles are not connected by any lines.
* The distribution of white circles appears somewhat random, but they are generally spaced out across the grid.
* The black circles are always endpoints of the black lines.
### Interpretation
The diagram likely represents a puzzle or a network with specific constraints. The black lines could represent paths or connections, while the white circles might represent destinations or obstacles. The arrangement suggests a problem where one needs to connect the black circles using the provided lines, potentially avoiding or interacting with the white circles. The lack of labels or context makes it difficult to determine the exact nature of the puzzle or network. It could be a simplified representation of a maze, a circuit diagram, or a game board. The image is a visual representation of relationships and spatial arrangement, rather than a quantitative dataset. It is a problem-solving visual.
</details>
Figure 27: Pearl: Draw a single closed loop, given clues about corner and straight squares.
<details>
<summary>extracted/5699650/img/puzzles/pegs.png Details</summary>

### Visual Description
## Diagram: Dot Grid
### Overview
The image depicts a grid of dots arranged in a cross shape. The majority of the dots are blue, while a smaller number are gray. There are no explicit labels, axes, or legends. The arrangement appears symmetrical.
### Components/Axes
There are no axes or legends present. The diagram consists of a 5x5 grid of dots, forming a plus sign shape. The central dot is not present.
### Detailed Analysis or Content Details
The grid is composed of 24 dots.
- 16 dots are blue. They are arranged in a symmetrical pattern around the center.
- 8 dots are gray. These are positioned at the four extreme ends of the cross. Specifically:
- Top: 1 gray dot
- Bottom: 1 gray dot
- Left: 3 gray dots
- Right: 3 gray dots
### Key Observations
The diagram exhibits a clear distinction between blue and gray dots. The gray dots are concentrated at the periphery of the cross, while the blue dots dominate the central region. The number of gray dots is significantly lower than the number of blue dots.
### Interpretation
This diagram appears to be a visual representation of a binary state or a simple pattern. The blue dots could represent a "positive" or "active" state, while the gray dots represent a "negative" or "inactive" state. The concentration of blue dots in the center suggests a focus or emphasis on that area. The arrangement could symbolize a network, a connection point, or a distribution pattern. Without further context, the specific meaning remains ambiguous. The diagram does not provide any quantitative data, only a qualitative distribution of two colors. It is a symbolic representation rather than a data visualization.
</details>
Figure 28: Pegs: Jump pegs over each other to remove all but one.
<details>
<summary>extracted/5699650/img/puzzles/range.png Details</summary>

### Visual Description
## Puzzle: Range Grid
### Overview
The image presents a Range puzzle grid. The grid is a 9x9 matrix in which some cells contain numbers, other cells are marked with dots, and several cells are filled with black squares.
### Components/Axes
The grid is structured as a 9x9 matrix. The rows and columns are implicitly numbered from 1 to 9. Each cell contains either a number (3 to 13 in this instance), a dot, or a black square.
### Detailed Analysis or Content Details
Here's a reconstruction of the grid's content, row by row:
* **Row 1:** `.`, `.`, `.`, `7`, `.`, `.`, `.`, `.`, `.`
* **Row 2:** `3`, `.`, `.`, `.`, `.`, `.`, `.`, `.`, `8`
* **Row 3:** `.`, `.`, `.`, `.`, `█`, `.`, `5`, `.`, `.` (█ represents a black square)
* **Row 4:** `.`, `.`, `7`, `.`, `.`, `7`, `█`, `.`, `.`
* **Row 5:** `.`, `13`, `.`, `.`, `.`, `.`, `.`, `.`, `.`
* **Row 6:** `4`, `.`, `.`, `█`, `.`, `.`, `.`, `.`, `8`
* **Row 7:** `.`, `.`, `.`, `4`, `.`, `.`, `.`, `.`, `.`
* **Row 8:** `.`, `.`, `.`, `.`, `.`, `.`, `.`, `.`, `.`
* **Row 9:** `.`, `.`, `.`, `.`, `.`, `.`, `.`, `.`, `.`
### Key Observations
The grid contains numbers ranging from 3 to 13. Values above 9 are possible because, per the figure caption, each number relates to the distance visible from its cell rather than to a digit to be placed. The grid is sparsely populated, with many cells still undetermined.
### Interpretation
This image depicts a Range puzzle. Per the figure caption, the goal is to place black squares so as to limit the visible distance from each numbered cell. The numbers therefore act as constraints on where black squares may go, and the black squares already shown are part of the current state of the puzzle.
</details>
Figure 29: Range: Place black squares to limit the visible distance from each numbered cell.
<details>
<summary>extracted/5699650/img/puzzles/rect.png Details</summary>

### Visual Description
## Puzzle: Partially Filled Grid
### Overview
The image presents a partially filled grid, resembling a logic puzzle or a simplified Sudoku-like structure. The grid is composed of cells, some of which contain numerical values. The grid is outlined with a thick black border.
### Components/Axes
The image consists of a grid with dimensions approximately 7x7 cells. The cells are arranged in rows and columns. Some cells are filled with the following numbers: 2, 3, 4, and 8. The grid does not have explicit axis labels or a legend.
### Detailed Analysis or Content Details
The grid contains the following numerical values at these approximate locations:
* Row 1, Column 3: 3
* Row 2, Column 3: 2
* Row 2, Column 4: 3
* Row 2, Column 5: 2
* Row 3, Column 1: 4
* Row 3, Column 2: 8
* Row 5, Column 3: 4
* Row 5, Column 4: 2
* Row 6, Column 1: 2
* Row 6, Column 2: 3
* Row 7, Column 1: 3
* Row 7, Column 3: 3
* Row 1, Column 6: 2
* Row 5, Column 6: 3
* Row 6, Column 5: 3
The remaining cells are empty (represented by a light gray color).
### Key Observations
The numbers are scattered throughout the grid with no immediately obvious pattern. The number 3 appears most frequently. The number 8 appears only once. The grid is not fully populated, suggesting it is a puzzle to be solved.
### Interpretation
The image depicts a logic puzzle. Per the figure caption, the goal is to divide the grid into rectangles whose areas equal the numbers, so each number constrains the size of the rectangle that contains it. No cells need to be filled with additional numbers; solving the puzzle consists of drawing the rectangle boundaries.
</details>
Figure 30: Rectangles: Divide the grid into rectangles with areas equal to the numbers.
<details>
<summary>extracted/5699650/img/puzzles/samegame.png Details</summary>

### Visual Description
## Block-Based Image: Color Distribution
### Overview
The image presents a grid of colored blocks, arranged in a roughly square shape. There are no axes, labels, or legends explicitly present. The colors used are red, green, and blue, along with white representing empty space. The arrangement of these blocks forms a shape resembling a stylized letter "T". The image appears to be a visual pattern or puzzle, rather than a data-driven chart.
### Components/Axes
There are no axes or legends. The components are individual colored blocks. The grid appears to be approximately 10x10, though the exact dimensions are difficult to determine due to the irregular shape formed by the colored blocks.
### Detailed Analysis or Content Details
The image consists of a grid of colored squares. The colors are distributed as follows:
* **White:** Occupies the top-left portion of the grid, forming the upper part of the "T" shape. Approximately 25-30 blocks are white.
* **Blue:** Forms the vertical stem of the "T" shape, extending downwards from the white area. There are approximately 15-20 blue blocks.
* **Red:** Appears in the top-right quadrant and interspersed within the lower portion of the grid. There are approximately 15-20 red blocks.
* **Green:** Primarily located in the bottom-left and right corners, and scattered throughout the lower portion of the grid. There are approximately 20-25 green blocks.
The arrangement of the blocks is not random. The blue blocks form a continuous column, while the red and green blocks are more fragmented. The white blocks create a clear separation between the blue stem and the colored blocks forming the "T" head.
### Key Observations
The image lacks quantitative data. The distribution of colors appears deliberate, forming a recognizable shape. The white space is significant, defining the overall structure. There is no clear pattern in the arrangement of red and green blocks beyond their general location within the grid.
### Interpretation
The image is likely a visual representation of a pattern or a puzzle. The "T" shape could be symbolic, or simply a design element. The use of different colors might represent different categories or states within the puzzle. Without additional context, it is difficult to determine the specific meaning or purpose of the image. It does not appear to be a standard chart or graph conveying numerical data. The image is more akin to a visual exercise in pattern recognition or spatial reasoning. It could be a simplified representation of a more complex system, where each color represents a different component or state. The white space could represent an absence of something, or a boundary between different areas.
</details>
Figure 31: Same Game: Clear the grid by removing touching groups of the same colour squares.
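To illustrate what a "touching group" is, the following sketch finds the group of same-coloured squares containing a given cell by flood fill. It assumes that squares touch only when they are orthogonally adjacent and that removed squares are stored as `None`; both the encoding and the function name are hypothetical, and the sketch does not model how the remaining squares move after a removal.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]
Board = List[List[Optional[str]]]  # None marks an empty (removed) square


def connected_group(board: Board, row: int, col: int) -> Set[Cell]:
    """Return all cells reachable from (row, col) via same-coloured,
    orthogonally adjacent squares. Empty cells yield an empty set."""
    colour = board[row][col]
    if colour is None:
        return set()
    height, width = len(board), len(board[0])
    group: Set[Cell] = {(row, col)}
    stack: List[Cell] = [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and (nr, nc) not in group and board[nr][nc] == colour:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group
```

An agent could use such a helper to enumerate the distinct groups on the board, each of which corresponds to one candidate move.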
<details>
<summary>extracted/5699650/img/puzzles/signpost.png Details</summary>

### Visual Description
## Diagram: Grid-Based Puzzle
### Overview
The image depicts a small grid filled with numbers, letters, and symbols, arranged in a puzzle-like format. Each cell contains a value, and arrows indicate potential movement or relationships between cells. The grid appears to be a logic puzzle or a step in a larger problem-solving process.
### Components/Axes
The grid is composed of cells arranged in rows and columns. The cells contain the following elements:
* Numbers: 1, 2, 3, 4, 5, 16
* Letters: a, d, e
* Mathematical expressions: e+1, a+1, d+1, d+2, d+3, a+4
* Arrows: Pointing up, down, left, and right.
* Dots: Representing a point or marker.
* Star: A star symbol in the bottom-right cell.
### Detailed Analysis or Content Details
Here's a breakdown of each cell's content and its position:
* **Row 1:**
* Cell (1,1): "1" with a downward-pointing arrow.
* Cell (1,2): "3" with a rightward-pointing arrow.
* Cell (1,3): "e+1" (orange background) with a downward-pointing arrow.
* **Row 2:**
* Cell (2,1): "2" with a leftward-pointing arrow.
* Cell (2,2): "d+1" (purple background) with a dot and a rightward-pointing arrow.
* Cell (2,3): "a" (orange background) with a dot and a rightward-pointing arrow.
* **Row 3:**
* Cell (3,1): "e" (orange background) with a dot and a leftward-pointing arrow.
* Cell (3,2): "d+3" (purple background) with a leftward-pointing arrow.
* Cell (3,3): "d" (orange background) with a dot and a leftward-pointing arrow.
* **Row 4:**
* Cell (4,1): "a+4" (orange background) with a leftward-pointing arrow.
* Cell (4,2): "d+2" (purple background).
* Cell (4,3): "16" (purple background) with a dot and a star symbol.
* **Row 5:**
* Cell (5,1): "5" with a leftward-pointing arrow.
### Key Observations
* The orange cells ("e+1", "a", "e", "a+4", "d") seem to be grouped together, potentially representing a sequence or a set of related values.
* The purple cells ("d+1", "d+3", "d+2", "16") also form a group, possibly representing another sequence.
* The arrows suggest a flow or a path through the grid.
* The dots within some cells might indicate starting points or intermediate steps.
* The star in the bottom-right cell could signify the goal or the solution.
### Interpretation
The diagram likely represents a logic puzzle or a step-by-step solution to a mathematical or algorithmic problem. The numbers, letters, and expressions suggest that the puzzle involves manipulating these values according to the directions indicated by the arrows. The grouping of orange and purple cells might indicate separate operations or constraints. The star likely marks the final result or the desired state. Without further context, it's difficult to determine the exact rules of the puzzle, but the arrangement suggests a systematic approach to solving it. The puzzle appears to be designed to test logical reasoning and problem-solving skills. The use of both numerical and alphabetical elements suggests a combination of arithmetic and symbolic manipulation.
</details>
Figure 32: Signpost: Connect the squares into a path following the arrows.
<details>
<summary>extracted/5699650/img/puzzles/singles.png Details</summary>

### Visual Description
## Grid Puzzle: Number Placement
### Overview
The image depicts a Singles puzzle grid. The grid is 6x6, and each cell either contains a number from 1 to 6 or has been blacked out. Per the figure caption, the goal is to black out the right set of duplicate numbers.
### Components/Axes
The image consists of a 6x6 grid.
- Rows are numbered 1 to 6, from top to bottom.
- Columns are numbered 1 to 6, from left to right.
- Cells contain either a number from 1 to 6, or a black square.
### Detailed Analysis or Content Details
Here's a cell-by-cell transcription of the grid's contents, using row and column notation (e.g., R1C1 refers to the cell in the first row, first column):
* R1C1: 3
* R1C2: Black
* R1C3: 1
* R1C4: 5
* R1C5: 6
* R1C6: 6
* R2C1: 4
* R2C2: 1
* R2C3: 2
* R2C4: 2
* R2C5: 5
* R2C6: 3
* R3C1: Black
* R3C2: 5
* R3C3: 2
* R3C4: 1
* R3C5: 4
* R3C6: 4
* R4C1: 2
* R4C2: 3
* R4C3: 4
* R4C4: Black
* R4C5: 1
* R4C6: 5
* R5C1: 1
* R5C2: 6
* R5C3: Black
* R5C4: 3
* R5C5: 4
* R5C6: 6
* R6C1: 5
* R6C2: Black
* R6C3: 3
* R6C4: 4
* R6C5: 6
* R6C6: 1
### Key Observations
The grid has a mix of numbers and black squares. Some rows still contain repeated numbers (for example, R1C5 and R1C6 are both 6, and R2C3 and R2C4 are both 2), so the puzzle appears to be in progress. The numbers 1 through 6 are all present, but their frequency varies across the grid.
### Interpretation
The image presents a logic puzzle. The black squares mark cells that have been blacked out, and the remaining duplicates indicate where further cells may need to be blacked out. The complete rule set is not visible in the image, so the solution cannot be determined from the image alone. The image is a static representation of a puzzle state, not a demonstration of a process or trend.
</details>
Figure 33: Singles: Black out the right set of duplicate numbers.
<details>
<summary>extracted/5699650/img/puzzles/sixteen.png Details</summary>

### Visual Description
## Diagram: Numbered Grid with Arrows
### Overview
The image depicts a 4x4 grid of numbered cells, ranging from 1 to 16. Each cell contains a single integer. Surrounding the grid are arrows indicating direction, positioned at the midpoint of each side. The background is a uniform gray.
### Components/Axes
The diagram consists of:
* A 4x4 grid.
* Numbers 1 through 16, each occupying a single cell.
* Arrows pointing inwards towards the grid on all four sides (top, bottom, left, right).
### Detailed Analysis or Content Details
The grid cells and their corresponding numbers are as follows:
* Row 1: 13, 2, 3, 4
* Row 2: 1, 6, 7, 8
* Row 3: 5, 9, 10, 12
* Row 4: 11, 14, 15, 16
The arrows are uniformly gray and point towards the center of the grid. There are four arrows on each side, evenly spaced.
### Key Observations
The numbers are arranged in a seemingly arbitrary order within the grid. The arrows suggest a directional influence or flow towards the center of the grid. The grid is centered within the image frame.
### Interpretation
This diagram likely represents a puzzle or a system with directional constraints. The numbers could represent states, values, or positions within a larger system. The arrows suggest that elements are influenced by or move towards the center of the grid. Without further context, it's difficult to determine the specific purpose or rules governing this arrangement. It could be a simplified representation of a more complex process, such as a routing algorithm, a game board, or a data flow diagram. The arrangement of numbers may be significant, potentially representing a code or a sequence to be solved. The arrows could indicate input or output pathways. The diagram is a visual representation of a structured system, but its precise meaning is unclear without additional information.
</details>
Figure 34: Sixteen: Slide a row at a time to arrange the tiles into order.
<details>
<summary>extracted/5699650/img/puzzles/slant.png Details</summary>

### Visual Description
## Diagram: Network of Nodes with Numerical Labels
### Overview
The image depicts a network of interconnected nodes arranged in a grid-like pattern. Each node is labeled with a numerical value ranging from 0 to 3. The connections between nodes are represented by lines, forming a complex web-like structure. The background is a light gray grid.
### Components/Axes
There are no explicit axes or legends in the traditional sense. The diagram consists of nodes and connecting lines. The numerical labels within the circles represent the value associated with each node. The grid provides a spatial reference but does not have labeled axes.
### Detailed Analysis or Content Details
The diagram consists of approximately 30 nodes. The numerical values within the nodes are distributed as follows (approximate counts):
* 0: ~8 occurrences
* 1: ~6 occurrences
* 2: ~9 occurrences
* 3: ~7 occurrences
The connections between nodes are directional, indicated by the arrowheads on the lines. The network appears to be non-hierarchical, with multiple paths and loops.
Here's a more detailed breakdown of the node values and their connections, reading row by row from top to bottom:
* **Row 1:** 1, 0, 2, 3, 3, 2, 0
* **Row 2:** 2, 3, 2, 2, 2, 1
* **Row 3:** 3, 2, 2, 3, 2, 0
* **Row 4:** 0, 3, 2, 1, 2, 1, 0
* **Row 5:** 0, 1, 0, 1, 0, 0
The connections are complex and do not follow a simple pattern. There are several nodes with multiple incoming and outgoing connections.
### Key Observations
* The values 2 and 3 appear more frequently than 0 and 1.
* The network is densely connected, with most nodes having at least two connections.
* There are several loops and cycles within the network.
* The distribution of values does not appear to be random; there may be some underlying pattern or rule governing the assignment of values to nodes.
### Interpretation
This diagram likely represents a state transition network or a similar type of graph where nodes represent states and edges represent transitions between states. The numerical values within the nodes could represent various attributes, such as cost, probability, or reward associated with each state. The directional connections indicate the flow of transitions.
Without additional context, it is difficult to determine the specific meaning of the diagram. However, the complexity of the network suggests that it models a system with multiple interacting components and a rich set of possible behaviors. The frequent occurrence of values 2 and 3 might indicate that these states are more common or desirable than states with values 0 and 1.
The diagram could be used to analyze the behavior of a system, identify critical paths, or optimize performance. It could also be a representation of a decision-making process, where nodes represent choices and edges represent actions.
The absence of a clear legend or axis labels suggests that the diagram is intended for a specialized audience who are already familiar with the underlying concepts and terminology. It is a visual representation of a complex system, and its interpretation requires a deep understanding of the domain it represents.
</details>
Figure 35: Slant: Draw a maze of slanting lines that matches the clues.
<details>
<summary>extracted/5699650/img/puzzles/solo.png Details</summary>

### Visual Description
## Sudoku Puzzle: Partially Filled Grid
### Overview
The image presents a partially filled 9x9 Sudoku grid. The grid is composed of 9 blocks, each a 3x3 square. Some cells are pre-filled with numbers from 1 to 9, while others are empty. The goal of Sudoku is to fill the empty cells with numbers such that each row, each column, and each 3x3 block contains all the digits from 1 to 9 without repetition.
### Components/Axes
The image consists of a 9x9 grid. There are no explicit axes or legends. The grid is defined by thick black lines separating the 3x3 blocks. The numbers within the grid are the data points.
### Detailed Analysis or Content Details
The following numbers are pre-filled in the grid:
* **Row 1:** 4, 2, 6, 1, 9, 5
* **Row 2:** 5, 6, 2, 1, 3
* **Row 3:** 7, 3, 1, 4, 5, 9, 6, 8, 2
* **Row 4:** 9, 5, 4, 1, 6, 7, 2, 3, 8
* **Row 5:** 1, 7, 8, 4, 5, 3, 9
* **Row 6:** 3, 6, 2, 5, 8, 1, 7, 4
* **Row 7:** 6, 4, 5, 9, 1, 3, 8, 2, 7
* **Row 8:** 2, 1, 8, 4, 6, 5, 1
* **Row 9:** 1, 4, 6
The green highlighted numbers are:
* Row 1: 4, 2, 6, 1, 9, 5
* Row 3: 7, 3, 1, 4, 5, 9, 6, 8, 2
* Row 5: 1, 7, 8, 4, 5, 3, 9
* Row 7: 6, 4, 5, 9, 1, 3, 8, 2, 7
* Row 9: 1, 4, 6
### Key Observations
The puzzle is partially filled, indicating an incomplete Sudoku. The distribution of numbers appears relatively even across the grid, but a full analysis would require solving the puzzle to determine the difficulty and uniqueness of the solution.
### Interpretation
The image represents a classic logic-based number-placement puzzle. The pre-filled numbers serve as constraints, and the goal is to deduce the remaining numbers based on the rules of Sudoku. The puzzle's difficulty depends on the number and placement of the initial numbers. The green highlighting does not appear to follow any logical pattern and may be an artifact of the image creation or a visual aid for a specific solution attempt. The image itself does not provide any inherent meaning beyond the puzzle itself. It is a problem statement requiring logical deduction to solve.
</details>
Figure 36: Solo: Fill in the grid so that each row, column and square block contains one of every digit.
<details>
<summary>extracted/5699650/img/puzzles/tents.png Details</summary>

### Visual Description
## Grid Map: Distribution of Trees and Shapes
### Overview
The image depicts a grid map with a distribution of two types of symbols: green trees and orange triangles. The grid is labeled with numerical coordinates along both axes. The map appears to represent a spatial distribution of these symbols.
### Components/Axes
* **X-axis:** Labeled with numbers 0 to 3, increasing from left to right.
* **Y-axis:** Labeled with numbers 1 to 3, increasing from bottom to top.
* **Symbols:**
  * Green Tree
  * Orange Triangle
### Detailed Analysis
The grid is 4 units wide (X-axis) and 3 units high (Y-axis). The following describes the location of each symbol:
* **(0, 1):** Green Tree
* **(0, 3):** Green Tree
* **(1, 1):** Orange Triangle
* **(1, 2):** Green Tree
* **(1, 3):** Orange Triangle
* **(2, 1):** Orange Triangle
* **(2, 2):** Green Tree
* **(2, 3):** Orange Triangle
* **(3, 1):** Green Tree
* **(3, 2):** Green Tree
* **(3, 3):** Orange Triangle
* **(1, 1):** Green Tree
* **(2, 1):** Green Tree
### Key Observations
* There are more green trees (8) than orange triangles (5).
* The orange triangles are more concentrated towards the top of the grid (Y-axis values of 2 and 3).
* The green trees are more evenly distributed across the grid.
* There is a cluster of green trees in the bottom-left corner (X=0, Y=1 and X=0, Y=3).
### Interpretation
The data suggests a non-uniform distribution of trees and shapes across the grid. The higher concentration of orange triangles in the upper portion of the grid could indicate a specific environmental factor or a deliberate placement pattern. The greater number of green trees suggests they are the dominant element in this space. Without additional context, it's difficult to determine the meaning of this distribution. It could represent anything from a forest with clearings to a game map with resource locations. The grid coordinates provide a precise location for each element, allowing for spatial analysis. The lack of a legend beyond the visual distinction of the shapes implies that the shapes themselves are the key identifiers.
</details>
Figure 37: Tents: Place a tent next to each tree.
<details>
<summary>extracted/5699650/img/puzzles/towers.png Details</summary>

### Visual Description
## 3D Block Arrangement: Numerical Labeling
### Overview
The image depicts a 3x3 arrangement of cubes in a 3D space. Each cube is labeled with a number, and the arrangement is annotated with numerical values along the x, y, and z axes. The arrangement appears to be a visual representation of a data structure or a puzzle.
### Components/Axes
The arrangement is defined by three axes:
* **X-axis:** Labeled with values 1, 2, 3, 1, 2, 3, 1, 2, 3.
* **Y-axis:** Labeled with values 1, 2, 3, 1, 2, 3, 1, 2, 3.
* **Z-axis:** Labeled with values 1, 2, 3, 1, 2, 3, 1, 2, 3.
The cubes themselves are labeled with the following numbers:
* Top-Left: 3
* Top-Center: 4
* Top-Right: 2
* Middle-Left: 4
* Middle-Center: 3
* Middle-Right: 3
* Bottom-Left: 3
* Bottom-Center: 4
* Bottom-Right: 4
### Detailed Analysis or Content Details
The arrangement is a 3x3 grid of cubes. The numbers on the cubes are distributed as follows:
* The number '2' appears 1 time.
* The number '3' appears 4 times.
* The number '4' appears 4 times.
The x-axis values repeat the sequence 1, 2, 3, 1, 2, 3, 1, 2, 3.
The y-axis values repeat the sequence 1, 2, 3, 1, 2, 3, 1, 2, 3.
The z-axis values repeat the sequence 1, 2, 3, 1, 2, 3, 1, 2, 3.
### Key Observations
The arrangement does not appear to follow a simple numerical pattern. The numbers on the cubes are not arranged in ascending or descending order. The axis labels do not directly correlate with the numbers on the cubes.
### Interpretation
The image likely represents a puzzle or a data structure where the position and value of each cube are significant. The arrangement could be part of a larger system or a code. The repetition of numbers and the seemingly random distribution suggest a complex underlying logic. Without additional context, it is difficult to determine the exact purpose or meaning of the arrangement. The arrangement could be a visual representation of a 3D array or a matrix. The axis labels might represent indices or coordinates within the structure. The numbers on the cubes could represent values stored in those locations. The image is a static representation of a potentially dynamic system.
</details>
Figure 38: Towers: Complete the latin square of towers in accordance with the clues.
<details>
<summary>extracted/5699650/img/puzzles/tracks.png Details</summary>

### Visual Description
## Diagram: Train Track Puzzle
### Overview
The image depicts a grid-based puzzle featuring train tracks. The grid is 7 cells high and 6 cells wide. The puzzle appears to involve connecting two points, labeled 'A' and 'B', with a continuous train track. Some grid cells are shaded gray, representing obstacles or areas where tracks cannot be placed. The grid is numbered along the top (1-5) and left side (2-6).
### Components/Axes
* **Grid:** 7x6 grid of cells.
* **Horizontal Axis:** Numbered 1 to 5.
* **Vertical Axis:** Numbered 2 to 6.
* **Points A & B:** Starting and ending points for the train track. Point A is located at the bottom-left corner, and Point B is located at the bottom-right corner.
* **Train Tracks:** Curved and straight track segments connecting points A and B.
* **Shaded Cells:** Gray cells representing obstacles.
* **Equal Sign:** An equal sign (=) is present within a shaded cell.
### Detailed Analysis or Content Details
The train track starts at point A (approximately row 6, column 1) and ends at point B (approximately row 6, column 5). The track consists of curved and straight segments.
* **Point A (6,1):** The track initially curves upwards and to the right.
* **Row 6:** The track continues horizontally to the right, then curves upwards.
* **Row 5:** The track continues horizontally to the right.
* **Row 4:** The track continues horizontally to the right.
* **Row 3:** The track continues horizontally to the right.
* **Row 2:** The track continues horizontally to the right.
* **Row 2, Column 3:** The track curves upwards.
* **Row 1:** The track continues horizontally to the right.
* **Row 1, Column 4:** The track curves downwards.
* **Row 2, Column 4:** The track continues horizontally to the right.
* **Row 2, Column 5:** The track curves downwards.
* **Row 3, Column 5:** The track continues horizontally to the right.
* **Row 3, Column 6:** The track curves downwards.
* **Row 4, Column 6:** The track continues horizontally to the left.
* **Row 4, Column 5:** The track curves downwards.
* **Row 5, Column 5:** The track continues horizontally to the right.
* **Row 5, Column 6:** The track curves downwards.
* **Point B (6,5):** The track ends at point B.
The shaded cells are located at:
* (2,1) to (2,5)
* (4,1) to (4,5)
* (5,2) to (5,4)
* (3,5) to (3,6)
* (4,6)
The equal sign is located within the shaded cell at approximately (4,5).
### Key Observations
The puzzle appears to require finding a path for the train track that avoids the shaded cells. The track is a single continuous line. The equal sign within the shaded cell is an unusual element and may be a distraction or a clue.
### Interpretation
This image presents a logic puzzle. The goal is to connect points A and B with a continuous train track, navigating around the obstacles represented by the shaded cells. The grid coordinates provide a precise location system for defining the puzzle's constraints. The presence of the equal sign within a shaded cell is curious and could be a red herring or a symbolic element related to the puzzle's solution. The puzzle tests spatial reasoning and problem-solving skills. The puzzle is likely designed to be solved visually, by tracing a path that adheres to the given constraints.
</details>
Figure 39: Tracks: Fill in the railway track according to the clues.
<details>
<summary>extracted/5699650/img/puzzles/twiddle.png Details</summary>

### Visual Description
## Diagram: Numbered Grid
### Overview
The image depicts a 3x3 grid composed of nine square cells. Each cell contains a single, bold, black numeral from 1 to 9. The grid is overlaid with diagonal lines that intersect at the center cell (containing the number 9), dividing the grid into triangular sections. The background of each cell is a light gray.
### Components/Axes
There are no explicit axes or legends. The components are:
* Nine cells arranged in a 3x3 grid.
* Numbers 1 through 9, each appearing once.
* Two diagonal lines intersecting at the center.
### Detailed Analysis or Content Details
The numbers are positioned as follows:
* Top Row: 1 (top-left), 2 (top-center), 3 (top-right)
* Middle Row: 8 (middle-left), 9 (center), 6 (middle-right)
* Bottom Row: 4 (bottom-left), 7 (bottom-center), 5 (bottom-right)
The diagonal lines extend from the top-left corner to the bottom-right corner, and from the top-right corner to the bottom-left corner. They intersect at the cell containing the number 9.
### Key Observations
The top row (1, 2, 3) is in sequential order, while the remaining six numbers are not. The diagonal lines create a visual division of the grid, but their purpose is unclear without additional context.
### Interpretation
The image appears to be a visual representation of a numbered grid, potentially related to a puzzle or game. The arrangement of numbers and the diagonal lines might be part of a specific rule set or challenge. Without further information, it's difficult to determine the underlying meaning or purpose of the diagram. It could be a simplified representation of a sliding puzzle, or a visual aid for a mathematical concept. The image does not provide any data or facts beyond the arrangement of the numbers 1-9 within the grid.
</details>
Figure 40: Twiddle: Rotate the tiles around themselves to arrange them into order.
<details>
<summary>extracted/5699650/img/puzzles/undead.png Details</summary>

### Visual Description
## Diagram: Game Board with Numerical Annotations
### Overview
The image depicts a 4x4 grid representing a game board, likely for a puzzle or strategy game. The grid contains icons representing game pieces (a ghost, a frowning face, and a smiling face) and diagonal lines. Numerical values are associated with each row and column, and also with each icon type in a legend at the top.
### Components/Axes
* **Grid:** A 4x4 square grid.
* **Legend (Top):** Contains three icons: a blue ghost (labeled "5"), a brown frowning face (labeled "2"), and a green smiling face (labeled "2").
* **Row Labels (Left):** Numbers 1, 1, 3, 1.
* **Column Labels (Top):** Numbers 2, 0, 1, 2.
* **Bottom Row Labels:** Numbers 2, 0, 0, 0.
* **Top Row Labels:** Numbers 2, 0, 1, 2.
* **Game Pieces:** A ghost, a frowning face, and a smiling face are placed within the grid.
* **Diagonal Lines:** Several diagonal lines are present within the grid.
### Detailed Analysis or Content Details
The grid contains the following elements:
* **Row 1:**
  * Column 1: Frowning face. Value: 1.
  * Column 2: Diagonal line. Value: 1.
  * Column 3: Diagonal line. Value: 1.
  * Column 4: Empty. Value: 1.
* **Row 2:**
  * Column 1: Ghost. Value: 1.
  * Column 2: Diagonal line. Value: 1.
  * Column 3: Smiling face. Value: 1.
  * Column 4: Empty. Value: 1.
* **Row 3:**
  * Column 1: Empty. Value: 3.
  * Column 2: Diagonal line. Value: 2.
  * Column 3: Diagonal line. Value: 2.
  * Column 4: Diagonal line. Value: 2.
* **Row 4:**
  * Column 1: Empty. Value: 1.
  * Column 2: Diagonal line. Value: 5.
  * Column 3: Diagonal line. Value: 5.
  * Column 4: Diagonal line. Value: 5.
The legend indicates the following counts:
* Ghost: 5
* Frowning Face: 2
* Smiling Face: 2
### Key Observations
* The numerical values associated with the rows and columns appear to be independent of the game pieces or lines within the grid.
* The legend values (5, 2, 2) do not directly correspond to the number of each icon present on the board (1 ghost, 1 frowning face, 1 smiling face).
* The diagonal lines are concentrated in the lower rows.
### Interpretation
This diagram likely represents a state in a puzzle or strategy game. The numbers around the grid might represent constraints, scores, or resources. The icons represent game pieces with specific properties. The diagonal lines could represent connections, obstacles, or paths. The legend values might represent the total number of each piece available in the game, or a target number to achieve.
The discrepancy between the legend counts and the actual piece counts on the board suggests that the game involves removing or adding pieces, or that the board represents only a portion of the game state. The arrangement of the diagonal lines and the numerical values suggest a strategic element, where players must manipulate the board to achieve a specific goal.
Without further context, it's difficult to determine the exact rules or objective of the game. However, the diagram provides a snapshot of a complex game state with multiple interacting elements. The numbers and icons are likely key to understanding the game's mechanics and strategy.
</details>
Figure 41: Undead: Place ghosts, vampires and zombies so that the right numbers of them can be seen in mirrors.
<details>
<summary>extracted/5699650/img/puzzles/unequal.png Details</summary>

### Visual Description
## Diagram: Numerical Grid with Arrows
### Overview
The image presents a 3x3 grid of cells. Most cells are empty, but some contain the numbers "1" or "4" in green text. Several cells also contain arrow symbols pointing in different directions (up, down, left, right). The arrangement appears to be a visual puzzle or a representation of a state in a game.
### Components/Axes
The diagram consists of:
* A 3x3 grid structure.
* Numerical values: "1" and "4".
* Arrow symbols: "<", ">", "^", "v".
* Empty cells.
### Detailed Analysis or Content Details
The grid can be described as follows, row by row:
* **Row 1:**
  * Cell 1: "4" with a downward-pointing arrow ("v").
  * Cell 2: Left-pointing arrow ("<").
  * Cell 3: Empty.
* **Row 2:**
  * Cell 1: Empty.
  * Cell 2: "4".
  * Cell 3: Empty.
* **Row 3:**
  * Cell 1: Empty.
  * Cell 2: Upward-pointing arrow ("^") with "4".
  * Cell 3: Right-pointing arrow (">") with "1".
* **Row 4:**
  * Cell 1: Downward-pointing arrow ("v").
  * Cell 2: "1".
  * Cell 3: "4".
### Key Observations
* The number "4" appears more frequently than the number "1".
* Arrows are often associated with numerical values, potentially indicating movement or transformation.
* The arrangement doesn't seem to follow a simple numerical sequence or pattern.
* The grid is visually balanced, with elements distributed across the rows and columns.
### Interpretation
The diagram likely represents a puzzle or a game state. The numbers and arrows suggest a system where values can be moved or manipulated within the grid. The arrows could indicate the direction of movement, and the numbers could represent scores, resources, or other game elements. Without further context, it's difficult to determine the exact rules or objective of the puzzle. The arrangement could be a snapshot of a partially solved puzzle, or a starting configuration for a game. The presence of empty cells suggests that the grid can be further populated or modified. The diagram is a visual representation of a system with defined elements and potential interactions, but its meaning is dependent on the underlying rules or context.
</details>
Figure 42: Unequal: Complete the latin square in accordance with the > signs.
<details>
<summary>extracted/5699650/img/puzzles/unruly.png Details</summary>

### Visual Description
## Heatmap: Grid of Shaded Squares
### Overview
The image presents a 5x5 grid of squares, each filled with a different shade of gray, ranging from white to black. This appears to be a heatmap-like visualization, though without explicit axes or a legend, the data it represents is unknown. The shading suggests a quantitative variable is being mapped to the grid cells.
### Components/Axes
There are no explicit axes or labels present in the image. The grid itself forms the primary structure. The squares are arranged in a regular, rectangular pattern. There is no legend.
### Detailed Analysis or Content Details
The grid can be described by row and column, with each cell having a specific shade. The shades are approximate, and described relative to white and black.
* **Row 1:** Black, Gray (70%), Light Gray, Black, Light Gray
* **Row 2:** Light Gray, White, Gray (50%), Black, Black
* **Row 3:** White, Gray (50%), White, Gray (70%), White
* **Row 4:** Black, Gray (50%), White, Light Gray, Gray (70%)
* **Row 5:** Light Gray, Black, Gray (70%), Black, Gray (50%)
The shades are as follows (approximated):
* **White:** Represents the lowest value.
* **Light Gray:** Represents a low value, slightly higher than white.
* **Gray (50%):** Represents a medium value.
* **Gray (70%):** Represents a higher medium value.
* **Black:** Represents the highest value.
### Key Observations
The distribution of shades appears somewhat random, but there is a concentration of darker shades (black and dark gray) along the main diagonal (from top-left to bottom-right). There are also clusters of lighter shades in the center of the grid.
### Interpretation
Without context, it's difficult to definitively interpret the data. However, the heatmap suggests a correlation or pattern within the grid. The concentration of darker shades along the diagonal could indicate a strong relationship between the corresponding variables (if the grid represents a correlation matrix). The lack of axes and a legend makes it impossible to determine what the variables are or what the shades represent quantitatively. The image is a visual representation of data, but the meaning of that data is not explicitly provided. It could represent anything from correlation coefficients to density distributions, or even a simple aesthetic pattern. The image is a visualization, but lacks the necessary metadata to be fully understood.
</details>
Figure 43: Unruly: Fill in the black and white grid to avoid runs of three.
<details>
<summary>extracted/5699650/img/puzzles/untangle.png Details</summary>

### Visual Description
## Diagram: Network Graph
### Overview
The image depicts a network graph consisting of nine nodes (represented as blue circles) connected by twelve edges (represented as blue lines). The graph is displayed against a light gray background. There are no axis labels, legends, or numerical data associated with the nodes or edges.
### Components/Axes
The diagram consists solely of nodes and edges. There are no explicit axes or labels. The nodes are uniformly sized and colored blue. The edges are straight lines connecting the nodes, also colored blue.
### Detailed Analysis or Content Details
The graph can be described by listing the connections between nodes. Let's label the nodes 1 through 9, starting from the bottom-left and proceeding roughly clockwise.
* Node 1 is connected to Nodes 2 and 3.
* Node 2 is connected to Nodes 1, 3, and 4.
* Node 3 is connected to Nodes 1, 2, 5, and 6.
* Node 4 is connected to Nodes 2 and 5.
* Node 5 is connected to Nodes 3, 4, 6, and 7.
* Node 6 is connected to Nodes 3, 5, and 8.
* Node 7 is connected to Nodes 5 and 9.
* Node 8 is connected to Nodes 6 and 9.
* Node 9 is connected to Nodes 7 and 8.
There are no weights or directions indicated on the edges.
### Key Observations
The graph appears to be relatively dense, with a significant number of connections between nodes. Node 3 and Node 5 have the highest degree (number of connections) at four connections each. Nodes 1, 4, 7, 8, and 9 have the lowest degree at two connections each. The graph does not appear to have any obvious symmetry.
### Interpretation
The diagram represents a network or a relationship between nine entities. The connections indicate some form of association or interaction between these entities. Without further context, it's difficult to determine the specific meaning of the network. It could represent a social network, a communication network, a transportation network, or any other system where entities are connected. The varying degrees of the nodes suggest that some entities are more central or have more connections than others. The lack of edge weights or directions implies that the relationships are undirected and unweighted. The graph's structure suggests a complex interplay between the entities, with multiple paths connecting different parts of the network. It is a simple graph, and does not contain any additional information.
</details>
Figure 44: Untangle: Reposition the points so that the lines do not cross.
Appendix E Puzzle-specific Metadata
E.1 Action Space
We display the action spaces for all supported puzzles in Table 5. The action spaces vary in size and in the types of actions they contain. As a result, an agent must learn the meaning of each action independently for each puzzle.
Table 5: The action space of each puzzle, along with its cardinality. Each action is listed under the name it has in the original Puzzle Collection C code.
| Puzzle | Cardinality | Actions |
| --- | --- | --- |
| Black Box | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Bridges | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Cube | 4 | UP, DOWN, LEFT, RIGHT |
| Dominosa | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Fifteen | 4 | UP, DOWN, LEFT, RIGHT |
| Filling | 13 | UP, DOWN, LEFT, RIGHT, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Flip | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Flood | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Galaxies | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Guess | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Inertia | 9 | 1, 2, 3, 4, 6, 7, 8, 9, UNDO |
| Keen | 14 | UP, DOWN, LEFT, RIGHT, SELECT2, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Light Up | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Loopy | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Magnets | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Map | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Mines | 7 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2, UNDO |
| Mosaic | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Net | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Netslide | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Palisade | 5 | UP, DOWN, LEFT, RIGHT, CTRL |
| Pattern | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Pearl | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Pegs | 6 | UP, DOWN, LEFT, RIGHT, SELECT, UNDO |
| Range | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Rectangles | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Same Game | 6 | UP, DOWN, LEFT, RIGHT, SELECT, UNDO |
| Signpost | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Singles | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Sixteen | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Slant | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Solo | 13 | UP, DOWN, LEFT, RIGHT, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Tents | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Towers | 14 | UP, DOWN, LEFT, RIGHT, SELECT2, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Tracks | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
| Twiddle | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Undead | 8 | UP, DOWN, LEFT, RIGHT, SELECT2, 1, 2, 3 |
| Unequal | 13 | UP, DOWN, LEFT, RIGHT, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Unruly | 6 | UP, DOWN, LEFT, RIGHT, SELECT, SELECT2 |
| Untangle | 5 | UP, DOWN, LEFT, RIGHT, SELECT |
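To illustrate why the meaning of an action has to be learned separately for each puzzle, the following minimal sketch encodes a few rows of Table 5 as Python lists. The `ACTION_SPACES` dictionary and both helper functions are hypothetical, and the assumption that action indices follow the order in which the table lists them is ours; the actual environment may encode actions differently.

```python
# Hypothetical encoding of a few rows of Table 5. The index order is ASSUMED
# to follow the order in which the table lists the actions.
ACTION_SPACES = {
    "Cube": ["UP", "DOWN", "LEFT", "RIGHT"],
    "Inertia": ["1", "2", "3", "4", "6", "7", "8", "9", "UNDO"],
    "Mines": ["UP", "DOWN", "LEFT", "RIGHT", "SELECT", "SELECT2", "UNDO"],
    "Undead": ["UP", "DOWN", "LEFT", "RIGHT", "SELECT2", "1", "2", "3"],
}


def cardinality(puzzle: str) -> int:
    """Return the number of discrete actions listed for a puzzle."""
    return len(ACTION_SPACES[puzzle])


def action_name(puzzle: str, index: int) -> str:
    """Map a discrete action index back to its listed name."""
    return ACTION_SPACES[puzzle][index]


if __name__ == "__main__":
    for puzzle in ACTION_SPACES:
        # Print each puzzle with its cardinality and the name behind index 0.
        print(puzzle, cardinality(puzzle), action_name(puzzle, 0))
```

Under this assumed ordering, index 0 stands for UP in Cube but for the key 1 in Inertia, so action semantics do not carry over directly from one puzzle to another.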
E.2 Optional Parameters
We display the optional parameters for all supported puzzles in Table 6. If none are supplied upon initialization, a set of default parameters is used for the puzzle generation process.
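The parameter strings in Table 6 use a compact format in which an optional `{int}x{int}` grid size is followed by single-letter keys, each with an integer value. As a rough sketch of how such a string could be split into named fields, the hypothetical helper `parse_params` below handles only these integer-valued forms; character-valued options (such as a difficulty given as a letter) and puzzles whose string starts with a type character are deliberately out of scope. It is not part of the PUZZLES code base.

```python
import re


def parse_params(spec: str) -> dict:
    """Split an integer-valued parameter string into named fields.

    Hypothetical helper for illustration: supports an optional leading
    '{int}x{int}' grid size followed by '<letter><int>' pairs.
    """
    fields = {}
    rest = spec
    # Optional leading grid size, e.g. '7x7' -> w = 7, h = 7.
    match = re.match(r"(\d+)x(\d+)", spec)
    if match:
        fields["w"] = int(match.group(1))
        fields["h"] = int(match.group(2))
        rest = spec[match.end():]
    # Remaining single-letter keys, each followed by an integer value.
    for key, value in re.findall(r"([A-Za-z])(\d+)", rest):
        fields[key] = int(value)
    return fields


if __name__ == "__main__":
    print(parse_params("w8h8m5M5"))      # Black Box style string
    print(parse_params("7x7i5e2m2d0"))   # Bridges style string
```

Keys are case-sensitive, so the `m` and `M` fields of a Black Box string remain distinct entries in the returned dictionary.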
Table 6: For each puzzle, all optional parameters a user may supply are shown and described. We also give the required data type of each variable, where applicable (e.g., int or char). For parameters that accept one of a few choices (such as difficulty), the accepted values and corresponding explanations are given in braces. As an example: a difficulty parameter is listed as d{int} with allowed values {0 = easy, 1 = medium, 2 = hard}. In this case, choosing medium difficulty would correspond to d1.
| Black Box | w8h8m5M5 | w{int} | grid width | $(w· h+w+h+1)·(w+2)·(h+2)$ |
| --- | --- | --- | --- | --- |
| | | h{int} | grid height | |
| | | m{int} | minimum number of balls | |
| | | M{int} | maximum number of balls | |
| Bridges | 7x7i5e2m2d0 | {int}x{int} | grid width $×$ grid height | $3· w· h·(w+h+8)$ |
| | | i{int} | percentage of island squares | |
| | | e{int} | expansion factor | |
| | | m{int} | max bridges per direction | |
| | | d{int} | difficulty {0 = easy, 1 = medium, 2 = hard} | |
| Cube | c4x4 | {char} | type {c = cube, t = tetrahedron, o = octahedron, i = icosahedron} | $w· h· F$, where F = number of the body’s faces |
| | | {int}x{int} | grid width $×$ grid height | |
| Dominosa | 6db | {int} | maximum number of dominoes | $\frac{1}{2}\left(w^{2}+3w+2\right)·\left(4\sqrt{w^{2}+3w+2}+1\right)$ |
| | | d{char} | difficulty {t = trivial, b = basic, h = hard, e = extreme, a = ambiguous} | |
| Fifteen | 4x4 | {int}x{int} | grid width $×$ grid height | $(w· h)^{4}$ |
| Filling | 13x9 | {int}x{int} | grid width $×$ grid height | $(w· h)·(w+h+1)$ |
| Flip | 5x5c | {int}x{int} | grid width $×$ grid height | $(w· h)·(w+h+1)$ |
| | | {char} | type {c = crosses, r = random} | |
| Flood | 12x12c6m5 | {int}x{int} | grid width $×$ grid height | $(w· h)·(w+h+1)$ |
| | | c{int} | number of colors | |
| | | m{int} | extra moves permitted (above the solver’s minimum) | |
| Galaxies | 7x7dn | {int}x{int} | grid width $×$ grid height | $(2· w· h-w-h)·(2· w+2· h+1)$ |
| | | d{char} | difficulty {n = normal, u = unreasonable} | |
| Guess | c6p4g10Bm | c{int} | number of colors | $(p+1)· g·(c+p)$ |
| | | p{int} | pegs per guess | |
| | | g{int} | maximum number of guesses | |
| | | {char} | allow blanks {B = no, b = yes} | |
| | | {char} | allow duplicates {M = no, m = yes} | |
| Inertia | 10x8 | {int}x{int} | grid width $×$ grid height | $0.2· w^{2}· h^{2}$ |
| Keen | 6dn | {int} | grid size | $(2· w+1)· w^{2}$ |
| | | d{char} | difficulty {e = easy, n = normal, h = hard, x = extreme, u = unreasonable} | |
| | | {char} | (Optional) multiplication only {m = yes} | |
| Light Up | 7x7b20s4d0 | {int}x{int} | grid width $×$ grid height | $\frac{1}{2}·(w+h+1)·(w· h+1)$ |
| | | b{int} | percentage of black squares | |
| | | s{int} | symmetry {0 = none, 1 = 2-way mirror, 2 = 2-way rotational, 3 = 4-way mirror, 4 = 4-way rotational} | |
| | | d{int} | difficulty {0 = easy, 1 = tricky, 2 = hard} | |
| Loopy | 10x10t12dh | {int}x{int} | grid width $×$ grid height | $(2· w· h+1)· 3·(w· h)^{2}$ |
| t{int} | type {0 = squares, 1 = triangular, | | | |
| 2 = honeycomb, 3 = snub-square, | | | | |
| 4 = cairo, 5 = great-hexagonal, | | | | |
| 6 = octagonal, 7 = kites, | | | | |
| 8 = floret, 9 = dodecagonal, | | | | |
| 10 = great-dodecagonal, | | | | |
| 11 = Penrose (kite/dart), | | | | |
| 12 = Penrose (rhombs), | | | | |
| 13 = great-great-dodecagonal, | | | | |
| 14 = kagome, 15 = compass-dodecagonal, | | | | |
| 16 = hats} | | | | |
| d{char} | difficulty {e = easy, n = normal, | | | |
| t = tricky, h = hard} | | | | |
| Magnets | 6x5dtS | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+2)$ |
| d{char} | difficulty {e = easy, t = tricky | | | |
| {char} | (Optional) strip clues {S = yes} | | | |
| Map | 20x15n30dn | {int}x{int} | grid width $×$ grid height | $2· n·(1+w+h)$ |
| n{int} | number of regions | | | |
| d{char} | difficulty {e = easy, n = normal, h = hard, | | | |
| u = unreasonable} | | | | |
| Mines | 9x9n10 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| n{int} | number of mines | | | |
| p{char} | (Optional) ensure solubility {a = no} | | | |
| Mosaic | 10x10h0 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| {str} | (Optional) aggressive generation {h0 = no} | | | |
| Net | 5x5wb0.5 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+3)$ |
| {char} | (Optional) walls wrap around {w = yes} | | | |
| b{float} | barrier probability, interval: [0, 1] | | | |
| {char} | (Optional) ensure unique solution {a = no} | | | |
| Netslide | 4x4wb1m2 | {int}x{int} | grid width $×$ grid height | $2· w· h·(w+h-1)$ |
| {char} | (Optional) walls wrap around {w = yes} | | | |
| b{float} | barrier probability, interval: [0, 1] | | | |
| m{int} | (Optional) number of shuffling moves | | | |
| Palisade | 5x5n5 | {int}x{int} | grid width $×$ grid height | $(2· w· h-w-h)$ |
| n{int} | region size | $·(w+h+3)$ | | |
| Pattern | 15x15 | {int}x{int} | grid width $×$ grid height | $w· h(w+h+1)$ |
| Pearl | 8x8dtn | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+2)$ |
| d{char} | difficulty {e = easy, t = tricky} | | | |
| {char} | allow unsoluble {n = yes} | | | |
| Pegs | 7x7cross | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+2)$ |
| {str} | type {cross, octagon, random} | | | |
| Range | 9x6 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| Rectangles | 7x7e4 | {int}x{int} | grid width $×$ grid height | $2· w· h·(w+h+1)$ |
| e{int} | expansion factor | | | |
| {char} | ensure unique solution {a = no} | | | |
| Same Game | 5x5c3s2 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+2)$ |
| c{int} | number of colors | | | |
| s{int} | scoring system {1 = $(n-1)^{2}$ , | | | |
| 2 = $(n-2)^{2}$ } | | | | |
| {char} | (Optional) ensure solubility {r = no} | | | |
| Signpost | 4x4c | {int}x{int} | grid width $×$ grid height | $2· w· h·(w+h+1)$ |
| {char} | (Optional) start and end in corners | | | |
| {c = yes} | | | | |
| Singles | 5x5de | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| d{char} | difficulty {e = easy, k = tricky} | | | |
| Sixteen | 5x5m2 | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+3)$ |
| m{int} | (Optional) number of shuffling moves | | | |
| Slant | 8x8de | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| d{char} | difficulty {e = easy, h = hard} | | | |
| Solo | 3x3 | {int}x{int} | rows of sub-blocks $×$ cols of sub-blocks | $(w· h)^{2}*(2· w· h+1)$ |
| {char} | (Optional) require every digit on each | | | |
| main diagonal {x = yes} | | | | |
| * | | {char} | (Optional) jigsaw (irregularly shaped | |
| sub-blocks) main diagonal {j = yes} | | | | |
| * | | {char} | (Optional) killer (digit sums) {k = yes} | |
| * | | {str} | (Optional) symmetry. If not set, | |
| it is 2-way rotation. {a = None, | | | | |
| m2 = 2-way mirror, m4 = 4-way mirror, | | | | |
| r4 = 4-way rotation, m8 = 8-way mirror, | | | | |
| md2 = 2-way diagonal mirror, | | | | |
| md4 = 4-way diagonal mirror} | | | | |
| d{char} | difficulty {t = trivial, b = basic, | | | |
| i = intermediate, a = advanced, | | | | |
| e = extreme, u = unreasonable} | | | | |
| Tents | 8x8de | {int}x{int} | grid width $×$ grid height | $\frac{1}{4}·(w+1)·(h+1)$ |
| d{char} | difficulty {e = easy, t = tricky} | $·(w+h+1)$ | | |
| Towers | 5de | {int} | grid size | $2·(w+1)· w^{2}$ |
| d{char} | difficulty {e = easy, h = hard | | | |
| x = extreme, u = unreasonable} | | | | |
| Tracks | 8x8dto | {int}x{int} | grid width $×$ grid height | $w· h(2·(w+h)+1)$ |
| d{char} | difficulty {e = easy, t = tricky, h = hard} | | | |
| {char} | (Optional) disallow consecutive 1 clues | | | |
| {o = no} | | | | |
| Twiddle | 3x3n2 | {int}x{int} | grid width $×$ grid height | $(2· w· h· n^{2}+1)$ |
| n{int} | rotating block size | $·(w+h-2· n+1)$ | | |
| {char} | (Optional) one number per row {r = yes} | | | |
| {char} | (Optional) orientation matters {o = yes} | | | |
| m{int} | (Optional) number of shuffling moves | | | |
| Undead | 4x4dn | {int}x{int} | grid width $×$ grid height | $w· h·(w+h+1)$ |
| d{char} | difficulty {e = easy, n = normal, t = tricky} | | | |
| Unequal | 4adk | {int} | grid size | $w^{2}·(2· w+1)$ |
| {char} | (Optional) adjacent mode {a = yes} | | | |
| d{char} | difficulty {t = trivial, e = easy, k = tricky, | | | |
| x = extreme, r = recursive} | | | | |
| Unruly | 8x8dt | {int} | grid size | $w· h·(w+h+1)$ |
| {char} | (Optional) unique rows and cols {u = yes} | | | |
| d{char} | difficulty {t = trivial, e = easy, n = normal} | | | |
| Untangle | 25 | {int} | number of points | $n·(n+\sqrt{3n}· 4+2)$ |
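The formulas in the last column of Table 6 can be evaluated directly from the supplied parameters. The following minimal sketch does so for four puzzles whose formulas yield integers; the function names are hypothetical and chosen for illustration only. For the baseline parameters used in Section E.4, the computed values coincide with the "Optimal" column of Table 9.

```python
def bound_black_box(w, h):
    """(w*h + w + h + 1) * (w + 2) * (h + 2), as listed for Black Box."""
    return (w * h + w + h + 1) * (w + 2) * (h + 2)


def bound_galaxies(w, h):
    """(2*w*h - w - h) * (2*w + 2*h + 1), as listed for Galaxies."""
    return (2 * w * h - w - h) * (2 * w + 2 * h + 1)


def bound_loopy(w, h):
    """(2*w*h + 1) * 3 * (w*h)^2, as listed for Loopy."""
    return (2 * w * h + 1) * 3 * (w * h) ** 2


def bound_twiddle(w, h, n):
    """(2*w*h*n^2 + 1) * (w + h - 2*n + 1), as listed for Twiddle."""
    return (2 * w * h * n ** 2 + 1) * (w + h - 2 * n + 1)


# Baseline parameters from Table 7: w2h2m2M2, 3x3de, 3x3t0de, 2x3n2.
bounds = {
    "Black Box": bound_black_box(2, 2),   # 144
    "Galaxies": bound_galaxies(3, 3),     # 156
    "Loopy": bound_loopy(3, 3),           # 4617
    "Twiddle": bound_twiddle(2, 3, 2),    # 98
}
for name, value in bounds.items():
    print(f"{name}: {value}")
```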
E.3 Baseline Parameters
In Table 7, the parameters used for training the agents compared in Section 3 are shown.
Table 7: Listed below are the generation parameters supplied to each puzzle instance before training an agent, as well as some puzzle-specific notes. We propose the easiest preset difficulty setting as a first challenge for RL algorithms to reach human-level performance.
| Puzzle | Supplied Parameters | Easiest Human-Level Preset | Notes |
| --- | --- | --- | --- |
| Black Box | w2h2m2M2 | w5h5m3M3 | |
| Bridges | 3x3 | 7x7i30e10m2d0 | |
| Cube | c3x3 | c4x4 | |
| Dominosa | 1dt | 3dt | |
| Fifteen | 2x2 | 4x4 | |
| Filling | 2x3 | 9x7 | |
| Flip | 3x3c | 3x3c | |
| Flood | 3x3c6m5 | 12x12c6m5 | |
| Galaxies | 3x3de | 7x7dn | |
| Guess | c2p3g10Bm | c6p4g10Bm | Episodes were terminated and negatively rewarded after the maximum number of guesses was made without finding the correct solution. |
| Inertia | 4x4 | 10x8 | |
| Keen | 3dem | 4de | Even the minimum allowed problem size proved to be infeasible for a random agent. |
| Light Up | 3x3b20s0d0 | 7x7b20s4d0 | |
| Loopy | 3x3t0de | 3x3t0de | |
| Magnets | 3x3deS | 6x5de | |
| Map | 3x3n5de | 20x15n30de | |
| Mines | 4x4n2 | 9x9n10 | |
| Mosaic | 3x3 | 3x3 | |
| Net | 2x2 | 5x5 | |
| Netslide | 2x3b1 | 3x3b1 | |
| Palisade | 2x3n3 | 5x5n5 | |
| Pattern | 3x2 | 10x10 | |
| Pearl | 5x5de | 6x6de | |
| Pegs | 4x4random | 5x7cross | |
| Range | 3x3 | 9x6 | |
| Rectangles | 3x2 | 7x7 | |
| Same Game | 2x3c3s2 | 5x5c3s2 | |
| Signpost | 2x3 | 4x4c | |
| Singles | 2x3de | 5x5de | |
| Sixteen | 2x3 | 3x3 | |
| Slant | 2x2de | 5x5de | |
| Solo | 2x2 | 2x2 | |
| Tents | 4x4de | 8x8de | |
| Towers | 3de | 4de | |
| Tracks | 4x4de | 8x8de | |
| Twiddle | 2x3n2 | 3x3n2r | |
| Undead | 3x3de | 4x4de | |
| Unequal | 3de | 4de | |
| Unruly | 6x6dt | 8x8dt | Even the minimum allowed problem size proved to be infeasible for a random agent. |
| Untangle | 4 | 6 | |
E.4 Detailed Baseline Results
We summarize all evaluated algorithms in Table 8.
Table 8: Summary of all evaluated RL algorithms.
| Algorithm | Policy Type | |
| --- | --- | --- |
| Proximal Policy Optimization (PPO) [61] | On-Policy | No |
| Recurrent PPO [62] | On-Policy | No |
| Advantage Actor Critic (A2C) [63] | On-Policy | No |
| Asynchronous Advantage Actor Critic (A3C) [63] | On-Policy | No |
| Trust Region Policy Optimization (TRPO) [64] | On-Policy | No |
| Deep Q-Network (DQN) [11] | Off-Policy | No |
| Quantile Regression DQN (QRDQN) [65] | Off-Policy | No |
| MuZero [66] | Off-Policy | Yes |
| DreamerV3 [67] | Off-Policy | No |
As we limited the agents to a single final reward upon completion, we chose, where possible, puzzle parameters that allowed random policies to successfully find a solution. Note that if a random policy fails to find a solution, an RL algorithm without guidance (such as intermediate rewards) will struggle as well: if an agent never receives a reward under its initial (random) policy, it has no learning signal and cannot improve its performance at all.
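The effect of a single final reward can be illustrated on a toy task. The sketch below is not one of the PUZZLES environments; it uses a hypothetical "combination lock" in which the agent must choose action 0 a given number of times in a row, and any other action resets the progress. It estimates how often a uniformly random policy solves the task within a step budget. All names and values are assumptions made for illustration.

```python
import random


def random_policy_success_rate(sequence_length, num_actions, max_steps,
                               episodes, seed=0):
    """Fraction of episodes a uniformly random policy solves the toy task.

    Toy task (hypothetical): action 0 must be chosen `sequence_length` times
    in a row; any other action resets the progress. Success is the only
    event that would yield a reward.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        progress = 0
        for _ in range(max_steps):
            if rng.randrange(num_actions) == 0:
                progress += 1
            else:
                progress = 0
            if progress == sequence_length:
                successes += 1
                break
    return successes / episodes


# Small instance: a random policy succeeds almost always, so rewards are observed.
easy_rate = random_policy_success_rate(2, 2, max_steps=50, episodes=200)
# Larger instance: success requires 20 correct actions in a row out of 5 choices.
hard_rate = random_policy_success_rate(20, 5, max_steps=50, episodes=200)
print(f"easy: {easy_rate:.2f}, hard: {hard_rate:.2f}")
```

In the larger instance the random policy essentially never observes a reward, which mirrors the situation described above: without any successful episode, a learner that relies only on the final reward has nothing to learn from.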
The chosen parameters roughly corresponded to the smallest and easiest puzzles, as more complex puzzles were found to be intractable. This is highlighted, for example, in Solo/Sudoku, where the reasoning needed to find a valid solution is already rather complex, even for a grid with 2 $×$ 2 sub-blocks. A few puzzles, such as Unruly, remained intractable because of the minimum complexity permitted by Tatham’s puzzle-specific problem generators.
For the RGB pixel observations, the window size chosen for these small problems was set at 128 $×$ 128 pixels.
Table 9: Listed below are the detailed results for all evaluated algorithms. Results show the average number of steps required for all successful episodes and standard deviation with respect to the random seeds. In brackets, we show the overall percentage of successful episodes. In the summary row, the last number in brackets denotes the total number of puzzles where a solution below the upper bound of optimal steps was found. Entries without values mean that no successful policy was found among all random seeds. This Table is continued in Table 10.
| Puzzle | Supplied Parameters | Optimal | Random | PPO | TRPO | DreamerV3 | MuZero |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Blackbox | w2h2m2M2 | 144 | 2206 (99.2%) | $1773 ± 472$ (59.5%) | $1744 ± 454$ (96.3%) | $\mathbf{32 ± 5}$ (100.0%) | $\mathbf{46 ± 0}$ (0.1%) |
| Bridges | 3x3 | 378 | 547 (100.0%) | $682 ± 197$ (85.1%) | $546 ± 13$ (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | $397 ± 181$ (86.7%) |
| Cube | c3x3 | 54 | 4181 (66.9%) | $744 ± 1610$ (77.5%) | $433 ± 917$ (99.8%) | $5068 ± 657$ (22.5%) | - |
| Dominosa | 1dt | 32 | 1980 (99.2%) | $457 ± 954$ (70.0%) | $\mathbf{12 ± 1}$ (100.0%) | $\mathbf{11 ± 1}$ (100.0%) | $3659 ± 0$ (0.0%) |
| Fifteen | 2x2 | 256 | 54 (100.0%) | $\mathbf{3 ± 0}$ (100.0%) | $\mathbf{3 ± 0}$ (100.0%) | $\mathbf{4 ± 0}$ (100.0%) | $\mathbf{5 ± 1}$ (100.0%) |
| Filling | 2x3 | 36 | 820 (100.0%) | $290 ± 249$ (97.5%) | $\mathbf{9 ± 2}$ (100.0%) | $443 ± 56$ (83.4%) | $1099 ± 626$ (15.0%) |
| Flip | 3x3c | 63 | 3138 (88.9%) | $3008 ± 837$ (40.1%) | $2951 ± 564$ (90.8%) | $1762 ± 568$ (8.0%) | $1207 ± 1305$ (3.1%) |
| Flood | 3x3c6m5 | 63 | 134 (97.4%) | $\mathbf{12 ± 0}$ (99.9%) | $\mathbf{21 ± 4}$ (99.6%) | $\mathbf{14 ± 1}$ (100.0%) | $994 ± 472$ (14.4%) |
| Galaxies | 3x3de | 156 | 4306 (33.9%) | $3860 ± 1778$ (8.3%) | $4755 ± 527$ (24.8%) | $3367 ± 1585$ (11.0%) | $6046 ± 2722$ (8.2%) |
| Guess | c2p3g10Bm | 200 | 358 (73.4%) | - | $316 ± 52$ (72.0%) | $268 ± 226$ (77.0%) | $\mathbf{24 ± 0}$ (0.8%) |
| Inertia | 4x4 | 51 | 13 (6.5%) | $\mathbf{22 ± 9}$ (6.3%) | $635 ± 1373$ (5.7%) | $926 ± 217$ (5.7%) | $104 ± 73$ (3.1%) |
| Keen | 3dem | 63 | 3152 (0.5%) | $3817 ± 0$ (0.2%) | $5887 ± 1526$ (0.4%) | $4350 ± 1163$ (1.3%) | - |
| Lightup | 3x3b20s0d0 | 35 | 2237 (98.1%) | $1522 ± 1115$ (82.7%) | $2127 ± 168$ (95.8%) | $438 ± 247$ (72.0%) | $1178 ± 1109$ (2.1%) |
| Loopy | 3x3t0de | 4617 | - | - | - | - | - |
| Magnets | 3x3deS | 72 | 1895 (99.1%) | $1366 ± 1090$ (90.2%) | $1912 ± 60$ (99.1%) | $574 ± 56$ (78.5%) | $1491 ± 0$ (0.7%) |
| Map | 3x3n5de | 70 | 903 (99.9%) | $1172 ± 297$ (75.7%) | $950 ± 34$ (99.9%) | $1680 ± 197$ (64.9%) | $467 ± 328$ (0.9%) |
| Mines | 4x4n2 | 144 | 87 (18.1%) | $2478 ± 2424$ (9.9%) | $\mathbf{123 ± 66}$ (18.8%) | $272 ± 246$ (50.1%) | $\mathbf{19 ± 22}$ (4.6%) |
| Mosaic | 3x3 | 63 | 4996 (9.8%) | $4928 ± 438$ (2.5%) | $5233 ± 615$ (5.0%) | $4469 ± 387$ (15.9%) | $5586 ± 0$ (0.2%) |
| Net | 2x2 | 28 | 1279 (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | $\mathbf{10 ± 0}$ (100.0%) | $339 ± 448$ (8.2%) |
| Netslide | 2x3b1 | 48 | 766 (100.0%) | $1612 ± 1229$ (41.6%) | $635 ± 145$ (100.0%) | $\mathbf{12 ± 0}$ (100.0%) | $683 ± 810$ (25.0%) |
| Netslide | 3x3b1 | 90 | 4671 (11.0%) | $4671 ± 498$ (9.2%) | $4008 ± 1214$ (8.9%) | $3586 ± 677$ (22.4%) | $3721 ± 1461$ (13.2%) |
| Palisade | 2x3n3 | 56 | 1428 (100.0%) | $939 ± 604$ (87.0%) | $1377 ± 35$ (99.9%) | $\mathbf{39 ± 56}$ (100.0%) | $86 ± 0$ (0.0%) |
| Pattern | 3x2 | 36 | 3247 (92.9%) | $1542 ± 1262$ (71.9%) | $2908 ± 355$ (90.2%) | $820 ± 516$ (58.0%) | $4063 ± 1696$ (1.9%) |
| Pearl | 5x5de | 300 | - | - | - | - | - |
| Pegs | 4x4Random | 160 | - | - | - | - | - |
| Range | 3x3 | 63 | 535 (100.0%) | $780 ± 305$ (65.8%) | $661 ± 198$ (99.9%) | $888 ± 238$ (55.6%) | $91 ± 76$ (5.1%) |
| Rect | 3x2 | 72 | 723 (100.0%) | $\mathbf{27 ± 44}$ (99.8%) | $\mathbf{9 ± 4}$ (100.0%) | $\mathbf{8 ± 1}$ (100.0%) | - |
| Samegame | 2x3c3s2 | 42 | 76 (100.0%) | $123 ± 197$ (98.8%) | $\mathbf{7 ± 0}$ (100.0%) | $\mathbf{7 ± 0}$ (100.0%) | $1444 ± 541$ (28.7%) |
| Samegame | 5x5c3s2 | 300 | 571 (32.1%) | $1003 ± 827$ (30.5%) | $672 ± 160$ (30.8%) | $527 ± 162$ (30.2%) | $\mathbf{184 ± 107}$ (4.9%) |
| Signpost | 2x3 | 72 | 776 (96.1%) | $838 ± 53$ (97.2%) | $799 ± 13$ (97.0%) | $859 ± 304$ (91.3%) | $4883 ± 1285$ (5.9%) |
| Singles | 2x3de | 36 | 353 (100.0%) | $\mathbf{7 ± 3}$ (100.0%) | $\mathbf{7 ± 4}$ (100.0%) | $\mathbf{11 ± 8}$ (99.9%) | $733 ± 551$ (28.4%) |
| Sixteen | 2x3 | 48 | 2908 (94.1%) | $2371 ± 1226$ (55.7%) | $2968 ± 181$ (92.8%) | $\mathbf{17 ± 1}$ (100.0%) | $3281 ± 472$ (68.7%) |
| Slant | 2x2de | 20 | 447 (100.0%) | $333 ± 190$ (80.4%) | $21 ± 2$ (99.9%) | $596 ± 163$ (100.0%) | $1005 ± 665$ (7.4%) |
| Solo | 2x2 | 144 | - | - | - | - | - |
| Tents | 4x4de | 56 | 4442 (44.3%) | $4781 ± 86$ (10.3%) | $4828 ± 752$ (31.0%) | $3137 ± 581$ (12.1%) | $4556 ± 3259$ (0.6%) |
| Towers | 3de | 72 | 4876 (1.0%) | - | $3789 ± 1288$ (0.5%) | $3746 ± 1861$ (0.5%) | - |
| Tracks | 4x4de | 272 | 5213 (0.5%) | $4129 ± nan$ (0.1%) | $5499 ± 2268$ (0.3%) | $4483 ± 1513$ (0.3%) | - |
| Twiddle | 2x3n2 | 98 | 851 (100.0%) | $\mathbf{8 ± 1}$ (99.9%) | $\mathbf{11 ± 7}$ (100.0%) | $\mathbf{8 ± 0}$ (100.0%) | $761 ± 860$ (37.6%) |
| Undead | 3x3de | 63 | 4390 (40.1%) | $4542 ± 292$ (5.7%) | $4179 ± 299$ (31.0%) | $4088 ± 297$ (35.8%) | $3677 ± 342$ (9.0%) |
| Unequal | 3de | 63 | 4540 (6.7%) | - | $5105 ± 193$ (3.6%) | $2468 ± 2025$ (4.8%) | $4944 ± 368$ (7.2%) |
| Unruly | 6x6dt | 468 | - | - | - | - | - |
| Untangle | 4 | 150 | 141 (100.0%) | $\mathbf{13 ± 1}$ (100.0%) | $\mathbf{11 ± 0}$ (100.0%) | $\mathbf{6 ± 0}$ (100.0%) | $499 ± 636$ (26.5%) |
| Untangle | 6 | 79 | 2165 (96.9%) | $2295 ± 66$ (96.2%) | $2228 ± 126$ (96.5%) | $1683 ± 74$ (82.0%) | $2380 ± 0$ (11.2%) |
| Summary | - | 217 | 1984 (71.2%) | $1604 ± 801$ (61.6%) (8) | $1773 ± 639$ (70.8%) (11) | $1334 ± 654$ (62.7%) (14) | $1808 ± 983$ (16.0%) (5) |
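The entries of the results tables combine three quantities: the mean length of successful episodes, its standard deviation across random seeds, and the overall success rate. The sketch below shows one way such an entry could be computed from raw evaluation episodes. It is an assumption about the aggregation, not the authors' evaluation code: the function name `summarize`, the input layout, and the use of the sample standard deviation over per-seed means are all illustrative choices.

```python
from statistics import mean, stdev


def summarize(per_seed_episodes, upper_bound):
    """Aggregate evaluation episodes into one table entry (illustrative sketch).

    per_seed_episodes: one list per random seed, each containing
        (steps, solved) tuples for the evaluation episodes of that seed.
    upper_bound: upper bound on optimal steps for the puzzle.
    Returns None if no episode was solved for any seed.
    """
    seed_means = []
    total_episodes = 0
    solved_episodes = 0
    for episodes in per_seed_episodes:
        solved_lengths = [steps for steps, solved in episodes if solved]
        total_episodes += len(episodes)
        solved_episodes += len(solved_lengths)
        if solved_lengths:
            seed_means.append(mean(solved_lengths))
    if not seed_means:
        return None
    avg_steps = mean(seed_means)
    # The sample standard deviation is undefined for a single value.
    std_steps = stdev(seed_means) if len(seed_means) > 1 else float("nan")
    return {
        "avg_steps": avg_steps,
        "std_steps": std_steps,
        "success_rate": solved_episodes / total_episodes,
        "below_upper_bound": avg_steps < upper_bound,
    }


# Two seeds with made-up episode outcomes, for demonstration only.
demo = summarize(
    [[(10, True), (20, True), (5, False)], [(30, True), (99, False)]],
    upper_bound=25,
)
print(demo)
```

Here `below_upper_bound` corresponds to the criterion counted in the last bracket of the summary row, under the assumption that the mean episode length is the quantity compared against the bound.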
Table 10: Continuation from Table 9. Listed below are the detailed results for all evaluated algorithms. Results show the average number of steps required for all successful episodes and standard deviation with respect to the random seeds. In brackets, we show the overall percentage of successful episodes. In the summary row, the last number in brackets denotes the total number of puzzles where a solution below the upper bound of optimal steps was found. Entries without values mean that no successful policy was found among all random seeds.
| Puzzle | Supplied Parameters | Optimal | Random | A2C | RecurrentPPO | DQN | QRDQN |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Blackbox | w2h2m2M2 | 144 | 2206 (99.2%) | $2524 ± 1193$ (85.2%) | $2009 ± 427$ (98.7%) | $2063 ± 70$ (99.0%) | $2984 ± 1584$ (76.8%) |
| Bridges | 3x3 | 378 | 547 (100.0%) | $540 ± 69$ (100.0%) | $653 ± 165$ (100.0%) | $549 ± 20$ (100.0%) | $1504 ± 2037$ (83.4%) |
| Cube | c3x3 | 54 | 4181 (66.9%) | $4516 ± 954$ (17.5%) | $4943 ± 620$ (16.2%) | $4407 ± 414$ (43.4%) | $4241 ± 283$ (26.4%) |
| Dominosa | 1dt | 32 | 1980 (99.2%) | $6408 ± nan$ (0.2%) | $3009 ± 988$ (80.6%) | $\mathbf{15 ± 6}$ (100.0%) | $4457 ± 2183$ (50.0%) |
| Fifteen | 2x2 | 256 | 54 (100.0%) | $\mathbf{4 ± 1}$ (100.0%) | $\mathbf{3 ± 0}$ (100.0%) | $\mathbf{3 ± 0}$ (100.0%) | $\mathbf{3 ± 0}$ (100.0%) |
| Filling | 2x3 | 36 | 820 (100.0%) | $777 ± 310$ (99.3%) | $764 ± 106$ (100.0%) | $761 ± 109$ (99.7%) | $2828 ± 2769$ (63.2%) |
| Flip | 3x3c | 63 | 3138 (88.9%) | $4345 ± 1928$ (29.4%) | $3356 ± 1412$ (46.9%) | $3493 ± 129$ (87.1%) | $3741 ± 353$ (56.8%) |
| Flood | 3x3c6m5 | 63 | 134 (97.4%) | $406 ± 623$ (93.4%) | $120 ± 17$ (97.7%) | $128 ± 12$ (90.8%) | $1954 ± 2309$ (65.2%) |
| Galaxies | 3x3de | 156 | 4306 (33.9%) | $4586 ± 980$ (10.8%) | $3939 ± 1438$ (0.4%) | $4657 ± 147$ (26.1%) | - |
| Guess | c2p3g10Bm | 200 | 358 (73.4%) | - | $323 ± 52$ (44.6%) | $550 ± 248$ (71.9%) | $3260 ± 2614$ (34.4%) |
| Inertia | 4x4 | 51 | 13 (6.5%) | $105 ± 197$ (6.1%) | $1198 ± 1482$ (5.6%) | $179 ± 156$ (7.1%) | $1330 ± 296$ (5.8%) |
| Keen | 3dem | 63 | 3152 (0.5%) | - | - | $6774 ± 1046$ (0.4%) | - |
| Lightup | 3x3b20s0d0 | 35 | 2237 (98.1%) | $3034 ± 793$ (62.7%) | $3493 ± 929$ (66.5%) | $2429 ± 214$ (97.5%) | $3440 ± 945$ (57.8%) |
| Loopy | 3x3t0de | 4617 | - | - | - | - | - |
| Magnets | 3x3deS | 72 | 1895 (99.1%) | $3057 ± 1114$ (47.9%) | $1874 ± 222$ (99.2%) | $2112 ± 331$ (98.1%) | $5182 ± 3878$ (33.8%) |
| Map | 3x3n5de | 70 | 903 (99.9%) | $2552 ± 1223$ (52.5%) | $2608 ± 1808$ (59.4%) | $949 ± 30$ (99.9%) | $1753 ± 769$ (78.1%) |
| Mines | 4x4n2 | 144 | 87 (18.1%) | $\mathbf{120 ± 41}$ (14.7%) | $1189 ± 1341$ (12.1%) | $207 ± 146$ (17.6%) | $1576 ± 1051$ (13.2%) |
| Mosaic | 3x3 | 63 | 4996 (9.8%) | $4937 ± 424$ (8.4%) | $4907 ± 219$ (8.3%) | $5279 ± 564$ (7.0%) | $9490 ± 155$ (0.0%) |
| Net | 2x2 | 28 | 1279 (100.0%) | $149 ± 288$ (100.0%) | $1232 ± 92$ (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | $1793 ± 1663$ (81.3%) |
| Netslide | 2x3b1 | 48 | 766 (100.0%) | $976 ± 584$ (100.0%) | $2079 ± 1989$ (64.7%) | $779 ± 37$ (100.0%) | $1023 ± 206$ (80.9%) |
| Netslide | 3x3b1 | 90 | 4671 (11.0%) | $4324 ± 657$ (8.1%) | $2737 ± 1457$ (1.7%) | $4099 ± 846$ (5.1%) | $2025 ± 1475$ (0.4%) |
| Palisade | 2x3n3 | 56 | 1428 (100.0%) | $1666 ± 198$ (99.4%) | $1981 ± 1053$ (92.5%) | $1445 ± 96$ (99.9%) | $1519 ± 142$ (99.8%) |
| Pattern | 3x2 | 36 | 3247 (92.9%) | $3445 ± 635$ (82.9%) | $3733 ± 513$ (79.7%) | $2809 ± 733$ (89.7%) | $3406 ± 384$ (51.1%) |
| Pearl | 5x5de | 300 | - | - | - | - | - |
| Pegs | 4x4Random | 160 | - | - | - | - | - |
| Range | 3x3 | 63 | 535 (100.0%) | $1438 ± 782$ (81.4%) | $730 ± 172$ (99.9%) | $594 ± 28$ (100.0%) | - |
| Rect | 3x2 | 72 | 723 (100.0%) | $3470 ± 2521$ (17.6%) | $916 ± 420$ (99.6%) | $511 ± 193$ (97.4%) | $1560 ± 1553$ (81.8%) |
| Samegame | 2x3c3s2 | 42 | 76 (100.0%) | $\mathbf{8 ± 1}$ (100.0%) | $1777 ± 1643$ (43.5%) | $\mathbf{8 ± 0}$ (100.0%) | $\mathbf{14 ± 9}$ (100.0%) |
| Samegame | 5x5c3s2 | 300 | 571 (32.1%) | $609 ± 155$ (29.9%) | $1321 ± 1170$ (30.3%) | $850 ± 546$ (29.2%) | $5577 ± 1211$ (12.8%) |
| Signpost | 2x3 | 72 | 776 (96.1%) | $2259 ± 1394$ (85.9%) | $1000 ± 266$ (77.9%) | $793 ± 17$ (97.0%) | $2298 ± 2845$ (78.0%) |
| Singles | 2x3de | 36 | 353 (100.0%) | $372 ± 47$ (100.0%) | $331 ± 66$ (100.0%) | $361 ± 47$ (99.1%) | $392 ± 29$ (100.0%) |
| Sixteen | 2x3 | 48 | 2908 (94.1%) | $3903 ± 479$ (71.7%) | $3409 ± 574$ (67.6%) | $2970 ± 107$ (93.2%) | $4550 ± 848$ (21.9%) |
| Slant | 2x2de | 20 | 447 (100.0%) | $984 ± 470$ (99.8%) | $465 ± 34$ (100.0%) | $496 ± 97$ (100.0%) | $1398 ± 2097$ (87.1%) |
| Solo | 2x2 | 144 | - | - | - | - | - |
| Tents | 4x4de | 56 | 4442 (44.3%) | $6157 ± 1961$ (2.1%) | $4980 ± 397$ (12.8%) | $4515 ± 59$ (38.1%) | $5295 ± 688$ (7.8%) |
| Towers | 3de | 72 | 4876 (1.0%) | $9850 ± nan$ (0.0%) | $8549 ± nan$ (0.0%) | $5836 ± 776$ (0.5%) | - |
| Tracks | 4x4de | 272 | 5213 (0.5%) | $4501 ± nan$ (0.0%) | - | $5809 ± 661$ (0.3%) | - |
| Twiddle | 2x3n2 | 98 | 851 (100.0%) | $1248 ± 430$ (99.6%) | $827 ± 71$ (100.0%) | $\mathbf{83 ± 149}$ (100.0%) | $3170 ± 1479$ (33.4%) |
| Undead | 3x3de | 63 | 4390 (40.1%) | $5818 ± 154$ (0.9%) | $5060 ± 2381$ (0.5%) | - | - |
| Unequal | 3de | 63 | 4540 (6.7%) | $5067 ± 1600$ (1.0%) | $5929 ± 1741$ (1.1%) | $5057 ± 582$ (5.6%) | - |
| Unruly | 6x6dt | 468 | - | - | - | - | - |
| Untangle | 4 | 150 | 141 (100.0%) | $1270 ± 1745$ (90.4%) | $\mathbf{135 ± 18}$ (100.0%) | $170 ± 29$ (100.0%) | $871 ± 837$ (99.0%) |
| Untangle | 6 | 79 | 2165 (96.9%) | $3324 ± 1165$ (72.5%) | $2739 ± 588$ (91.7%) | $2219 ± 84$ (95.9%) | - |
| Summary | - | 217 | 1984 (71.2%) | $2743 ± 954$ (54.8%) (3) | $2342 ± 989$ (61.1%) (2) | $1999 ± 365$ (70.2%) (5) | $2754 ± 1579$ (56.0%) (2) |
Table 11: We list the detailed results for all the experiments of action masking and input representation. Results show the average number of steps required for all successful episodes and standard deviation with respect to the random seeds. In brackets, we show the overall percentage of successful episodes. In the summary row, the last number in brackets denotes the total number of puzzles where a solution below the upper bound of optimal steps was found. Entries without values mean that no successful policy was found among all random seeds.
| Puzzle | Supplied Parameters | Optimal | Random | PPO (Internal State) | PPO (RGB Pixels) | MaskablePPO (Internal State) | MaskablePPO (RGB Pixels) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Blackbox | w2h2m2M2 | 144 | 2206 (99.2%) | $1773 ± 472$ (59.5%) | $1509 ± 792$ (97.9%) | $\mathbf{9 ± 0}$ (99.7%) | $\mathbf{30 ± 1}$ (99.2%) |
| Bridges | 3x3 | 378 | 547 (100.0%) | $682 ± 197$ (85.1%) | $\mathbf{89 ± 176}$ (99.1%) | $\mathbf{25 ± 0}$ (99.4%) | $\mathbf{9 ± 0}$ (99.6%) |
| Cube | c3x3 | 54 | 4181 (66.9%) | $744 ± 1610$ (77.5%) | $3977 ± 442$ (67.7%) | $\mathbf{16 ± 1}$ (81.2%) | $410 ± 157$ (75.1%) |
| Dominosa | 1dt | 32 | 1980 (99.2%) | $457 ± 954$ (70.0%) | $539 ± 581$ (100.0%) | $\mathbf{12 ± 0}$ (100.0%) | $\mathbf{19 ± 2}$ (100.0%) |
| Fifteen | 2x2 | 256 | 54 (100.0%) | $\mathbf{3 ± 0}$ (100.0%) | $\mathbf{37 ± 26}$ (100.0%) | $\mathbf{4 ± 0}$ (100.0%) | $\mathbf{3 ± 0}$ (100.0%) |
| Filling | 2x3 | 36 | 820 (100.0%) | $290 ± 249$ (97.5%) | $373 ± 175$ (99.9%) | $\mathbf{7 ± 0}$ (100.0%) | $\mathbf{34 ± 3}$ (99.9%) |
| Flip | 3x3c | 63 | 3138 (88.9%) | $3008 ± 837$ (40.1%) | $3616 ± 395$ (78.3%) | $2174 ± 1423$ (70.3%) | $319 ± 128$ (81.3%) |
| Flood | 3x3c6m5 | 63 | 134 (97.4%) | $\mathbf{12 ± 0}$ (99.9%) | $\mathbf{28 ± 12}$ (99.7%) | $\mathbf{12 ± 0}$ (99.9%) | $\mathbf{14 ± 0}$ (99.9%) |
| Galaxies | 3x3de | 156 | 4306 (33.9%) | $3860 ± 1778$ (8.3%) | $4439 ± 224$ (29.1%) | $3640 ± 928$ (40.2%) | $3372 ± 430$ (40.5%) |
| Guess | c2p3g10Bm | 200 | 358 (73.4%) | - | $344 ± 35$ (72.0%) | $\mathbf{145 ± 19}$ (75.4%) | - |
| Inertia | 4x4 | 51 | 13 (6.5%) | $\mathbf{22 ± 9}$ (6.3%) | $237 ± 10$ (99.7%) | $\mathbf{41 ± 19}$ (79.0%) | $169 ± 233$ (69.8%) |
| Keen | 3dem | 63 | 3152 (0.5%) | $3817 ± 0$ (0.2%) | - | - | - |
| Lightup | 3x3b20s0d0 | 35 | 2237 (98.1%) | $1522 ± 1115$ (82.7%) | $2401 ± 148$ (97.5%) | $\mathbf{25 ± 8}$ (99.1%) | $1608 ± 1144$ (90.1%) |
| Loopy | 3x3t0de | 4617 | - | - | - | - | - |
| Magnets | 3x3deS | 72 | 1895 (99.1%) | $1366 ± 1090$ (90.2%) | $1794 ± 109$ (98.7%) | $222 ± 33$ (98.8%) | $425 ± 68$ (99.2%) |
| Map | 3x3n5de | 70 | 903 (99.9%) | $1172 ± 297$ (75.7%) | $958 ± 33$ (99.9%) | $321 ± 33$ (99.9%) | $467 ± 69$ (99.1%) |
| Mines | 4x4n2 | 144 | 87 (18.1%) | $2478 ± 2424$ (9.9%) | $2406 ± 296$ (44.7%) | $412 ± 268$ (43.3%) | $653 ± 396$ (43.1%) |
| Mosaic | 3x3 | 63 | 4996 (9.8%) | $4928 ± 438$ (2.5%) | $5673 ± 1547$ (6.7%) | $3381 ± 906$ (29.4%) | $3158 ± 247$ (28.5%) |
| Net | 2x2 | 28 | 1279 (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | $180 ± 44$ (100.0%) | $\mathbf{9 ± 0}$ (100.0%) | - |
| Netslide | 2x3b1 | 48 | 766 (100.0%) | $1612 ± 1229$ (41.6%) | $\mathbf{35 ± 18}$ (100.0%) | $\mathbf{13 ± 0}$ (100.0%) | $96 ± 7$ (100.0%) |
| Netslide | 3x3b1 | 90 | 4671 (11.0%) | $4671 ± 498$ (9.2%) | - | - | - |
| Palisade | 2x3n3 | 56 | 1428 (100.0%) | $939 ± 604$ (87.0%) | $1412 ± 23$ (99.9%) | $90 ± 55$ (99.9%) | $347 ± 26$ (99.8%) |
| Pattern | 3x2 | 36 | 3247 (92.9%) | $1542 ± 1262$ (71.9%) | $2983 ± 173$ (92.5%) | $\mathbf{14 ± 0}$ (96.9%) | $1201 ± 1021$ (88.7%) |
| Pearl | 5x5de | 300 | - | - | - | - | - |
| Pegs | 4x4Random | 160 | - | - | - | $1730 ± 579$ (34.9%) | $1482 ± 687$ (37.3%) |
| Range | 3x3 | 63 | 535 (100.0%) | $780 ± 305$ (65.8%) | $613 ± 25$ (100.0%) | $\mathbf{50 ± 69}$ (100.0%) | $209 ± 26$ (100.0%) |
| Rect | 3x2 | 72 | 723 (100.0%) | $\mathbf{27 ± 44}$ (99.8%) | $300 ± 387$ (100.0%) | $\mathbf{8 ± 0}$ (100.0%) | $\mathbf{38 ± 9}$ (100.0%) |
| Samegame | 2x3c3s2 | 42 | 76 (100.0%) | $123 ± 197$ (98.8%) | $\mathbf{11 ± 8}$ (100.0%) | $\mathbf{8 ± 0}$ (100.0%) | $\mathbf{9 ± 0}$ (100.0%) |
| Samegame | 5x5c3s2 | 300 | 571 (32.1%) | $1003 ± 827$ (30.5%) | - | - | - |
| Signpost | 2x3 | 72 | 776 (96.1%) | $838 ± 53$ (97.2%) | $779 ± 50$ (97.0%) | $567 ± 149$ (97.7%) | $454 ± 50$ (97.5%) |
| Singles | 2x3de | 36 | 353 (100.0%) | $\mathbf{7 ± 3}$ (100.0%) | $306 ± 57$ (100.0%) | $\mathbf{5 ± 1}$ (100.0%) | $218 ± 17$ (100.0%) |
| Sixteen | 2x3 | 48 | 2908 (94.1%) | $2371 ± 1226$ (55.7%) | $3211 ± 450$ (89.6%) | $\mathbf{19 ± 2}$ (94.3%) | $3650 ± 190$ (68.5%) |
| Slant | 2x2de | 20 | 447 (100.0%) | $333 ± 190$ (80.4%) | $325 ± 119$ (100.0%) | $\mathbf{12 ± 0}$ (100.0%) | $89 ± 21$ (100.0%) |
| Solo | 2x2 | 144 | - | - | - | - | - |
| Tents | 4x4de | 56 | 4442 (44.3%) | $4781 ± 86$ (10.3%) | $4493 ± 155$ (37.5%) | $3485 ± 63$ (39.9%) | $3485 ± 456$ (45.0%) |
| Towers | 3de | 72 | 4876 (1.0%) | - | - | - | - |
| Tracks | 4x4de | 272 | 5213 (0.5%) | $4129 ± nan$ (0.1%) | $4217 ± nan$ (1.6%) | $5461 ± 976$ (0.3%) | $5019 ± 2297$ (0.4%) |
| Twiddle | 2x3n2 | 98 | 851 (100.0%) | $\mathbf{8 ± 1}$ (99.9%) | $348 ± 466$ (100.0%) | $\mathbf{7 ± 0}$ (100.0%) | $\mathbf{12 ± 1}$ (100.0%) |
| Undead | 3x3de | 63 | 4390 (40.1%) | $4542 ± 292$ (5.7%) | $4129 ± 139$ (40.0%) | $3415 ± 379$ (42.8%) | $3482 ± 406$ (46.1%) |
| Unequal | 3de | 63 | 4540 (6.7%) | - | - | $2322 ± 988$ (38.7%) | $3021 ± 1368$ (26.5%) |
| Unruly | 6x6dt | 468 | - | - | - | - | - |
| Untangle | 4 | 150 | 141 (100.0%) | $\mathbf{13 ± 1}$ (100.0%) | $\mathbf{35 ± 58}$ (100.0%) | $\mathbf{12 ± 0}$ (100.0%) | $\mathbf{7 ± 0}$ (100.0%) |
| Untangle | 6 | 79 | 2165 (96.9%) | $2295 ± 66$ (96.2%) | - | - | - |
| Summary | - | 217 | 1984 (71.2%) | $1604 ± 801$ (61.6%) (8) | $1619 ± 380$ (82.8%) (6) | $814 ± 428$ (81.2%) (21) | $1047 ± 583$ (79.2%) (10) |
E.5 Episode Length and Early Termination Parameters
In Table 12, the puzzles and parameters used for training the agents for the ablation in Section 3.4 are shown together with the results. Due to a limited computational budget, we included only a subset of all puzzles at the easy human difficulty preset for DreamerV3. Namely, we selected all puzzles where a random policy was able to complete at least one episode successfully within 10,000 steps in 1,000 evaluations. This selection contains a subset of the more challenging puzzles, as can be seen from the performance of many algorithms in Table 9. For some puzzles, e.g., Netslide, Samegame, Sixteen and Untangle, terminating episodes early improves final evaluation performance when a large maximal episode length is used during training. For the smaller maximal episode length, the difference is not always as pronounced.
Table 12: Listed below are the puzzles and their corresponding supplied parameters. For each setting, we report the average length of successful episodes with standard deviation with respect to the random seed, all averaged over all selected puzzles. In brackets, the percentage of successful episodes is reported.
| Puzzle | Supplied Parameters | Max. Episode Length | Early Termination Parameter | Avg. Successful Episode Length (Success Rate) |
| --- | --- | --- | --- | --- |
| Bridges | 7x7i30e10m2d0 | $1e4$ | 10 | $4183.0 ± 2140.5$ (0.2%) |
| | | | - | - |
| | | $1e5$ | 10 | $4017.9 ± 1390.1$ (0.3%) |
| | | | - | $4396.2 ± 2517.2$ (0.3%) |
| Cube | c4x4 | $1e4$ | 10 | $21.9 ± 1.4$ (100.0%) |
| | | | - | $21.4 ± 0.9$ (100.0%) |
| | | $1e5$ | 10 | $22.6 ± 2.0$ (100.0%) |
| | | | - | $21.3 ± 1.2$ (100.0%) |
| Flood | 12x12c6m5 | $1e4$ | 10 | - |
| | | | - | - |
| | | $1e5$ | 10 | - |
| | | | - | - |
| Guess | c6p4g10Bm | $1e4$ | 10 | - |
| | | | - | $1060.4 ± 851.3$ (0.6%) |
| | | $1e5$ | 10 | $2405.5 ± 2476.4$ (0.5%) |
| | | | - | $3165.2 ± 1386.8$ (0.6%) |
| Netslide | 3x3b1 | $1e4$ | 10 | $3820.3 ± 681.0$ (18.4%) |
| | | | - | $3181.3 ± 485.5$ (21.1%) |
| | | $1e5$ | 10 | $3624.9 ± 746.5$ (23.0%) |
| | | | - | $4050.6 ± 505.5$ (10.6%) |
| Samegame | 5x5c3s2 | $1e4$ | 10 | $53.8 ± 7.5$ (38.3%) |
| | | | - | $717.4 ± 309.0$ (29.1%) |
| | | $1e5$ | 10 | $47.3 ± 6.6$ (36.7%) |
| | | | - | $1542.9 ± 824.0$ (26.4%) |
| Signpost | 4x4c | $1e4$ | 10 | $6848.9 ± 677.7$ (1.1%) |
| | | | - | $6861.8 ± 301.8$ (1.5%) |
| | | $1e5$ | 10 | $6983.7 ± 392.4$ (1.6%) |
| | | | - | - |
| Sixteen | 3x3 | $1e4$ | 10 | $4770.5 ± 890.5$ (2.9%) |
| | | | - | $4480.5 ± 2259.3$ (25.5%) |
| | | $1e5$ | 10 | $3193.3 ± 2262.0$ (57.0%) |
| | | | - | $3517.1 ± 1846.7$ (23.5%) |
| Undead | 4x4de | $1e4$ | 10 | $5378.0 ± 1552.7$ (0.5%) |
| | | | - | $5324.4 ± 557.9$ (0.6%) |
| | | $1e5$ | 10 | $5666.2 ± 553.3$ (0.5%) |
| | | | - | $5771.3 ± 2323.6$ (0.4%) |
| Untangle | 6 | $1e4$ | 10 | $474.7 ± 117.6$ (99.1%) |
| | | | - | $1491.9 ± 193.8$ (89.3%) |
| | | $1e5$ | 10 | $597.0 ± 305.5$ (96.3%) |
| | | | - | $1338.4 ± 283.6$ (88.7%) |