# Articulated Animal AI: An Environment for Animal-like Cognition in a Limbed Agent
**Authors**:
- Jeremy Lucas (McGill University)
- Isabeau Prémont-Schwarz
## Abstract
This paper presents the Articulated Animal AI Environment for Animal Cognition, an enhanced version of the previous AnimalAI Environment. Key improvements include the addition of agent limbs, enabling more complex behaviors and interactions with the environment that closely resemble real animal movements. The testbench features an integrated curriculum training sequence and evaluation tools, eliminating the need for users to develop their own training programs. Additionally, the tests and training procedures are randomized, which will improve the agent's generalization capabilities. These advancements significantly expand upon the original AnimalAI framework and will be used to evaluate agents on various aspects of animal cognition.
## 1 Introduction
The field of artificial intelligence has seen the development of numerous frameworks designed to test and evaluate models. Each new framework or dataset has often spurred significant progress as different research teams compete to achieve the state of the art on that specific benchmark. This paper specifically addresses AI frameworks that focus on tasks modeled after animal cognition. Despite significant advancements, many animals still outperform even the most advanced AI systems on various cognitive tasks. A notable framework within this domain is the AnimalAI Environment, which features a simplistic setup with a single spherical agent capable of only basic movements, such as moving forward or backward and rotating left or right. This environment offers a set of simple building blocks, such as blocks, half-cylinders, spherical pieces of food, ramps, and walls, within a built-in arena, allowing researchers to create their own cognitive tasks for the agent.
However, the AnimalAI Environment presents several limitations. The broad scope of the environment adds complexity to the task of training AI models, as researchers must first design effective tests. Additionally, the environment lacks generalization capabilities; object positions and rotations must be manually configured, leading to the potential for overfitting due to the absence of randomization in tests. Moreover, the agent itself is rudimentary and unrealistic, restricting the exploration of cognitive abilities related to movement. The Articulated Animal AI Environment has been developed to address these shortcomings, offering a more sophisticated and versatile platform for evaluating AI in the context of animal cognition tasks.
## 2 The Influence from Animal AI
The AnimalAI test environment serves as a versatile platform that enables scientists and researchers to design cognitive assessments for a simple agent using modular building blocks. Upon the publication of their paper, the AnimalAI team introduced a competition aimed at developing a highly generalizable agent capable of performing across a diverse array of unknown tests. This competition presented researchers with the challenge of generating a sufficient number of tests to train such an agent. The tests devised for the competition by the AnimalAI team were grounded in ten distinct categories of animal cognition. The results of the competition were disappointing, with even the best AI models performing significantly worse than a human child on every task apart from food retrieval, internal modeling, and numerosity. The meaning and definitions of these categories are described in the next section. Although the competition outcomes were underwhelming, the challenge itself was noteworthy. [1] [2]
## 3 Animal AI's Cognition Benchmarks
We utilized the cognitive categories employed in the AnimalAI competition to develop our benchmark platform. These ten categories are as follows:
1. Basic food retrieval: This category tests the agent's ability to reliably retrieve food in the presence of only food items. It is necessary to make sure agents have the same motivation as animals for subsequent tests.
1. Preferences: This category tests an agent's ability to choose the most rewarding course of action. Tests are designed to be unambiguous as to the correct course of action based on the rewards in our environment.
1. Obstacles: This category contains objects that might impede the agent's navigation. To succeed, the agent will have to explore its environment, a key component of animal behavior.
1. Avoidance: This category identifies an agent's ability to detect and avoid negative stimuli, which is critical for biological organisms and important for subsequent tests.
1. Spatial Reasoning: This category tests an agent's ability to understand the spatial affordances of its environment, including knowledge of simple physics and memory of previously visited locations.
1. Robustness: This category includes variations of the environment that look superficially different, but for which affordances and solutions to problems remain the same.
1. Internal Models: In these tests, the lights may turn off, and the agent must remember the layout of the environment to navigate in the dark.
1. Object Permanence and Working Memory: This category checks whether the agent understands that objects persist even when they are out of sight, as they do in the real world and in our environment.
1. Numerosity: This category tests the agent's ability to judge which area has the most food, as it only gets to choose one area.
1. Causal Reasoning and Object Affordances: This category tests the agent's ability to use objects to reach its goal.
## 4 Additional Categories
In addition to the 10 categories listed above, we introduced two additional categories, L0 - Initial Food Contact and L11 - Body Awareness, to address specific challenges posed by the multi-limbed agent within our environment. L0 was created to provide a simpler framework for the agent to learn to move toward food, recognizing the necessity for a foundational category focused on basic movement. L11 was developed in response to the absence of limb-related cognitive tests in the original AnimalAI competition, which featured a spherical, limbless agent. This new category was essential for evaluating the agent's understanding and coordination of its limbs.
## 5 Environment Design
### 5.1 The Agent
Figure 1: Top-down view of the agent.
The agent is a soft body constructed from configurable joints. It has four thighs and four legs, totaling eight joints. Each joint can rotate in the x direction (up and down) between -90 and 90 degrees, and in the z direction (sideways) between -45 and 45 degrees. The head has a mass of 0.5, while each thigh and leg has a mass of 1. The lighter head helps the agent walk more effectively, preventing it from dragging its head on the ground.
### 5.2 Observational Parameters
The agent's vision can be provided either by a customizable camera or by raycasts. By default, the agent's joint rotations are normalized to the range [-1, 1] and included as observations.
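As a concrete illustration, the joint-rotation observations can be sketched as follows. This is a minimal sketch assuming the joint limits given in Section 5.1 (±90 degrees in x, ±45 degrees in z); the function name is illustrative, not part of the environment's API.

```python
# Hypothetical sketch: normalizing joint rotations to [-1, 1].
# The x limits (±90°) and z limits (±45°) follow Section 5.1;
# the function name is illustrative, not the environment's API.

X_LIMIT = 90.0  # degrees, up/down
Z_LIMIT = 45.0  # degrees, sideways

def normalize_joint(x_deg: float, z_deg: float) -> tuple[float, float]:
    """Map a joint's (x, z) rotation onto the [-1, 1] observation range."""
    return (x_deg / X_LIMIT, z_deg / Z_LIMIT)

# One entry per joint: eight joints yield sixteen observation values.
obs = [normalize_joint(45.0, -22.5)]  # -> [(0.5, -0.5)]
```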
#### 5.2.1 Camera
The camera extends from the agent's eye (indicated by a white dot) and can be set to grayscale. It supports resolutions ranging from 8x8 to 512x512 pixels.
#### 5.2.2 Raycast
Raycasts also emanate from the agent's eye. The viewing angle and number of rays can be customized. The viewing angle ranges from 5 to 180 degrees and covers both the left and right sides of the agent: a viewing angle of 180 degrees covers the full 360-degree surroundings, while a 90-degree viewing angle covers the front half. The number of rays emitted on each side ranges from 1 to 20, and there is always an additional ray pointing directly forward.
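The ray layout described above can be sketched as follows, under the assumption that rays are spread evenly between straight ahead and the viewing angle on each side; the function name and the even spacing are assumptions, not the environment's documented behavior.

```python
# Hypothetical sketch of the raycast layout: a forward ray plus
# `rays_per_side` rays on each side, spread evenly out to the viewing
# angle. Names and spacing are illustrative, not the environment's API.

def ray_angles(viewing_angle: float, rays_per_side: int) -> list[float]:
    """Return ray directions in degrees (0 = straight ahead).

    A viewing angle of 90 covers the front half (±90°); 180 covers
    the full surroundings (±180°).
    """
    step = viewing_angle / rays_per_side
    left = [-step * i for i in range(rays_per_side, 0, -1)]
    right = [step * i for i in range(1, rays_per_side + 1)]
    return left + [0.0] + right  # 2 * rays_per_side + 1 rays in total

print(ray_angles(90.0, 2))  # [-90.0, -45.0, 0.0, 45.0, 90.0]
```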
### 5.3 Action Parameters
#### 5.3.1 Joint Rotation
Selecting joint rotation enables the agent to move its joints by setting a target rotation property. The joints are propelled to reach the target rotation through motor control.
#### 5.3.2 Joint Velocity
Selecting joint velocity allows the agent to move its joints by setting a target angular velocity property. The joints are propelled by adding the specified angular velocity.
### 5.4 Rewards
There are four distinct types of rewards in these environments:
- Food Pieces: Green food pieces provide rewards equal to their scale or size. Yellow food pieces provide rewards equal to half of their scale or size.
- Wall Collisions: Agents receive a -1 reward when they collide with walls.
- Training Facilitation: Agents can receive two penalties each timestep, each equal to -0.5/maxsteps. The first is applied only when the head is on the ground, encouraging the agent to walk rather than drag its head. The second is applied every timestep, incentivizing the agent to explore and make decisions efficiently.
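A minimal sketch of this reward scheme, combining the food, wall-collision, and per-timestep terms above; the function signature and the way the terms are combined are illustrative assumptions, not the environment's actual implementation.

```python
# Hypothetical sketch of the per-timestep reward described above;
# the signature and the combination of terms are illustrative only.

def step_reward(max_steps: int, head_on_ground: bool,
                hit_wall: bool, food_scale: float = 0.0,
                food_is_yellow: bool = False) -> float:
    reward = -0.5 / max_steps          # time penalty, every step
    if head_on_ground:
        reward += -0.5 / max_steps     # discourage dragging the head
    if hit_wall:
        reward += -1.0                 # wall collision penalty
    if food_scale > 0.0:
        # Green food is worth its scale; yellow food half of it.
        reward += food_scale * (0.5 if food_is_yellow else 1.0)
    return reward

print(step_reward(1000, head_on_ground=True, hit_wall=False))  # -0.001
```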
### 5.5 Other Parameters
Maxsteps: This parameter defines the number of steps the agent has per episode. It is recommended to set this value between 1000 and 5000 steps.
### 5.6 The Environment
#### 5.6.1 Parameters
Difficulty: The difficulty parameter ranges from 0 to 10, increasing the challenge of the current level when generated. The specific aspects that become more difficult vary for each level and are described in the test bench.
Seed: The seed parameter allows for the saving of certain configurations for comparative testing. Setting a unique seed ensures that the level will generate and behave consistently every time.
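A sketch of how a fixed seed yields reproducible level generation; `generate_level` and its layout fields are hypothetical, standing in for whatever the environment generates internally.

```python
# Hypothetical sketch: a fixed seed makes level generation deterministic,
# so the same configuration can be replayed for comparative testing.
# `generate_level` and its fields are illustrative, not the env's API.
import random

def generate_level(level: str, difficulty: int, seed: int) -> dict:
    assert 0 <= difficulty <= 10
    rng = random.Random(seed)  # seeded RNG -> identical layout each run
    return {
        "level": level,
        "difficulty": difficulty,
        "food_position": (rng.uniform(-10, 10), rng.uniform(-10, 10)),
    }

# Same seed, same layout on every run.
assert generate_level("L0", 2, 42) == generate_level("L0", 2, 42)
```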
### 5.7 The Test Bench
L0: L0 is the initial test designed to help the agent begin learning to approach and consume food. In this stage, the food is placed directly in front of the agent. As the difficulty level increases, the food is positioned farther away, becomes smaller, and varies in its horizontal placement relative to the agent.
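The L0 curriculum scaling above can be sketched as follows; the exact ranges and the linear scaling are assumptions chosen only to illustrate the described behavior (food farther away, smaller, and more horizontally varied as difficulty rises).

```python
# Hypothetical sketch of L0 difficulty scaling: higher difficulty moves
# the food farther away, shrinks it, and widens its horizontal offset.
# All numeric ranges here are assumptions, not the environment's values.
import random

def l0_food_placement(difficulty: int, rng: random.Random) -> dict:
    t = difficulty / 10.0
    return {
        "distance": 2.0 + 8.0 * t,                  # farther with difficulty
        "scale": 2.0 - 1.5 * t,                     # smaller with difficulty
        "x_offset": rng.uniform(-4.0 * t, 4.0 * t), # wider horizontal spread
    }
```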
Figure: Screenshot of an L0 test run (difficulty 2), showing the agent near the bottom of the arena with a green food piece ahead of it; the overlay displays the level, difficulty, seed, step count, and current reward.
L1: Basic Food Retrieval. This test serves as a benchmark of the agent's understanding of food: the agent must recognize that yellow food pieces do not end the episode, while green food pieces do. As the difficulty increases, more pieces of food, ranging from 1 to 5, are distributed throughout the arena. At lower difficulty levels the food is mostly stationary, but it begins to move as the difficulty increases.
Figure: Screenshot of an L1 test run (difficulty 9), with yellow and green food pieces scattered around the agent.
L2: Preferences - Y-Maze. This test evaluates the agent's ability to make choices that obtain better or more food. The Y-maze consists of two paths. When the difficulty is below 5, there is only one piece of green food, at the end of one of the two paths. When the difficulty is above 5, there is an additional smaller piece of green food in the other path. The agent must learn to choose the optimal path.
Figure: Screenshot of an L2 Y-maze test run (difficulty 8), with the agent at the base of the maze and a green food piece at the end of each branch.
L2: Preferences - Delayed Gratification. This test consists of a small piece of green food in front of the agent and a larger piece of yellow food on top of a pillar that is slowly descending. The agent must learn to wait for the yellow food to come down before consuming the green food. As the difficulty increases, the time it takes for the pillar to descend also increases.
Figure: Screenshot of an L2 delayed-gratification test run (difficulty 3), showing a small green food piece near the agent and a yellow food piece atop a descending pillar.
L3: Obstacles. This test involves a single green piece of food placed in a random position and covered by a randomly sized transparent wall. Additional walls spawn in random orientations as obstacles. As the difficulty increases, both the size of the walls and the number of additional walls increase.
Figure: Screenshot of an L3 test run (difficulty 7), with the agent navigating among transparent walls and a wooden obstacle.
L4: Avoidance. This test uses holes in the ground as a negative stimulus. While the previous AnimalAI paper used floor pads with negative rewards, holes provide a more realistic challenge. As the difficulty increases, the holes become larger. The test starts with two holes, and when the difficulty exceeds 5, a third hole is introduced.
Figure: Screenshot of an L4 test run (difficulty 10), with the agent and a green food piece in an arena containing obstacles.
L5: Spatial Reasoning. This test evaluates the agent's ability to navigate a maze to find a piece of food hidden behind a wall in a random sector. Additional walls are placed to obstruct the agent's path. As the difficulty increases, the number and size of the walls grow, and the maze itself becomes larger and more complex.
<details>
<summary>extracted/5920856/Research/L5.png Details</summary>

### Visual Description
## Diagram: Simulated Environment
### Overview
The image depicts a 3D simulated environment, resembling a maze or obstacle course. The view is an angled, overhead perspective. The environment is primarily white, with a teal-colored floor. There are several brown rectangular obstacles placed within the maze. A red and white spherical object with protruding arms is positioned at the bottom-center of the maze. The top-left corner displays two buttons, and the top-right corner shows a text box with status information.
### Components/Axes
The image contains the following components:
* **Environment:** A maze-like structure with white walls and a teal floor.
* **Obstacles:** Brown rectangular prisms scattered throughout the maze.
* **Agent:** A red and white spherical object with protruding arms.
* **Buttons:** "Increase Speed" (green) and "Decrease Speed" (red).
* **Status Box:** A text box displaying simulation parameters.
### Detailed Analysis or Content Details
The status box in the top-right corner contains the following information:
* **Communicator:** Connected: False
* **Level:** L5 Test
* **Difficulty:** 6
* **Seed:** 1132906214
* **Steps:** 188
* **Current Reward:** -0.043
The buttons in the top-left corner are labeled:
* "Increase Speed" (green button)
* "Decrease Speed" (red button)
The agent is located near the bottom-center of the maze. The maze itself is complex, with multiple pathways and dead ends. The obstacles are positioned to create challenges for the agent's navigation.
### Key Observations
* The "Communicator" is currently disconnected.
* The simulation is running at Level 5 with a difficulty of 6.
* The simulation has run for 188 steps, and the current reward is negative (-0.043).
* The agent's position suggests it may be at the beginning of the maze or has encountered difficulty.
### Interpretation
This image represents a reinforcement learning or AI simulation environment. The agent (red and white sphere) is likely learning to navigate the maze to maximize its reward. The negative reward suggests the agent has not yet found an optimal path or is being penalized for certain actions. The "Increase Speed" and "Decrease Speed" buttons allow for control over the simulation's pace, potentially for debugging or observation. The disconnected communicator suggests that the simulation is running locally or that communication with a remote server is not established. The seed value (1132906214) indicates a specific initialization of the simulation, allowing for reproducibility of results. The level and difficulty parameters suggest a progressive learning environment where the agent faces increasingly complex challenges. The maze design appears to be intentionally complex, requiring the agent to learn efficient pathfinding strategies. The overall setup suggests a controlled environment for testing and developing AI algorithms for navigation and decision-making.
</details>
L6: Robustness Robustness is inherently present in all these tests due to their randomized nature. However, enabling robustness specifically will result in a random test from L0 to L11 being selected, and the colors of the test objects changing to one of five randomly generated colors.
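This mode can be sketched in Python as follows; the helper name, the `rng` parameter, and the per-channel RGB representation are our illustrative choices, not the environment's actual API:

```python
import random

def robustness_config(rng=None):
    """Illustrative sketch of the robustness mode (not the environment's
    real generator): pick a random test from L0 to L11 and build a palette
    of five randomly generated colors for the test objects."""
    rng = rng or random.Random()
    level = rng.randrange(12)  # one of L0 .. L11
    # Five randomly generated RGB colors, each channel in [0, 1].
    palette = [tuple(rng.random() for _ in range(3)) for _ in range(5)]

    def recolor(n_objects):
        # Each test object is assigned one of the five palette colors.
        return [rng.choice(palette) for _ in range(n_objects)]

    return level, palette, recolor
```

Because both the level and the palette are drawn fresh per episode, an agent cannot key its behavior to a fixed object color.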
<details>
<summary>extracted/5920856/Research/L6.png Details</summary>

### Visual Description
## Screenshot: Reinforcement Learning Environment
### Overview
The image depicts a screenshot of a 3D reinforcement learning environment. It appears to be a simulated arena with a central agent (a white, cross-shaped object) and several colored spheres. The environment includes controls for adjusting the agent's speed and a status panel displaying various parameters of the simulation.
### Components/Axes
The screenshot contains the following elements:
* **Arena:** A rectangular, pink-colored arena with raised, teal-colored walls.
* **Agent:** A white, cross-shaped object positioned near the center of the arena.
* **Spheres:** Three spheres of different colors: yellow, orange, and green.
* **Controls:** Two rectangular buttons labeled "Increase Speed" (green) and "Decrease Speed" (orange) located in the top-left corner.
* **Status Panel:** A rectangular panel in the top-right corner displaying simulation parameters.
### Content Details
The status panel displays the following information:
* **Communicator:** Connected: False
* **Level:** L6 Test
* **Difficulty:** 10 On
* **L1 Test:** (4)
* **Seed:** 392546972
* **Steps:** 153
* **Current Reward:** -0.1525
The spheres are positioned as follows (approximate relative to the agent):
* **Yellow Sphere:** Located below the agent.
* **Orange Sphere:** Located to the left of the agent.
* **Green Sphere:** Located to the right of the agent.
### Key Observations
The "Communicator" is disconnected. The simulation is at Level 6 with a difficulty of 10. The simulation has run for 153 steps and currently has a negative reward of -0.1525. The agent is positioned centrally within the arena, with three spheres distributed around it.
### Interpretation
This screenshot represents a snapshot of a reinforcement learning agent interacting with its environment. The negative reward suggests the agent is not yet performing optimally. The disconnected communicator might indicate a lack of external control or data logging. The level and difficulty settings suggest a progressive learning setup. The agent's central position and the distribution of spheres imply a task involving navigation and interaction with these objects, likely to maximize reward. The environment appears to be designed for testing and training an AI agent to learn a specific behavior within a constrained space. The seed value indicates the simulation is reproducible. The number of steps and current reward provide insight into the agent's learning progress.
</details>
L7: Internal Models This test assesses the agent's internal modeling capabilities using its camera. Blackouts will occur, with the frequency and duration of these blackouts increasing as the difficulty rises. During these blackouts, the agent's camera will render a dark screen. Enabling internal models will select a random test from L0 to L11 and apply the blackouts described above.
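A blackout schedule of this kind might be sketched as below; the linear scaling of blackout count and duration with difficulty is an assumption for illustration, not the environment's published rule:

```python
import random

def make_blackout_schedule(episode_len, difficulty, rng=None):
    """Illustrative sketch only (the environment's real timing rules are
    internal): both the number of blackouts and their duration are assumed
    to grow linearly with difficulty."""
    rng = rng or random.Random()
    n_blackouts = difficulty   # assumed: frequency rises with difficulty
    duration = 2 * difficulty  # assumed: so does duration, in steps
    dark = [False] * episode_len
    for _ in range(n_blackouts):
        start = rng.randrange(episode_len)
        for t in range(start, min(start + duration, episode_len)):
            dark[t] = True  # the camera renders black on these steps
    return dark
```

At render time, any step `t` with `dark[t]` set would replace the camera frame with an all-black image, forcing the agent to rely on its internal model of the scene.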
<details>
<summary>extracted/5920856/Research/L7.png Details</summary>

### Visual Description
## Diagram: Simulated Environment - L7 Test
### Overview
The image depicts a simulated environment, likely a reinforcement learning or robotics test scenario. It shows a 3D rendered "Y" shaped track with a central agent and two goal locations. A panel on the right displays runtime parameters and status information. Two buttons are present in the top-left corner for speed control.
### Components/Axes
The image contains the following components:
* **Track:** A white, geometrically shaped track resembling a "Y".
* **Agent:** A small, complex object at the base of the "Y", likely representing the controlled entity.
* **Goals:** Two green spherical objects positioned at the end of each arm of the "Y".
* **Speed Control Buttons:** Two rectangular buttons labeled "Increase Speed" (green) and "Decrease Speed" (red).
* **Status Panel:** A rectangular panel displaying text-based information.
### Content Details
The status panel displays the following information:
* **Communicator:** Connected: False
* **Level:** L7 Test
* **Difficulty:** 2 On
* **L2Y Test:** (10)
* **Seed:** 331067481
* **Steps:** 247
* **Current Reward:** -0.054
The buttons are positioned in the top-left corner. The "Increase Speed" button is above and to the left of the "Decrease Speed" button.
### Key Observations
* The agent is positioned at the base of the "Y" track.
* The "Communicator" is currently disconnected.
* The "Current Reward" is negative, suggesting the agent has not yet achieved a positive outcome.
* The "Steps" counter indicates the agent has taken 247 steps.
* The seed value suggests a deterministic or reproducible simulation.
### Interpretation
This image represents a reinforcement learning environment where an agent is tasked with navigating a "Y" shaped track to reach one of two goal locations. The negative reward suggests the agent is still learning or has not yet found an optimal path. The disconnected communicator might indicate a lack of external control or monitoring. The L7 Test designation suggests this is a specific test level within a larger suite of tests. The seed value allows for the exact reproduction of this simulation run. The "L2Y Test: (10)" entry indicates the underlying randomly selected test, the L2 Y-maze at difficulty 10, to which the L7 blackouts are applied. The difficulty reading "2 On" indicates that the blackout setting is active at difficulty 2. The agent's position at the base of the Y suggests it is at the start of a trial. The overall setup suggests an experiment focused on path planning and reward maximization.
</details>
L8: Object Permanence This test involves a green piece of food that spawns on either side of a wall facing the agent. The food then moves behind the wall. There is a barrier separating the two sides behind the wall, with one side containing the food and the other side containing a hole. The agent must remember that the food still exists behind the wall. As difficulty increases, the speed at which the food moves behind the wall also increases.
<details>
<summary>extracted/5920856/Research/L8.png Details</summary>

### Visual Description
## Screenshot: Game Environment Display
### Overview
The image is a screenshot of a simulated environment, likely a reinforcement learning or game development setting. It displays a 3D arena with a robot-like agent and a target object. The top-left and top-right corners of the screen contain UI elements providing control and status information.
### Components/Axes
The UI elements consist of:
* **Buttons:** "Increase Speed" (green) and "Decrease Speed" (red).
* **Status Display:** A text block in the top-right corner displaying the following information:
* Communicator: Connected: False
* Level: L8 Test
* Difficulty: 3
* Seed: 966678271
* Steps: 87
* Current Reward: -0.0134
The 3D environment contains:
* A rectangular arena with gray walls and a green floor.
* A white rectangular obstacle in the center of the arena.
* A green spherical target object.
* A robot-like agent with a red body and a cross-shaped structure on top.
### Detailed Analysis or Content Details
The status display provides the following specific values:
* **Connected:** False
* **Level:** L8 Test
* **Difficulty:** 3
* **Seed:** 966678271
* **Steps:** 87
* **Current Reward:** -0.0134
The agent is positioned near the bottom-center of the arena, facing towards the target. The target is located near the center of the arena, behind the white obstacle. The agent appears to have a complex structure on top, possibly sensors or actuators.
### Key Observations
* The "Communicator" is disconnected.
* The current reward is negative, indicating the agent is not performing optimally.
* The simulation has run for 87 steps.
* The level is designated as "L8 Test", suggesting it's a specific test scenario.
* The seed value (966678271) indicates a specific random initialization for the simulation.
### Interpretation
The screenshot depicts a reinforcement learning environment where an agent is attempting to navigate an arena to reach a target. The negative reward suggests the agent is struggling to achieve its goal, possibly due to the obstacle or the current speed setting. The disconnected communicator might indicate a lack of external control or data transmission. The level and seed values provide context for the specific simulation run. The UI elements allow for manual adjustment of the agent's speed, potentially to aid in learning or debugging. The overall setup suggests a testing or training phase for the agent's navigation capabilities. The agent's complex structure hints at a sophisticated control system or sensor suite.
</details>
L9: Numerosity This test requires the agent to choose between different sectors based on the quantity of food present. There are four sectors, each containing a random number of food pieces. The agent must enter a sector and collect all the food within it. Once the agent enters a sector, the opening will close. The agent must correctly choose the sector with the most food. As the difficulty increases, the number of food pieces in each sector increases, making the decision more challenging.
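A rough sketch of how such a trial could be generated follows; the four-sector layout comes from the text above, while the food-count scaling and function name are illustrative assumptions:

```python
import random

def numerosity_sectors(difficulty, rng=None):
    """Illustrative sketch (not the environment's actual generator): four
    sectors each receive a random number of food pieces, with counts that
    grow with difficulty; the correct choice is the fullest sector."""
    rng = rng or random.Random()
    max_food = 2 + difficulty  # assumed scaling with difficulty
    counts = [rng.randint(1, max_food) for _ in range(4)]
    correct = max(range(4), key=lambda i: counts[i])
    return counts, correct
```

Higher difficulty widens the range of possible counts, so sectors differ by smaller relative margins and the comparison becomes harder.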
<details>
<summary>extracted/5920856/Research/L9.png Details</summary>

### Visual Description
## Screenshot: Reinforcement Learning Environment
### Overview
The image depicts a screenshot of a 3D reinforcement learning environment. It appears to be a simulated arena with obstacles and a central agent. The top-left corner shows controls for adjusting the simulation speed, and the top-right corner displays simulation parameters. The environment itself is a grid-like structure with colored cylinders representing obstacles or targets.
### Components/Axes
The screenshot contains the following elements:
* **Top-Left Controls:**
* "Increase Speed" button (Green)
* "Decrease Speed" button (Red)
* **Top-Right Parameters:**
* "Communicator" : False
* "Connected" : False
* "Level" : L9 Test
* "Difficulty" : 3
* "Seed" : 108839824
* "Steps" : 205
* "Current Reward" : -0.0370
* **Environment:** A square arena with a grid pattern.
* Green Cylinders: Scattered throughout the arena. Approximately 10 visible.
* Yellow Cylinders: Scattered throughout the arena. Approximately 6 visible.
* Blue Cylinders: Scattered throughout the arena. Approximately 4 visible.
* Red and White Agent: Located near the center of the arena.
### Detailed Analysis or Content Details
The simulation parameters provide quantitative information about the current state of the environment:
* **Communicator:** False - Indicates no external communication is active.
* **Connected:** False - Indicates the simulation is not connected to a network or external system.
* **Level:** L9 Test - Specifies the current level of the simulation, labeled as a "Test" level.
* **Difficulty:** 3 - Represents the difficulty setting of the simulation.
* **Seed:** 108839824 - A random seed used for initializing the simulation, ensuring reproducibility.
* **Steps:** 205 - The number of simulation steps that have been executed.
* **Current Reward:** -0.0370 - The current reward value accumulated by the agent. This is a negative value, suggesting the agent is not performing optimally.
The environment consists of a grid with three types of cylindrical obstacles/targets: green, yellow, and blue. The agent, represented by a red and white object, is positioned near the center of the grid. The agent's position appears to be slightly offset from the exact center.
### Key Observations
* The negative current reward suggests the agent is facing challenges in achieving its goal within the environment.
* The simulation is in a "Test" level, indicating it might be used for evaluating the agent's performance.
* The seed value allows for recreating the exact same simulation conditions.
* The environment is relatively sparse, with obstacles/targets distributed across the grid.
### Interpretation
This screenshot represents a reinforcement learning environment designed for training an agent to navigate and interact with its surroundings. The agent's goal is likely to collect rewards by interacting with the environment, potentially by reaching specific targets (the colored cylinders) or avoiding obstacles. The negative current reward indicates that the agent is currently not performing well, and further training is required. The parameters like level, difficulty, and seed provide control and reproducibility for the training process. The visual representation of the environment allows for a qualitative assessment of the agent's behavior and the challenges it faces. The lack of a "Communicator" and "Connected" status suggests this is a standalone simulation, not part of a distributed or networked learning system. The "L9 Test" level suggests this is a specific test case designed to evaluate the agent's performance at a particular stage of training.
</details>
L10: Causal Reasoning This test evaluates the agent's cognitive ability to understand the environment around it. By pushing over a plank, the agent can walk over a trench to reach a piece of food. However, as the difficulty increases, transparent and opaque walls will appear alongside the plank. The agent must comprehend the cause and effect of its actions and pick the correct plank to push over. Additionally, the trench will gradually become wider.
<details>
<summary>extracted/5920856/Research/L10.png Details</summary>

### Visual Description
## Screenshot: Simulated Environment
### Overview
The image depicts a screenshot of a simulated 3D environment, likely a reinforcement learning or physics simulation. It features a rectangular arena with obstacles, a green sphere, and a red sphere with attached cylinders. The top-left corner shows buttons for speed control, and the top-right corner displays simulation parameters.
### Components/Axes
The image contains the following elements:
* **Arena:** A rectangular, white-bordered area with a light-green floor.
* **Obstacles:** Three white rectangular prisms positioned within the arena.
* **Green Sphere:** A small, green sphere located near the center of the arena.
* **Red Sphere with Cylinders:** A larger, red sphere with two attached cylinders, positioned in the bottom-right corner of the arena.
* **"Increase Speed" Button:** A green button in the top-left corner.
* **"Decrease Speed" Button:** A green button next to the "Increase Speed" button.
* **Simulation Parameters (Text Block):** A text block in the top-right corner displaying simulation information.
### Content Details
The text block in the top-right corner contains the following information:
* **Communicator:** Connected: False
* **Level:** L10 Test
* **Difficulty:** 9
* **Seed:** 974320329
* **Steps:** 64
* **Current Reward:** -0.5500
### Key Observations
* The simulation is not connected to a communicator.
* The current level is designated as "L10 Test".
* The simulation is running at a difficulty level of 9.
* The simulation has been running for 64 steps.
* The current reward is negative (-0.5500), suggesting the agent is not performing optimally.
* The presence of speed control buttons indicates the simulation speed can be adjusted.
### Interpretation
The image represents a snapshot of a reinforcement learning environment. The agent (likely controlled by the red sphere) is navigating an arena with obstacles. The negative reward suggests the agent is facing challenges in reaching a goal or completing a task. The "L10 Test" level and difficulty setting of 9 indicate a moderately challenging scenario. The disconnected communicator suggests the simulation is running in a standalone mode, without external communication or control. The seed value (974320329) allows for reproducibility of the simulation. The image provides a visual representation of the simulation state and key parameters, which are crucial for understanding the agent's behavior and performance. The arrangement of the obstacles suggests a navigation or path-planning task. The green sphere could be a target or a reward location.
</details>
L11: Body Awareness This test assesses the agent's understanding of its own body. After navigating through the corridors of the Y-maze, Thorndike hut, and a normal maze, the agent has developed a basic understanding of its body. This test further evaluates the agent's ability to function on rough terrain. A single piece of food is placed away from the agent on uneven terrain. As the difficulty increases, the terrain becomes progressively rougher.
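As a loose illustration of difficulty-scaled roughness (the environment's actual terrain generator is internal; the random heightmap and the amplitude scaling below are our assumptions):

```python
import random

def terrain_heights(size, difficulty, rng=None):
    """Illustrative sketch (not the environment's real generator): a random
    heightmap whose amplitude grows with difficulty, making the ground
    progressively rougher."""
    rng = rng or random.Random()
    amplitude = 0.1 * difficulty  # assumed roughness scaling
    return [
        [amplitude * rng.uniform(-1.0, 1.0) for _ in range(size)]
        for _ in range(size)
    ]
```

At difficulty 0 the grid is flat; each difficulty step raises the maximum height deviation, demanding finer limb control from the agent.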
<details>
<summary>extracted/5920856/Research/L11.png Details</summary>

### Visual Description
## Screenshot: Simulation Environment
### Overview
The image depicts a screenshot of a 3D simulation environment, likely a physics or reinforcement learning testbed. The scene features a textured, undulating surface within a white rectangular container. Two spherical objects are visible on the surface. The top-left and top-right corners of the screen display UI elements providing control and status information.
### Components/Axes
The UI elements consist of:
* **Top-Left:** Two buttons labeled "Increase Speed" (green) and "Decrease Speed" (red).
* **Top-Right:** A text block displaying simulation parameters. The parameters are:
* Communicator: False
* Connected: False
* Level: L11 Test
* Difficulty: 6
* Seed: 1256656876
* Steps: 183
* Current Reward: -0.0327
The simulation environment itself contains:
* A white rectangular container.
* A textured, undulating surface within the container.
* A larger spherical object, colored with red, white, and grey.
* A smaller spherical object, colored green.
### Detailed Analysis or Content Details
The simulation parameters provide a snapshot of the current state. The "Communicator" and "Connected" flags are both set to "False", indicating the simulation is running in a standalone mode. The "Level" is set to "L11 Test", suggesting a specific test scenario. The "Difficulty" is set to 6. The "Seed" is 1256656876, which is likely used for reproducibility. The simulation has run for 183 "Steps", and the "Current Reward" is -0.0327.
The two spherical objects are positioned on the undulating surface. The larger object is the articulated agent, composed of red, white, and grey segments. The smaller object is a simple green sphere representing the food target. The surface has a wave-like pattern, forming the uneven terrain of this test.
### Key Observations
The negative "Current Reward" suggests the simulation is penalizing the agent for its actions. The "Steps" count indicates the simulation has been running for a moderate amount of time. The larger multi-part object is the articulated agent rather than a molecular model, consistent with the other test screenshots.
### Interpretation
This image represents a reinforcement learning environment where the agent is tasked with traversing the uneven terrain to reach the green food object. The undulating surface challenges the agent's control of its own limbs, and its goal is to navigate the landscape to maximize its reward. The negative reward suggests the agent is currently performing poorly, and the simulation is providing feedback to guide its learning process. The "Seed" value allows for the exact reproduction of this simulation state. The "Communicator" and "Connected" flags being false suggest this is a local, non-networked simulation. The level "L11 Test" indicates a specific test case within a larger set of levels. The difficulty level of 6 suggests a moderate challenge. The simulation is designed to test the agent's ability to coordinate its body and achieve a specific goal on rough ground.
</details>
## 6 Curriculum Training
We provide a curriculum training procedure inspired by [3], in which the agent focuses on tasks where learning progress (or regression) is fastest. The system operates by organizing the levels and difficulties into a 2D matrix, where each cell represents a specific level at a given difficulty. Upon the completion of an episode, the agent's reward is recorded in the corresponding cell of the matrix. A rolling average and variance are calculated for each cell over a window of 10 episodes, and the z-value for each cell is determined using the following equation.
$$
z_{i}:=\frac{\sigma_{i}}{|\mu_{i}|+10^{-7}} \tag{1}
$$
These z-values are then used to create a probability distribution that guides the selection of the next training level and difficulty, prioritizing levels with higher variance. Thus, the agent will focus more on tasks where its performance is highly variable. The probability of each cell being selected next is determined according to the equation below.
$$
p_{i}=\frac{z_{i}+c}{\sum_{j=1}^{n}(z_{j}+c)} \tag{2}
$$
This curriculum training procedure effectively promotes learning and facilitates training. As the agent begins to learn and increases its end reward for a task, the rolling variance rises, which in turn increases the z-value and the likelihood of the same level and difficulty being presented again. This creates a feedback loop until the agent either stops learning or completes the current level and difficulty. As the variance decreases, a new level and difficulty pair is presented to the agent for further learning. Additionally, this training combats catastrophic forgetting. If the agent revisits a previously mastered level and difficulty pair but is currently failing, the increased variance will cause this pair to reappear more frequently until the task is relearned. Overall, this curriculum training procedure provides an effective method for training an agent across a diverse range of tasks.
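Equations (1) and (2) can be combined into a small sampler. The sketch below is a minimal illustration with a 10-episode rolling window; the class and method names are ours, and the value of the smoothing constant `c` is an assumption, not a value taken from the environment:

```python
import random
from collections import deque

EPS = 1e-7  # matches the 10^-7 stabilizer in Eq. (1)

class CurriculumSampler:
    """Minimal sketch of the curriculum described above. Rewards for each
    (level, difficulty) cell are kept in a rolling window; cells with a
    high z-value (Eq. 1) are sampled more often (Eq. 2)."""

    def __init__(self, n_levels, n_difficulties, window=10, c=0.1):
        self.c = c  # smoothing constant from Eq. (2); value assumed
        self.rewards = {
            (lvl, d): deque(maxlen=window)
            for lvl in range(n_levels)
            for d in range(n_difficulties)
        }

    def record(self, level, difficulty, reward):
        # Store the episode's end reward in the matching matrix cell.
        self.rewards[(level, difficulty)].append(reward)

    def z_value(self, cell):
        r = self.rewards[cell]
        if len(r) < 2:
            return 1.0  # unexplored cells stay likely to be sampled
        mu = sum(r) / len(r)
        var = sum((x - mu) ** 2 for x in r) / len(r)
        return (var ** 0.5) / (abs(mu) + EPS)  # Eq. (1)

    def sample(self):
        # Eq. (2): probability proportional to z_i + c.
        cells = list(self.rewards)
        weights = [self.z_value(cell) + self.c for cell in cells]
        return random.choices(cells, weights=weights, k=1)[0]
```

Here `c` keeps cells with zero variance from being excluded entirely, mirroring the role of the constant in Equation (2).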
## 7 Future Work
In the initial release of the Articulated Animal AI Environment, our goal is to enable researchers to familiarize themselves with the platform and test their cognition-based models using the provided test bench. A designated set of testing seeds has been reserved within the environment, which will be used for future evaluations of selected candidates or potential competition scenarios.
## 8 Conclusion
The Articulated Animal AI Environment is a robust framework designed to assess the capabilities of a multi-limbed agent in performing animal cognition tasks. Building upon the foundational AnimalAI Environment, the Articulated Animal AI Environment enhances the training process by offering pre-randomized and generalized tests, coupled with a structured curriculum training program and comprehensive evaluation tools. This setup allows researchers to concentrate on refining their models rather than on the time-intensive task of developing effective tests.
## Acknowledgments
This work was supported in part by McGill University.
## References
- [1] Matthew Crosby, Benjamin Beyret, Murray Shanahan, José Hernández-Orallo, Lucy Cheke, and Marta Halina. Animal-AI: A testbed for experiments on autonomous agents. arXiv preprint arXiv:1909.07483, 2019.
- [2] Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, Matthew Crosby, Joel Holmes, John Burden, Niharika Chaubey, Niall Donnelly, Matishalin Patel, Marta Halina, José Hernández-Orallo, and Lucy G. Cheke. Animal-AI 3: What's new & why you should care. arXiv preprint arXiv:2312.11414, 2023.
- [3] Jacqueline Gottlieb, Pierre-Yves Oudeyer, Manuel Lopes, and Adrien Baranes. Information seeking, curiosity and attention: computational and neural mechanisms. Trends in Cognitive Sciences, 17(11):585–593, 2013.
## Appendix A Appendix
This appendix contains additional information about how to use the interface. All source code is available online.
Environment source code: https://github.com/jeremy-lucas-mcgill/Articulated-Animal-AI-Environment
## The Interface
The interface provides users with the capability to customize their agents by selecting specific actions, vision settings, and general parameters. Users can then choose the levels and difficulty range for training. The interface offers two primary options for initiating the training process: "Start Random," which randomly selects a level and difficulty for each episode, or "Start Curriculum," which begins the curriculum-based training program.
<details>
<summary>extracted/5920856/Research/Interface.png Details</summary>

### Visual Description
## Screenshot: Neuro AI Testing Platform Interface
### Overview
The image is a screenshot of a user interface for a "Neuro AI Testing Platform". It appears to be a configuration panel allowing users to set parameters for training and evaluating artificial intelligence agents. The interface is divided into sections for Action Space, Vision Space, general settings (Max Steps, Reward), and Level Selection. It features checkboxes, input fields, and buttons for controlling the simulation environment.
### Components/Axes
The interface is structured into the following sections:
* **Header:** "Neuro AI Testing Platform"
* **Action Space:** Contains checkboxes for "Joint Rotation" and "Joint Angular Velocity".
* **Vision Space:** Contains checkboxes for "Camera Vision" and "Raycast". Sub-options include "Grayscale" (checkbox), "Resolution" (input field), "Viewing Angle" (input field), and "Number of Rays" (input field).
* **Max Steps:** Contains a checkbox for "Off Ground Reward" and a checkbox for "Camera Render".
* **Random Seed:** Contains a checkbox for "Random Seed" and an input field labeled "Seed:".
* **Train/Evaluate:** Contains checkboxes for "Train" and "Evaluate", and an input field labeled "Episodes:".
* **Buttons:** "START RANDOM" and "START CURRICULUM".
* **Level Selection:** A list of levels labeled "L0" through "L11", each with a checkbox. The levels are:
* L0 Initial Food Contact
* L1 Basic Food Retrieval
* L2 Y-Maze
* L2 Delayed Gratification
* L3 Obstacles
* L4 Avoidance
* L5 Spatial Reasoning
* L6 Robustness
* L7 Internal Models
* L8 Object Permanence
* L9 Numerosity
* L10 Causal Reasoning
* L11 Body Awareness
* **Difficulty:** A column of sliders next to the Level Selection list.
### Detailed Analysis or Content Details
The interface presents a series of configurable options.
* **Action Space:** Both "Joint Rotation" and "Joint Angular Velocity" are unchecked by default.
* **Vision Space:** "Camera Vision" is checked, while "Raycast" is unchecked. "Grayscale" is unchecked. The "Resolution", "Viewing Angle", and "Number of Rays" input fields are empty.
* **Max Steps:** "Off Ground Reward" is checked, and "Camera Render" is checked.
* **Random Seed:** "Random Seed" is checked. The "Seed:" input field is empty.
* **Train/Evaluate:** "Train" and "Evaluate" are checked. The "Episodes:" input field is empty.
* **Level Selection:** All levels (L0-L11) are unchecked by default. The difficulty sliders are all set to their minimum value.
* **Buttons:** The buttons "START RANDOM" and "START CURRICULUM" are present.
### Key Observations
The interface is designed for controlling a simulation environment. The options allow for customization of the agent's action space, sensory input (vision), training parameters, and the complexity of the environment (level selection). The presence of both "Train" and "Evaluate" checkboxes suggests the platform supports both training and testing of AI agents. The "START RANDOM" and "START CURRICULUM" buttons indicate different approaches to training: random exploration versus a pre-defined curriculum.
### Interpretation
This interface is likely part of a larger system for developing and testing reinforcement learning agents. The configurable options allow researchers to experiment with different settings and observe their impact on agent performance. The level selection provides a way to gradually increase the complexity of the task, potentially following a curriculum learning approach. The "Neuro AI Testing Platform" name suggests a focus on neural network-based AI. The interface is relatively simple and straightforward, suggesting it is intended for users with some familiarity with reinforcement learning concepts. The empty input fields indicate that the user has not yet configured the simulation. The checked boxes for "Camera Vision", "Off Ground Reward", "Random Seed", "Train", and "Evaluate" suggest a default configuration that utilizes camera input, provides a reward for staying off the ground, uses a random seed for reproducibility, and initiates both training and evaluation. The levels listed suggest a progression of tasks designed to test various aspects of intelligence, from basic food retrieval to more complex reasoning abilities like causality and body awareness.
</details>