## Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning
Jerin Paul Selvan Dept. of Computer Engineering Pune Institute of Computer Technology Pune, India jerinsprograms@gmail.com
Abstract—For over a decade now, robotics and the use of artificial agents have become commonplace. Testing the performance of new path-finding or search-space optimisation algorithms has also become a challenge, as they require a simulation or an environment in which to be tested. Creating artificial environments with artificial agents is one method employed to test such algorithms, and games have become one such environment. The performance of algorithms can be compared using artificial agents that behave according to the algorithm in the environment they are placed in. One performance parameter is how quickly the agent learns to differentiate between rewarding actions and hostile actions. This can be tested by placing the agent in an environment with different types of hurdles, where the goal of the agent is to travel as far as possible by choosing actions that avoid all the obstacles. The environment chosen is the game 'Flappy Bird'. The goal of the game is to make the bird fly through a series of pipes of random heights. The bird must pass between the pipes without hitting the top of the screen, the ground, or the pipes themselves. The actions available to the bird are to flap its wings or to drop with gravity. The algorithms enforced on the artificial agents are NeuroEvolution of Augmenting Topologies (NEAT) and Reinforcement Learning. The NEAT algorithm takes an initial population of N artificial agents and follows a genetic algorithm, with an objective function, crossover, mutation, and augmenting topologies. Reinforcement Learning, on the other hand, uses a single agent and a Deep Q-learning Network to remember the state, the action taken in that state, and the reward received for that action. The performance of the NEAT algorithm improves as the initial population of artificial agents is increased.
Keywords—NeuroEvolution of Augmenting Topologies (NEAT), Artificial agent, Artificial environment, Game, Reinforcement Learning (RL)
Dr. P. S. Game Dept. of Computer Engineering Pune Institute of Computer Technology Pune, India psgame@pict.edu

## I. INTRODUCTION

An intelligent agent is anything that can perceive its surroundings, act independently to accomplish goals, and learn from experience or use knowledge to execute tasks better. In artificial intelligence, the agent's surroundings are considered an environment: the agent receives information from the environment through sensors and sends its output back to it through actuators [11]. Environments can be classified along several axes: Fully Observable vs Partially Observable, Deterministic vs Stochastic, Competitive vs Collaborative, Single-agent vs Multi-agent, Static vs Dynamic, Discrete vs Continuous, Episodic vs Sequential, and Known vs Unknown. NEAT, or NeuroEvolution of Augmenting Topologies, is an approach to machine learning that functions similarly to biological evolution. In its most basic form, [1] NEAT is a technique for evolving networks capable of performing a certain activity, such as balancing a pole or operating a robot. Significantly, NEAT networks can learn using a reward function as opposed to back-propagation. Reinforcement learning is a feedback-based machine learning technique in which an agent learns how to behave in a given environment by executing actions and observing their outcomes. The agent receives positive feedback for each beneficial action and is penalised, or given negative feedback, for each harmful action. In contrast to supervised learning, reinforcement learning uses feedback to train the agent autonomously, without labelled data; the agent can only learn from its own experience. In situations such as games and robotics, where decisions must be made sequentially with a long-term objective, RL provides a solution. The agent engages with the environment and explores it independently; its main objective is to maximise the cumulative positive reward.
## II. LITERATURE SURVEY
Games have long been used as environments for testing algorithms. There is considerable research [3] on creating an AI bot that can challenge a player in a two-player or multi-player game. Neuroevolution and Reinforcement Learning are among the algorithms used to create such AI bots or artificial agents. [1], [7] and [8] implement a configuration of an ANN called Neuroevolution; this algorithm does not depend on the actions taken by the agents as a whole. [3], [4], [5], [6] and [7] use a Reinforcement Learning algorithm with Deep Q-Learning to train the agents.
The performance of the Neuroevolution algorithm depends on the objective function, the initial population, the mutation rate, the weights and biases added to the network, the activation function used, and the overall topology of the network. Authors in [2] argue that the Neuroevolution algorithm is superior to the traditional Reinforcement Learning algorithm with Deep Q-Learning; Neuroevolution has the upper hand in the time the artificial agent takes to train itself. Other parameters also need to be taken into consideration when using a Neural Network, and the topology of the network plays a vital role in performance. Two strategies were proposed by Evgenia Papavasileiou (2021) [2]: using fixed topologies in the neural networks and using augmented topologies. In the fixed-topology case, the network is a single hidden layer of neurons, with each hidden neuron connected to every network input and every network output. Evolution searches the space of connection weights of this fully-connected topology by allowing high-performing networks to reproduce. The weight space is explored through the crossover of network weight vectors and through the mutation of single networks' weights. Thus, the goal of fixed-topology NE is to optimise the connection weights that determine the functionality of a network. The topology, or structure, of neural networks also affects their functionality, and modifying the network structure has been effective as part of supervised training.
There are two ways of making use of the environment. Authors in [3], [4], [6] and [7] use a DNN to extract features from the frames of the game, and these form the input to the agent. However, [1], [5] and [8] place the agent inside the game itself to perceive its surroundings directly. Several combinations of Reinforcement Learning algorithms are possible, involving Deep Neural Networks (DNN), Long Short-Term Memory (LSTM), Deep Q-Networks (DQN) and the like; however, performance varies with the type of obstacle and the type of game.
A Reinforcement Learning algorithm with DNN and LSTM has been used in [3]. This algorithm addresses issues such as a vast search space, dependencies between the actions taken by the agent, the state and the environment, and imperfect information. To reduce the complexity of the data generated by the agent's perception, data-skipping techniques are implemented. There is, however, a drawback with this algorithm: the agent takes a long time to train. Formally, at every discrete step the agent receives a state belonging to a set S and sends an action from the set of actions A to the environment. The environment makes a transition from state St to St+1, and a discount factor gamma in [0, 1] determines the preference for immediate reward over long-term reward. A self-play method is used by storing the parameters of the network to create a pool of past agents; this pool is used to sample opponents, which allows RL to learn the Nash equilibrium strategy. Data-skipping techniques were proposed in this paper; these refer to the process of dropping certain data during the training and evaluation process. The techniques proposed are 'no-op' and 'maintain move decision'. The network is an LSTM-based architecture with four heads and a shared state-representation layer, and an actor-critic off-policy learning algorithm was proposed.
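The role of the discount factor described above can be illustrated with a minimal sketch; the function name and reward values here are illustrative, not from [3]:

```python
# Sketch of the discounted return implied by a discount factor
# gamma in [0, 1]: gamma near 0 prefers immediate reward, while
# gamma near 1 weights long-term reward almost as heavily.

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

rewards = [1.0, 1.0, 1.0, 10.0]   # a large reward arrives late
print(discounted_return(rewards, 0.9))  # long-term reward still matters
print(discounted_return(rewards, 0.1))  # mostly the first reward counts
```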
Botong Liu (2020) [4] has used Reinforcement Learning with DQN. The game was split into frames, and each game image was sequentially scaled, converted to grayscale, and adjusted for brightness. The Deep Q-Network algorithm was used to convert the game-decision problem into a classification and recognition problem over multi-dimensional images, solved using a CNN. Reinforcement Learning works best for continuous decision-making problems. However, Deep Reinforcement Learning has a limitation of not converging, which Neural Fitted Q-learning (NFQ) and DQN algorithms were used to overcome. Since NFQ can work with numerical information only, the author suggests the use of DQN. By combining Q-learning with a CNN, the DQN can achieve self-learning. ReLU and max-pooling layers are added to the CNN, and gradient descent (the Adam optimizer) was used to train the DQN parameters.
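The frame preprocessing described above (scaling, grayscaling, brightness normalisation) can be sketched as follows; the frame dimensions and the crude strided downsampling are illustrative stand-ins, not the paper's exact pipeline:

```python
import numpy as np

def preprocess(frame, size=80):
    """Grayscale an RGB frame and shrink it to size x size by
    striding -- a crude stand-in for proper image rescaling."""
    gray = frame.mean(axis=2)                # average the RGB channels
    sy = gray.shape[0] // size
    sx = gray.shape[1] // size
    small = gray[::sy, ::sx][:size, :size]   # naive strided downsample
    return small / 255.0                     # normalise brightness to [0, 1]

frame = np.random.randint(0, 256, (400, 240, 3))  # a mock game frame
state = preprocess(frame)
print(state.shape)  # (80, 80)
```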
Q-value-function-based algorithms are the focus of Aidar Shakerimov (2021) [5]. The performance of DQN algorithms could be improved by using a cumulative reward for training actions. To speed up training, RNN-ReLU was used instead of LSTM or GRU; LSTM and GRU perform better than RNN-ReLU but take 7 times longer to train. Label smoothing was used to prevent vanishing gradients in RNN-ReLU. However, DQN is sensitive to seed randomization.
SARSA is a slight variation of the traditional Q-Learning algorithm. Authors in [6] use SARSA and Q-Learning algorithms with modifications such as an ε-greedy policy, discretization, and backward updates. Some variants of Q-Learning were also implemented, such as a tabular approach, Q-value approximation using linear regression, and a NN. In their implementation, [6] find that the SARSA algorithm outperforms Q-Learning. The rewards are specified as a positive 5 for passing a pipe, a negative 1000 for hitting a pipe, and a positive 0.5 for surviving a frame. A feed-forward NN was used with a 3-neuron input layer, a 50-neuron hidden layer, a second hidden layer of 20 neurons, and a 2-neuron output layer (ReLU activation function). The CNN is used with a preprocessed input image (background removed, grayscaled, and resized to 80 x 80); 2 CNN layers were used, one with sixteen 5 × 5 kernels with stride 2, and another with thirty-two 5 × 5 kernels with stride 2.
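The difference between the two update rules compared in [6] can be sketched in tabular form; the states, actions, and hyperparameters below are illustrative, not the paper's exact setup:

```python
# Q-learning bootstraps off-policy from the best next action;
# SARSA bootstraps on-policy from the action actually taken next.

ACTIONS = (0, 1)  # e.g. do nothing / flap

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    best_next = max(Q.get((s2, b), 0.0) for b in ACTIONS)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.95):
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0))

Q = {}
q_learning_update(Q, s=0, a=1, r=5.0, s2=1)    # +5 for passing a pipe
sarsa_update(Q, s=1, a=0, r=0.5, s2=2, a2=1)   # +0.5 for surviving a frame
print(Q)
```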
[7] proposes the use of specific feature selection, representing the state by the bird's velocity and the difference between the bird's position and the next lower pipe. This reduces the feature space and eliminates the need for deeper models. The agent is provided with rational human-level inputs along with generic RL and a standard 3-layer NN with a genetic optimization algorithm. The reward for the agent is a positive 1 for every pipe crossed and a negative 100 if the agent dies. The Neuroevolution has the following characteristics: the NN weights and the number of hidden-layer units undergo changes, the mutation rate is kept at 0.3, and the initial population size is 200. [8] proposes the use of two levels for the Flappy Bird game. The fitness function is calculated from the distance traveled by the agent and the current distance to the closest gap. The mutation rate is kept at 0.2, and there are 5 neurons in the hidden layer.
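The compact state representation proposed in [7] can be sketched as a small tuple; the field names and the bird's fixed horizontal position are illustrative assumptions:

```python
# Sketch of a compact Flappy Bird state in the spirit of [7]:
# the bird's velocity plus its offset from the next lower pipe.
# All names and the fixed bird_x value are illustrative.

def make_state(bird_y, bird_velocity, pipe_x, lower_pipe_top, bird_x=50):
    dx = pipe_x - bird_x             # horizontal distance to the next pipe
    dy = lower_pipe_top - bird_y     # vertical distance to the lower pipe's top
    return (bird_velocity, dx, dy)

print(make_state(bird_y=200, bird_velocity=-3.0, pipe_x=170, lower_pipe_top=260))
# (-3.0, 120, 60)
```

A three-element state like this keeps the input layer tiny, which is what lets [7] get away with a standard 3-layer network instead of a deep feature extractor.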
## III. METHODOLOGY
The NEAT algorithm implementation depends on the objective function, crossover, mutation, and a population of agents. For a given position of the bird, say (x, y), there are two actions the agent can take: either the bird flaps its wings or it does not. The vertical and horizontal distances traveled by the agent are determined by the following equations.
$$d_{vertical} = v_{jump}\,t + \frac{1}{2}\,a\,t^{2} \quad (1)$$

$$d_{horizontal} = v_{floor}\,t \quad (2)$$

$$d_{floor} = v_{floor}\,t \quad (3)$$

$$d_{pipe} = v_{pipe}\,t \quad (4)$$
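As a minimal sketch, the displacement equations above can be evaluated directly; the jump velocity, acceleration, and scroll-speed values below are illustrative, not the game's actual constants:

```python
# Sketch of the motion model in Eqs. (1)-(4).

def vertical_displacement(v_jump, a, t):
    # Eq. (1): displacement after a flap under constant acceleration a.
    return v_jump * t + 0.5 * a * t ** 2

def horizontal_displacement(v, t):
    # Eqs. (2)-(4): floor and pipes scroll at a constant speed.
    return v * t

# One flap followed by gravity pulling the bird back down:
print(vertical_displacement(v_jump=10.0, a=-9.8, t=1.0))
print(horizontal_displacement(v=4.0, t=1.0))
```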
Eq. (1) determines the vertical displacement of the agent, where a is the acceleration, a constant [12]. As shown in
Fig. 1. Details of the game environment [10]
Fig. 2, the y coordinate of the agent, the distance between the agent and the top pipe (y - T'), and the distance between the agent and the bottom pipe (T') are the inputs to the neural network. The gap between the top and the bottom pipe is fixed at 320 pixels, and the pipe heights are randomly generated. The distance between subsequent pipes is also kept constant. With respect to the NEAT algorithm, the fitness of the agent is determined by the number of pipes the agent is able to cross without collision. As soon as the agent collides with a pipe, hits the roof, or falls to the ground, it is removed from the environment. The performance of the algorithm depends on the initial population taken into consideration. The activation function used is the hyperbolic tangent function, and the mutation rate is kept at 0.03. The encoding of the chromosome is shown in Table I. The weight of each connection from a node in one layer to a node in another layer, and whether the connection is dropped, are part of the encoding. If the connection is to be dropped, it
Fig. 2. Parameters required as input to the NN
is encoded with the value 0; otherwise, it has the value 1. With reference to Fig. 3 and Table I, the edges between
Fig. 3. Diagrammatic view of the encoded chromosome in Table I
TABLE I ENCODING OF A CHROMOSOME BEFORE CROSSOVER AND MUTATION
| Weight | 0.25 | 2.31 | 1.55 | 0.98 | 5.11 | 1.17 | 0.07 |
|----------|--------|--------|--------|--------|--------|--------|--------|
| From | 1 | 2 | 3 | 1 | 3 | 4 | 2 |
| To | 2 | 3 | 2 | 3 | 4 | 3 | 4 |
| Enabled | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
the nodes are represented by the rows 'From' and 'To'. Table I shows the encoding of the network before mutation. After the mutation, or rather after topology augmentation, the encoding of the edges is shown in Table II, and the resultant connections are shown in Fig. 4. The edges in red are those that were dropped, and the edges in green are those added as a result of the mutation. The crossover process happens between any two randomly selected parents, and the next population is determined by the fitness of the individual agents.
Fig. 4. Diagrammatic view of the encoded chromosome in Table II
TABLE II ENCODING OF A CHROMOSOME AFTER CROSSOVER AND MUTATION
| Weight | 0.25 | 5.11 | 1.17 | 0.98 | 2.31 | 1.55 | 0.07 |
|----------|--------|--------|--------|--------|--------|--------|--------|
| From | 1 | 2 | 4 | 1 | 3 | 3 | 4 |
| To | 3 | 4 | 2 | 4 | 2 | 4 | 3 |
| Enabled | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
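The connection-gene encoding of Tables I and II can be sketched as a list of records; the mutation shown (perturbing a weight, occasionally toggling the enabled bit) is an illustrative simplification of NEAT's operators, not the exact implementation:

```python
import random

# Sketch of the chromosome encoding in Tables I and II: each gene
# stores a weight, its endpoints ('From'/'To'), and an enabled bit.
genome = [
    {"weight": 0.25, "src": 1, "dst": 2, "enabled": 1},
    {"weight": 2.31, "src": 2, "dst": 3, "enabled": 0},  # a dropped edge
    {"weight": 1.55, "src": 3, "dst": 2, "enabled": 1},
]

def mutate(genome, rate=0.03, rng=random.Random(0)):
    """Per-gene mutation at the paper's rate of 0.03: nudge a weight
    or flip the enabled bit, dropping or restoring the edge."""
    for gene in genome:
        if rng.random() < rate:
            gene["weight"] += rng.gauss(0.0, 0.5)
        if rng.random() < rate:
            gene["enabled"] ^= 1

mutate(genome)
print(genome)
```

In full NEAT, mutation can also add new nodes and connections (the topology augmentation described above); this sketch covers only the per-gene part of the encoding.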
## IV. RESULTS
The implementation of the algorithm requires no historic data or dataset; it uses the sensory data the artificial agent perceives from the environment as the program runs. The inputs to the algorithm are the y position of the agent, the vertical distance of the agent from the top pipe, and the vertical distance of the agent from the lower pipe. The output of the algorithm is the action the agent is to take, i.e., to jump or to drop down owing to gravity. The NEAT algorithm was implemented with different initial populations. Fig. 5, Fig. 6 and Fig. 7 show the average score and the scores reached in every generation when the game is played by the agents over 50 generations. The change in the average scores
Fig. 5. Gameplay when initial population is 80
[Line chart: score and mean score per generation over 50 generations. The per-generation score is highly volatile, while the mean score rises gradually.]
over the change in the initial population is separately shown in
Fig. 6. Gameplay when initial population is 100
[Line chart: score and mean score per generation over 50 generations. The per-generation score fluctuates sharply, while the mean score climbs gradually and then plateaus.]
Fig. 7. Gameplay when initial population is 120
<details>
<summary>Image 7 Details</summary>

Line chart of Score vs. Generation (0 to 50). The 'Score' line (blue, best individual per generation) is highly volatile: it rises sharply to a peak near 500 around Generation 7 and oscillates with large amplitude thereafter. The 'Mean Score' line (orange, population average) climbs steadily to about 180 by Generation 15 and then fluctuates between roughly 180 and 240. The gap between the two lines reflects substantial diversity within the population; the initial rise and subsequent plateau of the mean suggest steady early improvement followed by a local optimum.
</details>
Fig. 8 for generations 30 to 50. The average score of the agent increases steadily as the initial population grows from 20 to 100. The maximum score is observed when the population is 160. The average fitness value of the population is highest when the initial population size is 100, as shown in Fig. 9. The initial training phase lasts fewer than 5 generations. When the initial population has fewer agents, it takes more generations for the average score of the game to spike, as can be observed from Fig. 10. Table III shows the average score and the maximum score gained by the agent over 50 generations. A maximum score of 1025 is obtained when the initial population is 160 and the gameplay is run for 50 generations.
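The per-generation 'Score' and 'Mean Score' curves plotted in the gameplay figures can be derived from raw per-agent scores. A minimal sketch (the function name is illustrative, and it assumes 'Mean Score' is the cumulative average of each generation's best score, which matches the smooth shape of the orange curves):

```python
def generation_stats(scores_per_gen):
    """scores_per_gen: one list of agent scores per generation.
    Returns the best score of each generation and the running
    (cumulative) mean of those best scores."""
    best, running_mean, total = [], [], 0.0
    for gen, scores in enumerate(scores_per_gen, start=1):
        b = max(scores)            # best individual this generation
        best.append(b)
        total += b
        running_mean.append(total / gen)  # smoothed 'Mean Score' curve
    return best, running_mean

best, mean = generation_stats([[1, 2], [3, 5], [0, 4]])
# best -> [2, 5, 4]; mean -> [2.0, 3.5, ~3.67]
```

Plotting `best` and `mean` against the generation index reproduces the two-line charts shown above.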
## CONCLUSION AND FUTURE SCOPE
Using a 2D game, the performance of the algorithms can be evaluated efficiently. Unlike an off-the-shelf simulation, building the environment gives better control over it. Across iterations with different initial population sizes, the average score gained by the agent has increased. The initial population size also affects training speed: the more agents there are, the faster training completes. The highest
<details>
<summary>Image 8 Details</summary>

Line chart of Fitness Value vs. Generation (roughly 32.5 to 50) for six initial population sizes (20, 40, 60, 80, 100, 120), one line per size. Larger populations generally hold higher fitness: the Init Pop = 80 line tracks the highest values (about 240-270), Init Pop = 100 and 120 stay near 190-250, Init Pop = 60 and 40 plateau near 185-200, and Init Pop = 20 remains lowest, climbing from about 100 to 150. Gains diminish beyond a population of roughly 80-100, suggesting a sweet spot where diversity is maximized without excessive computational cost.
</details>
Fig. 8. Average scores over initial population change (Gen 30 - Gen 50)
<details>
<summary>Image 9 Details</summary>

Line chart of average fitness value vs. Generation (0 to 50) comparing initial populations of 120 (blue) and 100 (orange). Both curves rise from about 0 to above 300 by Generation 50, indicating successful evolution. The Init Pop = 100 curve is slightly higher and smoother, particularly after Generation 25, while the Init Pop = 120 curve shows more pronounced dips and plateaus, suggesting a more volatile evolutionary path.
</details>
Fig. 9. Average Fitness of the population over initial population change
<details>
<summary>Image 10 Details</summary>

Line chart of Score vs. Generation (0 to 50) for initial populations of 20 (blue), 40 (orange), and 60 (green). Larger populations train faster and score higher: Init Pop = 60 reaches about 150 by Generation 5 and stabilizes near 190-200; Init Pop = 40 peaks around 180 near Generation 25 and settles near 175; Init Pop = 20 climbs most slowly, reaching about 150 only by Generation 50. All three curves fluctuate, and the rate of improvement diminishes over time.
</details>
Fig. 10. Speed of agents getting trained over initial population change
Table III. Average and maximum scores over 50 generations for different initial population sizes
| Initial Population | Average Score | Max Score |
|----------------------|-----------------|-------------|
| 20 | 158.2 | 583 |
| 40 | 187.04 | 771 |
| 60 | 200.06 | 756 |
| 80 | 250.28 | 765 |
| 100 | 244.56 | 911 |
| 120 | 220.66 | 544 |
| 140 | 255.82 | 565 |
| 160 | 293.72 | 1025 |
average score is obtained when the initial population is set to 100 individuals. It can be concluded that the performance of the algorithm improves as the initial population is increased. The implemented approach can be extended by applying Reinforcement Learning with multiple agents and by combining augmenting topologies with the Deep Q-Learning model.
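The headline result of Table III (the overall maximum score and the population size that achieved it) can be recovered directly from the tabulated values; a small sketch using the numbers above (variable names are illustrative):

```python
# Table III entries: initial population -> (average score, max score).
results = {
    20: (158.2, 583), 40: (187.04, 771), 60: (200.06, 756), 80: (250.28, 765),
    100: (244.56, 911), 120: (220.66, 544), 140: (255.82, 565), 160: (293.72, 1025),
}

# Population size whose best run achieved the overall maximum score.
best_max_pop = max(results, key=lambda p: results[p][1])
print(best_max_pop, results[best_max_pop][1])  # 160 1025
```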
## REFERENCES
- [1] M. G. Cordeiro, P. B. S. Serafim, Y. B. Nogueira, 'A Minimal Training Strategy to Play Flappy Bird Indefinitely with NEAT', 2019 18th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), pp. 384-390, DOI: 10.1109/SBGames.2019.00014.
- [2] E. Papavasileiou, J. Cornelis and B. Jansen, 'A Systematic Literature Review of the Successors of "NeuroEvolution of Augmenting Topologies"', Evolutionary Computation, vol. 29, no. 1, March 2021, pp. 1-73, DOI: 10.1162/evco_a_00282.
- [3] I. Oh, S. Rho, S. Moon and S. Son, 'Creating Pro-Level AI for a Real-Time Fighting Game Using Deep Reinforcement Learning', IEEE Transactions on Games, 2021, pp. 1-10, DOI: 10.1109/TG.2021.3049539.
- [4] B. Liu, 'Implementing Game Strategies Based on Reinforcement Learning', ICRAI 2020: 6th International Conference on Robotics and Artificial Intelligence, November 2020, pp. 53-56, DOI: 10.1145/3449301.3449311.
- [5] E. Urtans, A. Nikitenko, 'Survey of Deep Q-Network variants in PyGame Learning Environment', ICDLT '18: Proceedings of the 2018 2nd International Conference on Deep Learning Technologies, June 2018, pp. 27-36, DOI: 10.1145/3234804.3234816.
- [6] T. Vu, L. Tran, 'FlapAI Bird: Training an Agent to Play Flappy Bird Using Reinforcement Learning Techniques', arXiv preprint, 2020, DOI: 10.48550/arXiv.2003.09579.
- [7] A. Brandao, P. Pires, P. Georgieva, 'Reinforcement Learning and Neuroevolution in Flappy Bird Game', Pattern Recognition and Image Analysis, 2019, pp. 225-236, DOI: 10.1007/978-3-030-31332-6_20.
- [8] Y. Mishra, V. Kumawat, S. Kamalanathan, 'Performance Analysis of Flappy Bird Playing Agent Using Neural Network and Genetic Algorithm', Information, Communication and Computing Technology, 2019, pp. 253-265, DOI: 10.1007/978-981-15-1384-8_21.
- [9] A. McIntyre, M. Kallada, C. G. Miguel, and C. F. da Silva, 'neat-python', https://github.com/CodeReclaimers/neat-python.
- [10] T. Ruscica and J. Keromnes, 'Flappy Bird game images', https://github.com/techwithtim/NEAT-Flappy-Bird/tree/master/imgs.
- [11] S. Kumar, 'Understand Types of Environments in Artificial Intelligence', https://www.aitude.com/understand-types-of-environments-in-artificial-intelligence/, 2020.
- [12] D. Zhu, 'How I Built an Intelligent Agent to Play Flappy Bird', Analytics Vidhya, 2020.