# Mathematical Runtime Analysis for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

> Extended version of a paper that appeared at AAAI 2022 [ZLD22], conducted while the first author was with Southern University of Science and Technology. This version contains all proofs, many of them revised and improved. In particular, the runtime result for the NSGA-II with tournament selection on LeadingOnesTrailingZeroes now holds for $N\geq 4(n+1)$ instead of $N\geq 5(n+1)$. In addition, in all upper bounds we now also regard binary tournament selection as in Deb's implementation of the NSGA-II (building the $N$ tournaments from two permutations of the population). We added tail bounds for the runtime guarantees. The experimental section was extended as well.
**Authors**:
- Weijie Zheng (International Research Institute for Artificial Intelligence)
- Shenzhen, China
- Benjamin Doerr
- Laboratoire d’Informatique (LIX)
- École Polytechnique, CNRS (Institut Polytechnique de Paris)
- Palaiseau, France
> Corresponding author.
## Abstract
The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeroes benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.
## 1 Introduction
Many real-world problems need to optimize multiple conflicting objectives simultaneously, see [ZQL ${}^+$ 11] for a discussion of the different areas in which such problems arise. Instead of computing a single good solution, a common approach to such multi-objective optimization problems is to compute a set of interesting solutions so that a decision maker can select the most desirable one from these. Multi-objective evolutionary algorithms (MOEAs) are a natural choice for such problems due to their population-based nature. Such MOEAs have been successfully used in many real-world applications [ZQL ${}^+$ 11].
Unfortunately, the theoretical understanding of MOEAs falls far behind their success in practice, and this discrepancy is even larger than in single-objective evolutionary computation, where the last twenty years have seen some noteworthy progress on the theory side [NW10, AD11, Jan13, DN20]. After some early theoretical works on convergence properties, e.g., [Rud98], the first mathematical runtime analysis of an MOEA was conducted by Laumanns et al. [LTZ ${}^+$ 02, LTZ04]. They analyzed the runtime of the simple evolutionary multi-objective optimizer (SEMO), a multi-objective counterpart of the randomized local search heuristic, on the CountingOnesCountingZeroes and LeadingOnesTrailingZeroes benchmarks, which are bi-objective analogues of the classic (single-objective) OneMax and LeadingOnes benchmarks. Around the same time, Giel [Gie03] analyzed the global SEMO (GSEMO), the multi-objective counterpart of the $(1+1)$ EA, on the LeadingOnesTrailingZeroes function.
Subsequent theoretical works mostly focused on variants of these algorithms and analyzed their runtime on the CountingOnesCountingZeroes and LeadingOnesTrailingZeroes benchmarks, on variants of them, on new benchmarks, and on combinatorial optimization problems [QYZ13, BQT18, RNNF19, QBF20, BFQY20, DZ21]. We note that the (G)SEMO algorithm keeps all non-dominated solutions in the population and discards all others, which can lead to impractically large population sizes. There are three theory works [BFN08, NSN15, DGN16] on the runtime of a simple hypervolume-based MOEA called the $(μ+1)$ simple indicator-based evolutionary algorithm ( $(μ+1)$ SIBEA), regarding both classic benchmarks and problems designed to highlight particular strengths and weaknesses of this algorithm. Like the SEMO and GSEMO, the $(μ+1)$ SIBEA creates a single offspring per generation; different from them, it works with a fixed population size $μ$ .
Recently, also decomposition-based multi-objective evolutionary algorithms were analyzed [LZZZ16, HZCH19, HZ20]. These algorithms decompose the multi-objective problem into several related single-objective problems and then solve these single-objective problems in a co-evolutionary manner. This direction is fundamentally different from the above works and our research. Since it is not primarily concerned with multi-objective optimization, we also do not discuss further the successful line of works that solves constrained single-objective problems by turning the constraint violation into a second objective, see, e.g., [NW06, FHH ${}^+$ 10, NRS11, FN15, QYZ15, QSYT17, QYT ${}^+$ 19, Cra19, DDN ${}^+$ 20, Cra21].
Unfortunately, most of the algorithms discussed in these theoretical works are far from the MOEAs used in practice. As pointed out in the survey [ZQL ${}^+$ 11], the majority of the MOEAs used in research and applications builds on the framework of the non-dominated sorting genetic algorithm II (NSGA-II) [DPAM02]. This algorithm works with a population of fixed size $N$ . It uses a complete order defined by the non-dominated sorting and the crowding distance to compare individuals. In each generation, $N$ offspring are generated from the parent population and the $N$ best individuals (according to the complete order) are selected as the new parent population. This approach is thus substantially different from the (G)SEMO algorithm and hypervolume-based approaches (and naturally completely different from decomposition-based methods), see the features of these algorithms described in the above two paragraphs.
Both the predominance in practice and the fundamentally different working principles ask for a rigorous understanding of the NSGA-II. However, to the best of our knowledge so far no mathematical runtime analysis for the NSGA-II has appeared. By mathematical runtime analysis, we mean the question of how many function evaluations a black-box algorithm takes to achieve a certain goal. The computational complexity of the operators used by the NSGA-II, in particular, how to most efficiently implement the non-dominated sorting routine, is a different question (and one that is well-understood [DPAM02]). We note that the runtime analysis in [COGNS20] considers a (G)SEMO algorithm that uses the crowding distance as one of several diversity measures used in the selection of the single parent creating an offspring, but due to the differences of the basic algorithms, none of the arguments used there appear helpful in the analysis of the NSGA-II.
Our Contributions. This paper conducts the first mathematical runtime analysis of the NSGA-II. We regard the NSGA-II with four different parent selection strategies (choosing each individual as a parent once, choosing parents independently and uniformly at random, and two ways of choosing the parents via binary tournaments) and with two classic mutation operators (one-bit mutation and standard bit-wise mutation), but in this first work without crossover (we remark that crossover is very little understood from the runtime perspective in MOEAs; the only works prior to ours we are aware of are [NT10, QYZ13, HZCH19]). As in previous theoretical works, we analyze how long the NSGA-II takes to cover the full Pareto front, that is, we estimate the number of iterations until the parent population contains an individual for each objective value of the Pareto front.
When trying to determine the runtime of the NSGA-II, we first note that the selection mechanism of the NSGA-II may remove all individuals with some fixed objective value on the front. In other words, the fact that a certain objective value on the Pareto front was found in some iteration does not mean that this is not lost in some later iteration. This is one of the substantial differences to the (G)SEMO algorithm. We prove that if the population size $N$ is at least four times larger than the size of the Pareto front, then both for the OneMinMax and the LeadingOnesTrailingZeroes benchmarks, such a loss of Pareto front points cannot occur. With this insight, we then show that each of these eight variants of the NSGA-II computes the full Pareto front of the OneMinMax benchmark in an expected number of $O(n\log n)$ iterations (Theorems 2 and 6) and the front of the LeadingOnesTrailingZeroes benchmark in $O(n^2)$ iterations (Theorems 8 and 9). When $N=Θ(n)$ , the corresponding runtime guarantees in terms of fitness evaluations, $O(Nn\log n)=O(n^2\log n)$ and $O(Nn^2)=O(n^3)$ , have the same asymptotic order as those proven previously for the SEMO, GSEMO, and $(μ+1)$ SIBEA (when $μ≥ n+1$ and when $μ=Θ(n)$ for the $(μ+1)$ SIBEA). We note that the benchmarks OneMinMax and LeadingOnesTrailingZeroes are the two most intensively studied benchmarks in the runtime analysis of MOEAs. In this first runtime analysis work on the NSGA-II, we therefore concentrated on these two benchmarks to allow a good comparison with the known performance of other MOEAs.
Using a population size larger than the size of the Pareto front is necessary. We prove that if the population size is equal to the size of the Pareto front, then the NSGA-II (applying one-bit mutation once to each parent) regularly loses points on the Pareto front of OneMinMax. This effect is strong enough so that with high probability for an exponential time each generation of the NSGA-II does not cover a constant fraction of the Pareto front of OneMinMax.
Our short experimental analysis confirms these findings and gives some quantitative estimates for which mathematical analyses are not precise enough. For example, we observe that also with population sizes smaller than what is required for our theoretical analysis (four times the size of the Pareto front), the NSGA-II efficiently covered the Pareto front of the OneMinMax and LeadingOnesTrailingZeroes benchmarks. With suitable population sizes, the NSGA-II beats the GSEMO algorithm on these benchmarks. Complementing our negative result, we observe that the fraction of the Pareto front not covered when using a population size equal to the front size is around 20% for OneMinMax and 40% for LeadingOnesTrailingZeroes. Also without covering the full Pareto front, MOEAs can serve their purpose of proposing to a decision maker a set of interesting solutions. With this perspective, we also regard experimentally the sets of solutions evolved by the NSGA-II when the population size is only equal to the size of the Pareto front. For both benchmarks, we observe that after a moderate runtime, the population contains the two extremal solutions and covers the rest of the Pareto front very evenly.
Overall, this work shows that the NSGA-II, despite its higher complexity (parallel generation of offspring, selection based on non-dominated sorting and crowding distance), admits mathematical runtime analyses in a similar fashion as done before for simpler MOEAs, which will hopefully lead to a deeper understanding of the working principles of this important algorithm.
Subsequent works: We note that the conference version [ZLD22] of this work has already inspired a substantial amount of subsequent research. We briefly describe these results now. In [ZD22a], the performance of the NSGA-II with small population size was analyzed. The main result is that the problem that Pareto front points can be lost can be significantly reduced with a small modification of the selection procedure that was previously analyzed experimentally [KD06], namely to remove individuals in the selection of the next population sequentially, recomputing the crowding distance after each removal. For this setting, an $O(n/N)$ approximation guarantee was proven. In [BQ22], the first runtime analysis of the NSGA-II with crossover was conducted; however, no speed-ups from crossover could be shown. Also, significant speed-ups were shown when using tournaments larger than binary ones. In [DQ23a], the performance of the NSGA-II on the multimodal OneJumpZeroJump benchmark [DZ21] was analyzed. This work shows that the NSGA-II also on this multimodal benchmark has a performance asymptotically at least as good as the GSEMO algorithm (when the population size is at least four times the size of the Pareto front). A matching lower bound for this and our result on OneMinMax was proven in [DQ23b]. This work in particular shows that the NSGA-II in these settings does not profit from population sizes larger than the minimum required population size. Two recent works showed significant performance gains from crossover, one on the OneJumpZeroJump benchmark [DQ23c] and one on an artificial problem [DOSS23b]. The first runtime analysis of the NSGA-II on a combinatorial problem, namely the bi-objective minimum spanning tree problem previously regarded in [Neu07, NW22], was conducted in [CDH ${}^+$ 23]. The first runtime analysis of the NSGA-II for noisy optimization appeared in [DOSS23a].
The first runtime analysis of the SMS-EMOA [BNE07] (a variant of the NSGA-II building on the hypervolume) was conducted in [BZLQ23]. All these works regard bi-objective problems. For the OneMinMax problem in three or more objectives, it was shown in [ZD22b] that the NSGA-II cannot find the full Pareto front in polynomial time and even has difficulties approximating it. It was shown in [WD23] that the NSGA-III does not experience these problems, at least for three objectives. In the light of this recent development, we are confident that our first mathematical runtime analysis of the NSGA-II has started a fruitful direction of research.
The remainder of the paper is organized as follows. The NSGA-II framework is briefly introduced in Section 2. Sections 3 and 4 separately show our runtime results for the NSGA-II with large enough population size on the OneMinMax and LeadingOnesTrailingZeroes functions. Section 5 proves the exponential runtime of the NSGA-II with population size equal to the size of the Pareto front. Our experimental results are discussed in Section 6. Section 7 concludes this work.
## 2 Preliminaries
In this section, we give a brief introduction to multi-objective optimization and to the NSGA-II framework. For simplicity of presentation, we shall concentrate on two objectives, both of which have to be maximized. A bi-objective function on some search space $Ω$ is a pair $f=(f_1,f_2)$ where $f_i:Ω→ℝ$ . We write $f(x)=(f_1(x),f_2(x))$ for all $x∈Ω$ . We shall always assume that we have a bit-string representation, that is, that $Ω=\{0,1\}^n$ for some $n∈ℕ$ . The challenge in multi-objective optimization is that usually there is no solution that maximizes both $f_1$ and $f_2$ and thus is at least as good as all other solutions.
More precisely, in bi-objective maximization, we say $x$ weakly dominates $y$ , denoted by $x\succeq y$ , if and only if $f_1(x)≥ f_1(y)$ and $f_2(x)≥ f_2(y)$ . We say $x$ strictly dominates $y$ , denoted by $x\succ y$ , if and only if $f_1(x)≥ f_1(y)$ and $f_2(x)≥ f_2(y)$ and at least one of the inequalities is strict. We say that a solution is Pareto-optimal if it is not strictly dominated by any other solution. The set of objective values of all Pareto optima is called the Pareto front of $f$ . With this language, the aim in multi-objective optimization is to compute a small set $P$ of Pareto optima such that $f(P)=\{f(x)\mid x∈ P\}$ is the Pareto front or at least a diverse subset of it. Consider an algorithm $A$ optimizing a multi-objective problem $f$ with Pareto front $M$ . Let $P_t$ be the population at iteration $t$ and $G_t$ be the number of function evaluations until iteration $t$ . Then the time complexity or runtime in this paper is the random variable $T_A(f)=\inf\{G_t\mid f(P_t)⊇ M\}$ . Usually, we discuss the expected runtime or runtime bounds that hold with some probability.
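As a concrete illustration, the two dominance relations above can be written as a short sketch in Python; the function names and the example points below are our own choices for illustration, not notation from this paper.

```python
# Illustrative sketch of weak and strict dominance for bi-objective
# maximization; objective values are pairs (f1(x), f2(x)).

def weakly_dominates(fx, fy):
    """fx weakly dominates fy: at least as good in both objectives."""
    return fx[0] >= fy[0] and fx[1] >= fy[1]

def strictly_dominates(fx, fy):
    """fx strictly dominates fy: weakly dominates, with one strict inequality."""
    return weakly_dominates(fx, fy) and fx != fy

# Example: (3, 2) strictly dominates (2, 2); (3, 1) and (1, 3) are incomparable.
assert strictly_dominates((3, 2), (2, 2))
assert not strictly_dominates((3, 1), (1, 3))
assert not strictly_dominates((1, 3), (3, 1))
```

Note that a solution incomparable to all others is not strictly dominated and hence Pareto-optimal in this sense.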
### The NSGA-II
When working with a fixed population size, an MOEA must select the next parent population from the combined parent and offspring population by discarding some of these individuals. For this, a complete order on the combined parent and offspring population could be used so that the next parent population is taken in a greedy manner according to this order. Since dominance is only a partial order, the NSGA-II [DPAM02] extends the dominance relation to the following complete order.
In a given population $P⊆\{0,1\}^n$ , each individual $x$ has both a rank and a crowding distance. The ranks are defined recursively based on the dominance relation. All individuals that are not strictly dominated by another one have rank one. Given that the ranks $1,\dots,k$ are already defined, the individuals of rank $k+1$ are those among the remaining individuals that are not strictly dominated except by individuals of rank $k$ or smaller. This defines a partition of $P$ into sets $F_1,F_2,\dots$ such that $F_i$ contains all individuals with rank $i$ . As shown in [DPAM02], this partition can be computed more efficiently than what the above recursive description suggests, namely in quadratic time, see Algorithm 1 for details. It is clear that individuals with lower rank are more interesting, so when comparing two individuals of different ranks, the one with lower rank is preferred.
To compare individuals in the same rank class $F_i$ , the crowding distance of these individuals (in $F_i$ ) is computed, and the individual with larger distance is preferred. Ties are broken randomly.
1: Input: $S=\{S_1,\dots,S_{|S|}\}$ : the set of individuals
2: Output: $F_1,F_2,\dots$
3: for $i=1,\dots,|S|$ do
4: $\mathrm{cnt}(S_i)=0$ % number of individuals strictly dominating $S_i$
5: $\mathrm{dom}(S_i)=∅$ % set of individuals strictly dominated by $S_i$
6: end for
7: for $i=1,\dots,|S|$ do % compute $\mathrm{cnt}$ and $\mathrm{dom}$
8: for $j=1,\dots,|S|$ do
9: if $S_i\prec S_j$ then
10: $\mathrm{cnt}(S_i)=\mathrm{cnt}(S_i)+1$
11: $\mathrm{dom}(S_j)=\mathrm{dom}(S_j)∪\{S_i\}$
12: end if
13: end for
14: end for
15: $F_1=\{S_i\mid \mathrm{cnt}(S_i)=0, i=1,2,\dots,|S|\}$
16: $k=1$
17: while $F_k≠∅$ do
18: $F_{k+1}=∅$
19: for any $s∈ F_k$ do % discount $F_k$ from $\mathrm{cnt}$ and $\mathrm{dom}$
20: for any $s^\prime∈\mathrm{dom}(s)$ do
21: $\mathrm{cnt}(s^\prime)=\mathrm{cnt}(s^\prime)-1$
22: if $\mathrm{cnt}(s^\prime)=0$ then
23: $F_{k+1}=F_{k+1}∪\{s^\prime\}$
24: end if
25: end for
26: end for
27: $k=k+1$
28: end while
Algorithm 1 fast-non-dominated-sort(S)
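For the reader's convenience, the sorting routine above can be sketched in Python as follows; the function and variable names (`fast_non_dominated_sort`, `cnt`, `dom`) are our own, `pop` is a list of search points, and `f` maps a search point to its objective pair.

```python
# Sketch of fast-non-dominated-sort (Algorithm 1) for bi-objective
# maximization; names are our own, not from the paper.

def strictly_dominates(fx, fy):
    return fx[0] >= fy[0] and fx[1] >= fy[1] and fx != fy

def fast_non_dominated_sort(pop, f):
    values = [f(x) for x in pop]
    n = len(pop)
    cnt = [0] * n                 # number of individuals strictly dominating i
    dom = [[] for _ in range(n)]  # indices of individuals strictly dominated by i
    for i in range(n):
        for j in range(n):
            if strictly_dominates(values[j], values[i]):
                cnt[i] += 1
                dom[j].append(i)
    fronts = [[i for i in range(n) if cnt[i] == 0]]  # rank-one individuals
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:  # discount the current front
            for j in dom[i]:
                cnt[j] -= 1
                if cnt[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front

# Example: (3,3) dominates the three other points, which are incomparable.
fronts = fast_non_dominated_sort([(3, 3), (2, 2), (3, 1), (1, 3)], lambda v: v)
assert fronts == [[0], [1, 2, 3]]
```

The two nested loops give the quadratic time mentioned above.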
Algorithm 2 shows how the crowding distance in a given set $S$ is computed. The crowding distance of some $x∈ S$ is the sum of the crowding distances $x$ has with respect to each objective function $f_i$ . For a given $f_i$ , the individuals in $S$ are sorted in order of ascending $f_i$ value (for equal values, a tie-breaking mechanism is needed, but we shall not make any assumption on this, that is, our mathematical results are valid regardless of how these ties are broken). The first and the last individual in the sorted list have an infinite crowding distance. For every other individual in the sorted list, its crowding distance with respect to $f_i$ is the difference between the objective values of its left and right neighbors in the sorted list, normalized by the difference between the first and the last value.
1: Input: $S=\{S_1,\dots,S_{|S|}\}$ : the set of individuals
2: Output: $\mathrm{cDis}(S)=(\mathrm{cDis}(S_1),\dots,\mathrm{cDis}(S_{|S|}))$ , the vector of crowding distances of the individuals in $S$
3: $\mathrm{cDis}(S)=(0,\dots,0)$
4: for each objective function $f_i$ do
5: Sort $S$ in order of ascending $f_i$ value: $S_{i.1},\dots,S_{i.|S|}$
6: $\mathrm{cDis}(S_{i.1})=+∞$ , $\mathrm{cDis}(S_{i.|S|})=+∞$
7: for $j=2,\dots,|S|-1$ do
8: $\mathrm{cDis}(S_{i.j})=\mathrm{cDis}(S_{i.j})+\frac{f_i(S_{i.j+1})-f_i(S_{i.j-1})}{f_i(S_{i.|S|})-f_i(S_{i.1})}$
9: end for
10: end for
Algorithm 2 crowding-distance( $S$ )
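The crowding-distance computation can likewise be sketched in Python; all names are our own, and, as an assumption going slightly beyond Algorithm 2, we skip the normalization when all individuals share the same value in one objective (so the denominator would be zero).

```python
import math

# Sketch of the crowding-distance computation (Algorithm 2); names are ours.
# `values` is a list of objective pairs (f1, f2), one per individual.

def crowding_distance(values):
    n = len(values)
    dist = [0.0] * n
    for obj in range(2):
        # sort indices by ascending value in this objective (stable sort,
        # so ties are broken by input order here)
        order = sorted(range(n), key=lambda i: values[i][obj])
        dist[order[0]] = dist[order[-1]] = math.inf  # boundary individuals
        lo, hi = values[order[0]][obj], values[order[-1]][obj]
        if hi == lo:  # degenerate case: all values equal in this objective
            continue
        for pos in range(1, n - 1):
            dist[order[pos]] += (values[order[pos + 1]][obj]
                                 - values[order[pos - 1]][obj]) / (hi - lo)
    return dist

# Four evenly spread points: the two extremes are infinite, the two middle
# points each collect 2/3 from both objectives.
d = crowding_distance([(0, 3), (1, 2), (2, 1), (3, 0)])
assert d[0] == math.inf and d[3] == math.inf
assert abs(d[1] - 4 / 3) < 1e-9 and abs(d[2] - 4 / 3) < 1e-9
```
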
The whole NSGA-II framework is shown in Algorithm 3. After the random initialization of the population of size $N$ , the main loop starts with the generation of $N$ offspring (the precise way this is done is not part of the NSGA-II framework and is mostly left as a design choice to the algorithm user in [DPAM02], although it is suggested to select parents via binary tournaments based on the total order described above). Then the total order based on rank and crowding distance is used to remove the $N$ worst individuals from the union of the parent and offspring population. The remaining individuals form the parent population of the next iteration.
1: Uniformly at random generate the initial population $P_0=\{x_1,x_2,\dots,x_N\}$ with $x_i∈\{0,1\}^n$ , $i=1,2,\dots,N$
2: for $t=0,1,2,\dots$ do
3: Generate the offspring population $Q_t$ of size $N$
4: Use Algorithm 1 to divide $R_t=P_t∪ Q_t$ into $F_1,F_2,\dots$
5: Find $i^*≥ 1$ such that $\sum_{i=1}^{i^*-1}|F_i|<N$ and $\sum_{i=1}^{i^*}|F_i|≥ N$
6: Use Algorithm 2 to separately calculate the crowding distance of each individual in $F_1,\dots,F_{i^*}$
7: Let $\tilde{F}_{i^*}$ be the $N-\sum_{i=1}^{i^*-1}|F_i|$ individuals in $F_{i^*}$ with largest crowding distance, chosen at random in case of a tie
8: $P_{t+1}=\left(\bigcup_{i=1}^{i^*-1}F_i\right)∪\tilde{F}_{i^*}$
9: end for
Algorithm 3 NSGA-II
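Steps 4 to 8 of the framework, the survival selection, can be sketched as one self-contained Python routine. All helper names are our own; the fronts are computed here by repeated peeling (equivalent to, but slower than, Algorithm 1), and a guard handles the zero-denominator corner case of the crowding distance.

```python
import math
import random

# Self-contained sketch of the NSGA-II survival selection (Steps 4-8 of
# Algorithm 3) for bi-objective maximization; all names are our own.

def strictly_dominates(a, b):
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def fronts_of(values):
    """Partition indices into fronts F_1, F_2, ... by repeated peeling."""
    remaining = set(range(len(values)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(strictly_dominates(values[j], values[i])
                            for j in remaining)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding(values):
    n = len(values)
    dist = [0.0] * n
    for obj in range(2):
        order = sorted(range(n), key=lambda i: values[i][obj])
        dist[order[0]] = dist[order[-1]] = math.inf
        lo, hi = values[order[0]][obj], values[order[-1]][obj]
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            dist[order[pos]] += (values[order[pos + 1]][obj]
                                 - values[order[pos - 1]][obj]) / (hi - lo)
    return dist

def survival_selection(values, N):
    """Return the indices forming the next parent population of size N."""
    next_pop = []
    for front in fronts_of(values):
        if len(next_pop) + len(front) <= N:
            next_pop += front  # the whole front fits
        else:
            # critical front: keep the members with largest crowding
            # distance, breaking ties randomly
            cd = crowding([values[i] for i in front])
            order = sorted(range(len(front)),
                           key=lambda k: (-cd[k], random.random()))
            next_pop += [front[k] for k in order[:N - len(next_pop)]]
        if len(next_pop) == N:
            break
    return next_pop

# Example: a chain of dominated points; the two best survive.
assert survival_selection([(3, 3), (2, 2), (1, 1), (0, 0)], 2) == [0, 1]
```
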
## 3 Runtime of the NSGA-II on OneMinMax
In this section, we analyze the runtime of the NSGA-II on the OneMinMax benchmark proposed first by Giel and Lehre [GL10] as a bi-objective analogue of the classic OneMax benchmark. It is the function $f:\{0,1\}^n→ℕ×ℕ$ defined by
$$
f(x)=\big(f_1(x),f_2(x)\big)=\Big(n-\sum_{i=1}^{n}x_i,\;\sum_{i=1}^{n}x_i\Big)
$$
for all $x=(x_1,\dots,x_n)∈\{0,1\}^n$ . The aim is to maximize both objectives in $f$ . We immediately note that for this benchmark problem, any solution lies on the Pareto front. It is hence a good example to study how an MOEA explores the Pareto front when already some Pareto optima were found.
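A direct transcription of this definition into Python reads as follows; the function name is our own.

```python
# The OneMinMax benchmark: count the zeros and the ones of a bit string.

def one_min_max(x):
    """x: a bit string given as a list/tuple of 0s and 1s; returns (f1, f2)."""
    ones = sum(x)
    return (len(x) - ones, ones)

# Every search point is Pareto-optimal, since f1(x) + f2(x) = n for all x.
assert one_min_max([1, 0, 1, 1]) == (1, 3)
assert one_min_max([0, 0, 0]) == (3, 0)
```
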
Giel and Lehre [GL10] showed that the simple SEMO algorithm finds the full Pareto front of OneMinMax in $O(n^2\log n)$ iterations and fitness evaluations. Their proof can easily be extended to the GSEMO algorithm. For the SEMO, a (matching) lower bound of $Ω(n^2\log n)$ was shown in [COGNS20]. An upper bound of $O(μ n\log n)$ was shown for the hypervolume-based $(μ+1)$ SIBEA with $μ≥ n+1$ [NSN15]. When the SEMO or GSEMO is enriched with a diversity mechanism (strong enough so that solutions that can create a new point on the Pareto front are chosen with constant probability), then the runtime of these algorithms reduces to $O(n\log n)$ [COGNS20].
In contrast to the SEMO and GSEMO as well as the $(μ+1)$ SIBEA with population size $μ≥ n+1$ , the NSGA-II can lose all solutions covering a point of the Pareto front. In the following lemma, central to our runtime analyses on OneMinMax, we show that this cannot happen when the population size is large enough, namely at least four times the size of the Pareto front. Besides, we shall use $[i..j],i≤ j$ , to denote the set $\{i,i+1,\dots,j\}$ in this paper.
**Lemma 1**
*Consider one iteration of the NSGA-II with population size $N≥ 4(n+1)$ optimizing the OneMinMax function. Assume that in some iteration $t$ the combined parent and offspring population $R_t=P_t∪ Q_t$ contains a solution $x$ with objective value $(k,n-k)$ for some $k∈[0..n]$ . Then also the next parent population $P_{t+1}$ contains an individual $y$ with $f(y)=(k,n-k)$ .*
*Proof*
It is not difficult to see that for any $x,y∈\{0,1\}^n$ , we have $x\nprec y$ and $y\nprec x$ . Hence, all individuals in $R_t$ have rank one in the non-dominated sorting of $R_t$ , that is, after Step 4 in Algorithm 3. Thus, in the notation of the algorithm, $F_1=R_t$ and $i^*=1$ . We calculate the crowding distance of each individual of $R_t$ . Let $k∈[0..n]$ and assume that there is at least one individual $x∈ R_t$ such that $f(x)=(k,n-k)$ . We recall from Algorithm 2 that $S_{1.1},\dots,S_{1.2N}$ and $S_{2.1},\dots,S_{2.2N}$ are the populations sorted with respect to $f_1$ and $f_2$ , respectively. Since individuals with the same objective value appear consecutively in the corresponding sorted list, there exist $a≤ b$ and $a^\prime≤ b^\prime$ such that $[a..b]=\{i\mid f_1(S_{1.i})=k\}$ and $[a^\prime..b^\prime]=\{i\mid f_2(S_{2.i})=n-k\}$ . From the crowding distance calculation in Algorithm 2, we see that
$$
\mathrm{cDis}(S_{1.a})\ge\frac{f_1(S_{1.a+1})-f_1(S_{1.a-1})}{f_1(S_{1.2N})-f_1(S_{1.1})}\ge\frac{f_1(S_{1.a})-f_1(S_{1.a-1})}{f_1(S_{1.2N})-f_1(S_{1.1})}>0
$$
if $a>1$ , since $f_1(S_{1.a})-f_1(S_{1.a-1})>0$ by the definition of $a$ ; if $a=1$ , we even have $\mathrm{cDis}(S_{1.a})=+∞$ . Similarly, we have $\mathrm{cDis}(S_{1.b})>0$ , $\mathrm{cDis}(S_{2.a^\prime})>0$ , and $\mathrm{cDis}(S_{2.b^\prime})>0$ . For all $j∈[a+1..b-1]$ with $S_{1.j}∉\{S_{2.a^\prime},S_{2.b^\prime}\}$ , writing $S_{1.j}=S_{2.j^\prime}$ for some $j^\prime∈[a^\prime+1..b^\prime-1]$ , we know $f_1(S_{1.j-1})=f_1(S_{1.j+1})=k$ and $f_2(S_{2.j^\prime-1})=f_2(S_{2.j^\prime+1})=n-k$ from the definitions of $a,b,a^\prime$ , and $b^\prime$ .
Hence, we have
$$
\mathrm{cDis}(S_{1.j})=\frac{f_1(S_{1.j+1})-f_1(S_{1.j-1})}{f_1(S_{1.2N})-f_1(S_{1.1})}+\frac{f_2(S_{2.j^\prime+1})-f_2(S_{2.j^\prime-1})}{f_2(S_{2.2N})-f_2(S_{2.1})}=0.
$$
This shows that the individuals with objective value $(k,n-k)$ and positive crowding distance are among $S_{1.a}$ , $S_{1.b}$ , $S_{2.a^\prime}$ , and $S_{2.b^\prime}$ . Hence, for each $(k,n-k)$ , there are at most four solutions $x$ with $f(x)=(k,n-k)$ and $\mathrm{cDis}(x)>0$ . Since the Pareto front of OneMinMax has size $n+1$ , the number of individuals with positive crowding distance is at most $4(n+1)≤ N$ . Since Step 7 in Algorithm 3 keeps the $N$ individuals with largest crowding distance, all individuals with positive crowding distance are kept. Thus, $y=S_{1.a}∈ P_{t+1}$ , proving our claim. ∎
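The counting argument of this proof can be sanity-checked numerically. The following sketch (our own code, not part of the proof) computes crowding distances for a random multiset of OneMinMax objective values and verifies that at most four individuals per objective value receive a positive crowding distance.

```python
import math
import random

# Numerical sanity check of the counting argument of Lemma 1: for OneMinMax
# objective values, at most four individuals sharing one objective value get
# a positive crowding distance. Illustrative only; all names are our own.

def crowding_distances(values):
    n = len(values)
    dist = [0.0] * n
    for obj in range(2):
        order = sorted(range(n), key=lambda i: values[i][obj])
        dist[order[0]] = dist[order[-1]] = math.inf
        lo, hi = values[order[0]][obj], values[order[-1]][obj]
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            dist[order[pos]] += (values[order[pos + 1]][obj]
                                 - values[order[pos - 1]][obj]) / (hi - lo)
    return dist

random.seed(1)
n = 20
# 100 random OneMinMax objective values (k ones, n - k zeros)
values = [(n - k, k) for k in (random.randint(0, n) for _ in range(100))]
dist = crowding_distances(values)
for k in set(v[1] for v in values):
    positive = sum(1 for v, d in zip(values, dist) if v[1] == k and d > 0)
    assert positive <= 4
```
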
Since Lemma 1 ensures that objective values on the Pareto front will not be lost in the future, we can estimate the runtime of the NSGA-II via the sum of the waiting times for finding a new Pareto solution. Apart from the fact that the NSGA-II generates $N$ solutions per iteration (which requires some non-trivial arguments in the case of binary tournament selection), this analysis resembles the known analysis of the simpler SEMO algorithm [GL10]. For $N=O(n)$ , we also obtain the same runtime estimate (in terms of fitness evaluations).
We start with the easier case that parents are chosen uniformly at random or that each parent creates one offspring.
**Theorem 2**
*Consider optimizing the OneMinMax function via the NSGA-II with one of the following four ways to generate the offspring population in Step 3 of Algorithm 3: applying one-bit mutation or standard bit-wise mutation once to each parent, or $N$ times choosing a parent uniformly at random and applying one-bit mutation or standard bit-wise mutation to it. If the population size $N$ is at least $4(n+1)$ , then the expected runtime is at most $\frac{2e^2}{e-1}n(\ln n+1)$ iterations and at most $\frac{2e^2}{e-1}Nn(\ln n+1)$ fitness evaluations. Moreover, if $T$ denotes the number of iterations until the full Pareto front is covered, then $\Pr[T≥\tfrac{2e^2(1+δ)}{e-1}n\ln n]≤ 2n^{-δ}$ holds for all $δ≥ 0$ .*
*Proof*
Let $x∈ P_t$ with $f(x)=(k,n-k)$ for some $k∈[0..n]$ . Let $p$ denote the probability that $x$ is chosen as parent to be mutated. Conditional on this, let $p_k^+$ denote the probability of generating from $x$ an offspring $y_+$ with $f(y_+)=(k+1,n-k-1)$ (when $k<n$ ), and let $p_k^-$ denote the probability of generating from $x$ an offspring $y_-$ with $f(y_-)=(k-1,n-k+1)$ (when $k>0$ ). Consequently, the probability that $R_t$ contains an individual $y_+$ with objective value $(k+1,n-k-1)$ is at least $pp_k^+$ , and the probability that $R_t$ contains an individual $y_-$ with objective value $(k-1,n-k+1)$ is at least $pp_k^-$ . Since Lemma 1 implies that any OneMinMax objective value present in the population is kept in all future iterations, the expected number of iterations to obtain $y_+$ (resp. $y_-$ ) once $x$ is in the population is at most $\frac{1}{pp_k^+}$ (resp. $\frac{1}{pp_k^-}$ ). Assume that the initial population of Algorithm 3 contains an $x$ with $f(x)=(k_0,n-k_0)$ . Then the expected number of iterations to obtain individuals covering the objective values $(k_0,n-k_0),(k_0+1,n-k_0-1),\dots,(n,0)$ is at most $\sum_{i=k_0}^{n-1}\frac{1}{pp_i^+}$ . Similarly, the expected number of iterations to obtain individuals covering the objective values $(k_0-1,n-k_0+1),(k_0-2,n-k_0+2),\dots,(0,n)$ is at most $\sum_{i=1}^{k_0}\frac{1}{pp_i^-}$ . Consequently, the expected number of iterations to cover the whole Pareto front is at most $\sum_{i=k_0}^{n-1}\frac{1}{pp_i^+}+\sum_{i=1}^{k_0}\frac{1}{pp_i^-}$ . Now we compute $p$ for the different ways of selecting parents, and $p_k^+$ and $p_k^-$ for the different mutation operators. If we apply mutation once to each parent in $P_t$ , we clearly have $p=1$ . If we choose the parents independently and uniformly at random from $P_t$ , then $p=1-(1-\frac{1}{N})^N≥ 1-\frac{1}{e}$ . For one-bit mutation, we have $p_k^+=\frac{n-k}{n}$ and $p_k^-=\frac{k}{n}$ .
For standard bit-wise mutation, we have $p_k^+≥\frac{n-k}{n}(1-\frac{1}{n})^{n-1}≥\frac{n-k}{en}$ and $p_k^-≥\frac{k}{n}(1-\frac{1}{n})^{n-1}≥\frac{k}{en}$ . From these estimates and the fact that the harmonic number $H_n=\sum_{i=1}^{n}\frac{1}{i}$ satisfies $H_n<\ln n+1$ , it is not difficult to see that all cases lead to an expected runtime of at most
$$
\sum_{i=0}^{n-1}\frac{1}{pp_i^+}+\sum_{i=1}^{n}\frac{1}{pp_i^-}\le\sum_{i=0}^{n-1}\frac{1}{(1-\frac{1}{e})\frac{n-i}{en}}+\sum_{i=1}^{n}\frac{1}{(1-\frac{1}{e})\frac{i}{en}}=\frac{2e^2}{e-1}\,nH_n\le\frac{2e^2}{e-1}\,n(\ln n+1)
$$
iterations, hence at most $\frac{2e^2}{e-1}Nn(\ln n+1)$ fitness evaluations. Now we prove the concentration result. Let $X^+_k$ and $X^-_k$ be independent geometric random variables with success probabilities $(1-\frac{1}{e})\frac{n-k}{en}$ and $(1-\frac{1}{e})\frac{k}{en}$ , respectively. Let $T$ be the number of iterations to cover the full Pareto front, and let $Z^+=\sum_{k=0}^{n-1}X^+_k$ and $Z^-=\sum_{k=1}^{n}X^-_k$ . Then from the above discussion, we know that $Z:=Z^++Z^-$ stochastically dominates $T$ (see [Doe19] for a detailed discussion of how to use stochastic domination arguments in the analysis of evolutionary algorithms). Let $q_1^+,\dots,q_n^+$ denote the success probabilities of $X^+_{n-1},X^+_{n-2},\dots,X^+_0$ , and let $q_1^-,\dots,q_n^-$ denote the success probabilities of $X^-_1,X^-_2,\dots,X^-_n$ . Then we have $q_i^+≥(1-\frac{1}{e})\frac{1}{e}\frac{i}{n}$ and $q_i^-≥(1-\frac{1}{e})\frac{1}{e}\frac{i}{n}$ for all $i∈[1..n]$ . From a Chernoff bound for sums of such geometric random variables ([DD18, Lemma 4], also found in [Doe20, Theorem 1.10.35]), we have that for any $δ≥ 0$ ,
$$
\Pr\left[Z^+\ge(1+δ)\frac{e^2}{e-1}n\ln n\right]\le n^{-δ}
$$
and
$$
\Pr\left[Z^-\ge(1+δ)\frac{e^2}{e-1}n\ln n\right]\le n^{-δ}.
$$
Hence, we have
$$
\Pr\left[Z\ge(1+δ)\frac{2e^2}{e-1}n\ln n\right]\le 2n^{-δ}.
$$
Since $Z$ stochastically dominates $T$ , we obtain $\Pr[T≥\tfrac{2e^2(1+δ)}{e-1}n\ln n]≤ 2n^{-δ}$ . ∎
We now analyze the performance of the NSGA-II on OneMinMax when selecting the parents via binary tournaments, which is the selection method suggested in the original NSGA-II paper [DPAM02]. We regard two variants of this selection method. The most natural one, discussed for example in [GD90], is to conduct $N$ independent tournaments. Here the offspring population $Q_t$ is generated by $N$ times independently performing the following sequence of actions: (i) Select two different individuals $x^\prime,x^\prime\prime$ uniformly at random from $P_t$ . (ii) Select $x$ as the better of these two, that is, the one with smaller rank in $P_t$ or, in case of equality, the one with larger crowding distance in $P_t$ (breaking ties randomly). (iii) Generate an offspring by mutating $x$ . We note that in some definitions of tournament selection the better individual in a tournament is chosen as winner only with some probability $p>0.5$ , but we do not regard this case any further. We note though that all our mathematical results would remain true in this setting. We also note that sometimes the participants of a tournament are selected “with replacement”. Again, this would not change our results, but we do not discuss this case any further.
A closer look at Deb’s implementation of the NSGA-II (see Revision 1.1.6 available at [Deb]) shows that a different way of selecting the parents is used there; we thank Maxim Buzdalov (Aberystwyth University) for pointing this out to us. In this two-permutation tournament selection scheme, two random permutations $π_1$ and $π_2$ of $P_t$ are generated and then a binary tournament is conducted between $π_j(2i-1)$ and $π_j(2i)$ for all $i∈[1..\frac{N}{2}]$ and $j∈\{1,2\}$ (we assume here that $N$ is even). Of course, this is nothing else than saying that twice a random matching on $P_t$ is generated and the end vertices of each matching edge conduct a tournament. Different from independent tournaments, this selection operator cannot be implemented in parallel. On the positive side, it ensures that each individual takes part in exactly two tournaments, so it treats the individuals in a fairer manner. Also, if there is a unique best individual, then it will surely be selected. As above, in our setting where we do not use crossover, each tournament winner is mutated and these $N$ individuals form the offspring population $Q_t$.
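The two-permutation scheme can be sketched in a few lines of Python. This is an illustrative reimplementation under our own naming, not Deb’s original code; the comparator `better` stands for the rank/crowding-distance comparison described above.

```python
import random

def two_permutation_tournament(population, better):
    """Select len(population) parents via the two-permutation scheme (N even).

    better(x, y) must return the tournament winner, i.e., the individual
    with smaller rank or, on equal rank, larger crowding distance.
    """
    n = len(population)
    assert n % 2 == 0
    winners = []
    for _ in range(2):                    # two random matchings of P_t
        perm = random.sample(population, n)
        for i in range(0, n, 2):          # tournament between perm[i], perm[i+1]
            winners.append(better(perm[i], perm[i + 1]))
    return winners
```

Each individual appears in exactly one tournament per permutation, so a unique best individual wins twice and is surely selected, as noted above.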
In the case of binary tournament selection, the analysis is slightly more involved since we need to argue that a desired parent is chosen for mutation with constant probability in one iteration. This is easy to see for a parent at the boundary of the front as its crowding distance is infinite, but less obvious for parents not at the boundary. We note that we need to be able to select such parents since we cannot ensure that the population intersects the Pareto front in a contiguous interval (as can be seen, e.g., from the random initial population). We solve this difficulty in the following three lemmas.
We use the following notation. Consider some iteration $t$ . For $i=1,2$ , let
$$
v_i^{\min}=\min\{f_i(x)\mid x∈ R_t\},\qquad v_i^{\max}=\max\{f_i(x)\mid x∈ R_t\}
$$
denote the extremal objective values in the combined parent and offspring population. Let $V=f(R_t)=\{(f_1(x),f_2(x))\mid x∈ R_t\}$ denote the set of objective values of the solutions in the combined parent and offspring population $R_t$. We define the sets of values such that also the right (resp. left) neighbor on the Pareto front is covered by
$$
V^+=\{(v_1,v_2)∈ V\mid∃ y∈ R_t:(f_1(y),f_2(y))=(v_1+1,v_2-1)\},
$$
$$
V^-=\{(v_1,v_2)∈ V\mid∃ y∈ R_t:(f_1(y),f_2(y))=(v_1-1,v_2+1)\}.
$$
**Lemma 3**
*For any $(v_1,v_2)∈ V∖(V^+∩ V^-)$, there is at least one individual $x∈ R_t$ with $f(x)=(v_1,v_2)$ and crowding distance at least $\frac{2}{v_1^{\max}-v_1^{\min}}$.*
* Proof*
Let $(v_1,v_2)∈ V∖(V^+∩ V^-)$, let $S_{1.1},\dots,S_{1.2N}$ be the sorting of $R_t$ according to $f_1$, and let $[a..b]=\{i∈[1..2N]\mid f_1(S_{1.i})=v_1\}$. If $v_1∈\{v_1^{\max},v_1^{\min}\}$, then by definition of the crowding distance, one individual in $f^{-1}((v_1,v_2))$ has an infinite crowding distance. Otherwise, if $(v_1,v_2)∈ V∖ V^-$, then we have $f_1(S_{1.a+1})-f_1(S_{1.a-1})≥ f_1(S_{1.a})-f_1(S_{1.a-1})≥ 2$, and thus the crowding distance of $S_{1.a}$ is at least $\frac{f_1(S_{1.a+1})-f_1(S_{1.a-1})}{v_1^{\max}-v_1^{\min}}≥\frac{2}{v_1^{\max}-v_1^{\min}}$. Similarly, if $(v_1,v_2)∈ V∖ V^+$, then $f_1(S_{1.b+1})-f_1(S_{1.b-1})≥ f_1(S_{1.b+1})-f_1(S_{1.b})≥ 2$, and the crowding distance of $S_{1.b}$ is at least $\frac{f_1(S_{1.b+1})-f_1(S_{1.b-1})}{v_1^{\max}-v_1^{\min}}≥\frac{2}{v_1^{\max}-v_1^{\min}}$. ∎
**Lemma 4**
*For any $(v_1,v_2)∈ V^+∩ V^-$, there are at most two individuals in $R_t$ with objective value $(v_1,v_2)$ and crowding distance at least $\frac{2}{v_1^{\max}-v_1^{\min}}$.*
* Proof*
Let $(v_1,v_2)∈ V^+∩ V^-$, $[a..b]=\{i∈[1..2N]\mid f_1(S_{1.i})=v_1\}$, and $[a'..b']=\{j∈[1..2N]\mid f_2(S_{2.j})=v_2\}$. Let $C=\{S_{1.a},S_{1.b}\}∪\{S_{2.a'},S_{2.b'}\}$. If $(R_t∩ f^{-1}((v_1,v_2)))∖ C$ is not empty, then for any $x∈(R_t∩ f^{-1}((v_1,v_2)))∖ C$, there exist $i∈[a+1..b-1]$ and $j∈[a'+1..b'-1]$ such that $x=S_{1.i}=S_{2.j}$. Hence the crowding distance of $x$ is $\frac{f_1(S_{1.i+1})-f_1(S_{1.i-1})}{v_1^{\max}-v_1^{\min}}+\frac{f_2(S_{2.j+1})-f_2(S_{2.j-1})}{v_2^{\max}-v_2^{\min}}=0$. We thus know that any individual with crowding distance at least $\frac{2}{v_1^{\max}-v_1^{\min}}$ lies in $C$. For any $x∈ C∖(\{S_{1.a},S_{1.b}\}∩\{S_{2.a'},S_{2.b'}\})$, the crowding distance of $x$ is $\frac{1}{v_1^{\max}-v_1^{\min}}$ or $\frac{1}{v_2^{\max}-v_2^{\min}}$. We note that $v_1^{\max}-v_1^{\min}=v_2^{\max}-v_2^{\min}$ since $v_1^{\max}=n-v_2^{\min}$ and $v_1^{\min}=n-v_2^{\max}$. Hence this crowding distance is less than $\frac{2}{v_1^{\max}-v_1^{\min}}$. Let now $x∈\{S_{1.a},S_{1.b}\}∩\{S_{2.a'},S_{2.b'}\}$. If $|C|=1$, then the crowding distance of $x$ is $\frac{2}{v_1^{\max}-v_1^{\min}}+\frac{2}{v_2^{\max}-v_2^{\min}}=\frac{4}{v_1^{\max}-v_1^{\min}}$. Otherwise, it is $\frac{1}{v_1^{\max}-v_1^{\min}}+\frac{1}{v_2^{\max}-v_2^{\min}}=\frac{2}{v_1^{\max}-v_1^{\min}}$. Therefore, the number of individuals in $R_t∩ f^{-1}((v_1,v_2))$ with crowding distance at least $\frac{2}{v_1^{\max}-v_1^{\min}}$ is $|\{S_{1.a},S_{1.b}\}∩\{S_{2.a'},S_{2.b'}\}|$, which is at most $2$. ∎
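For concreteness, here is a minimal Python sketch of the normalized crowding distance used in the two lemmas above (the function name is ours): boundary points of each objective receive an infinite distance, and interior points accumulate the gap between their sorted neighbors divided by the objective’s range.

```python
def crowding_distances(points):
    """Crowding distance of each objective vector in points (list of tuples)."""
    m = len(points[0])                    # number of objectives
    dist = [0.0] * len(points)
    for obj in range(m):
        order = sorted(range(len(points)), key=lambda i: points[i][obj])
        lo = points[order[0]][obj]
        hi = points[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for pos in range(1, len(order) - 1):
            gap = points[order[pos + 1]][obj] - points[order[pos - 1]][obj]
            dist[order[pos]] += gap / (hi - lo)
    return dist

# On a fully covered OneMinMax front for n = 4, every interior value has both
# neighbors covered and gets distance 2 * (2/4) = 1, above the threshold 2/4.
print(crowding_distances([(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]))
```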
**Lemma 5**
*Let $N≥ 4(n+1)$. Let $P$ be a parent population in a run of the NSGA-II using independent or two-permutation binary tournament selection optimizing OneMinMax. Let $v=(v_1,n-v_1)∉ f(P)$ be a point on the Pareto front that is not covered by $P$, but such that a neighbor of $v$ on the front is covered by $P$, that is, there is a $y∈ P$ with $\|f(y)-v\|_∞=1$. In the case of independent tournaments, each of the $N$ tournaments selects an individual $x$ with $f(x)=f(y)$ with probability at least $\frac{1}{N}(\frac{1}{6}-\frac{2.5}{N-1})$. In the case of two-permutation selection, there are two stochastically independent tournaments, each of which selects an individual $x$ with $f(x)=f(y)$ with probability at least $\frac{1}{6}-\frac{2.5}{N-1}$.*
* Proof*
By Lemma 3, there is an individual $x'∈ P$ with $f(x')=f(y)$ and crowding distance at least $\frac{2}{v_1^{\max}-v_1^{\min}}$. We estimate the probability that $x'$ is the winner of a tournament. We start with the case of independent tournaments and regard a particular one of these. With probability $\frac{1}{N}$, the individual $x'$ is chosen as the first participant of the tournament. We condition on this and regard the second individual $x''$ of the tournament, which is chosen uniformly at random from the remaining $N-1$ individuals. We shall argue that with good probability, it has a smaller crowding distance, and thus loses the tournament. To this aim, we estimate the number of elements $z∈ P$ that have a crowding distance of $\frac{2}{v_1^{\max}-v_1^{\min}}$ or more (“large crowding distance”). We treat the individuals differently according to their objective value $w=f(z)$. If $w∈ V^+∩ V^-$, then by Lemma 4 at most two individuals with this objective value have a large crowding distance. For other objective values $w$, we use the general estimate from the proof of Lemma 1 that at most four individuals have this objective value and positive crowding distance. This gives an upper bound of $2|V^+∩ V^-|+4(|f(P)|-|V^+∩ V^-|)=2|f(P)|+2|f(P)∖(V^+∩ V^-)|$ individuals with large crowding distance. We note that out of any three consecutive elements $(w_1,n-w_1),(w_1+1,n-w_1-1),(w_1+2,n-w_1-2)$ of the Pareto front, at most two can be in $f(P)∖(V^+∩ V^-)$: if all three were in $f(P)$, then the middle one would necessarily be in $V^+∩ V^-$. Consequently, $|f(P)∖(V^+∩ V^-)|≤ 2\lceil\frac{n+1}{3}\rceil$.
With this estimate, our upper bound on the number of individuals with large crowding distance becomes $2(n+1)+4\lceil\frac{n+1}{3}\rceil≤\frac{10}{3}(n+1)+\frac{8}{3}$. Excluding the first-chosen individual $x'$, the probability that $x''$ has a large crowding distance is thus at most $\frac{1}{N-1}(\frac{10}{3}(n+1)+\frac{8}{3}-1)=\frac{1}{N-1}(\frac{10}{3}(n+1)+\frac{5}{3})≤\frac{1}{N-1}(\frac{10}{3}\frac{N}{4}+\frac{5}{3})≤\frac{5}{6}+\frac{1}{N-1}(\frac{10}{3}\frac{1}{4}+\frac{5}{3})=\frac{5}{6}+\frac{2.5}{N-1}$, where we used $n+1≤\frac{N}{4}$. Consequently, the probability that $x'$ is selected as the first participant of the tournament and wins the tournament is at least
$$
\frac{1}{N}\left(\frac{1}{6}-\frac{2.5}{N-1}\right).
$$
For the case of two-permutation tournament selection, we note that there are two independent tournaments (stemming from the two different permutations) in which $x'$ participates. In both, the partner $x''$ of $x'$ is distributed uniformly in $P_t∖\{x'\}$. Hence the above arguments apply and we see that with probability at least $\frac{1}{6}-\frac{2.5}{N-1}$, the second participant loses against $x'$. ∎
With Lemma 5, we can now easily argue that in a given iteration $t$, we have a constant probability of choosing at least once a parent that is a neighbor of an empty spot on the Pareto front. This allows us to reuse the main arguments of the simpler analyses for the cases that the parents were chosen randomly or that each parent creates one offspring. We note that in the following result, as in all other results in this work, we did not try to optimize the leading constant.
**Theorem 6**
*Let $n≥ 4$. Consider optimizing the OneMinMax function via the NSGA-II which creates the offspring population by selecting parents via independent binary tournaments or via the two-permutation approach and applying one-bit or standard bit-wise mutation to these. If the population size $N$ is at least $4(n+1)$, then the expected runtime is at most $\tfrac{200e}{3}n(\ln n+1)$ iterations and at most $\tfrac{200e}{3}Nn(\ln n+1)$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front. Then we further have that $\Pr[T≥\tfrac{200e}{3}(1+δ)n\ln n]≤ 2n^{-δ}$ holds for any $δ≥ 0$.*
* Proof*
Thanks to Lemma 5, we can essentially follow the arguments of the proof of Theorem 2. Let $y∈ P_t$ be such that $f(y)$ is a neighbor of a point on the Pareto front that is not in $f(P_t)$. For independent tournaments, by Lemma 5 a single tournament selects a parent $x$ with $f(x)=f(y)$, that is, also a neighbor of this uncovered point, with probability at least $\frac{1}{N}(\frac{1}{6}-\frac{2.5}{N-1})$. Hence the probability that at least one such parent is selected in this iteration is
$$
p=1-\left(1-\tfrac{1}{N}\left(\tfrac{1}{6}-\tfrac{2.5}{N-1}\right)\right)^N≥ 1-\exp\left(-\tfrac{1}{6}+\tfrac{2.5}{N-1}\right)>0.03,
$$
where the last inequality uses $N≥ 20$, which follows from $n≥ 4$ and $N≥ 4(n+1)$. For two-permutation tournament selection, again by Lemma 5, a parent $x$ with $f(x)=f(y)$ is selected with probability at least $p=1-(1-(\frac{1}{6}-\frac{2.5}{N-1}))^2>0.03$ (since $N≥ 20$). With these values of $p$, the proof of Theorem 2 extends to the two cases of tournament selection, and the expected number of iterations to cover the full Pareto front is at most
$$
∑_{i=0}^{n-1}\frac{1}{pp_i^+}+∑_{i=1}^{n}\frac{1}{pp_i^-}≤∑_{i=0}^{n-1}\frac{1}{0.03\frac{n-i}{en}}+∑_{i=1}^{n}\frac{1}{0.03\frac{i}{en}}=\frac{2en}{0.03}∑_{i=1}^{n}\frac{1}{i}≤\frac{200e}{3}n(\ln n+1).
$$
We now discuss the concentration result. With the same arguments as in the proof of Theorem 2, but using the success probabilities $0.03\frac{n-k}{en}$ and $0.03\frac{k}{en}$ for $X^+_k$ and $X^-_k$, respectively, and estimating $q_i≥\frac{0.03}{e}\frac{i}{n}$, we obtain that $\Pr[T≥\tfrac{200e}{3}(1+δ)n\ln n]≤ 2n^{-δ}$ holds for any $δ≥ 0$. ∎
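The numeric constant $0.03$ in this proof can be verified directly. The following sketch (function names are ours) evaluates both selection probabilities at the minimal admissible population sizes $N=4(n+1)$ for a range of $n≥ 4$; the worst case in the proof is $N=20$.

```python
def p_independent(N):
    # at least one of the N independent tournaments picks the desired parent
    q = (1 / N) * (1 / 6 - 2.5 / (N - 1))
    return 1 - (1 - q) ** N

def p_two_permutation(N):
    # at least one of the two tournaments containing the parent selects it
    q = 1 / 6 - 2.5 / (N - 1)
    return 1 - (1 - q) ** 2

for n in range(4, 200):
    N = 4 * (n + 1)
    assert p_independent(N) > 0.03
    assert p_two_permutation(N) > 0.03
```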
## 4 Runtime of the NSGA-II on LeadingOnesTrailingZeroes
We proceed with analyzing the runtime of the NSGA-II on the benchmark LeadingOnesTrailingZeroes proposed by Laumanns, Thiele, and Zitzler [LTZ04]. This is the function $f:\{0,1\}^n→ℕ×ℕ$ defined by
$$
f(x)=\big(f_1(x),f_2(x)\big)=\Big(∑_{i=1}^{n}∏_{j=1}^{i}x_j,\ ∑_{i=1}^{n}∏_{j=i}^{n}(1-x_j)\Big)
$$
for all $x∈\{0,1\}^n$ . Here the first objective is the so-called LeadingOnes function, counting the number of (contiguous) leading ones of the bit string, and the second objective counts in an analogous fashion the number of trailing zeros. Again, the aim is to maximize both objectives. Different from OneMinMax, here many solutions exist that are not Pareto optimal. The known runtimes for this benchmark are $Θ(n^3)$ for the SEMO [LTZ04], $O(n^3)$ for the GSEMO [Gie03], and $O(μ n^2)$ for the $(μ+1)$ SIBEA with population size $μ≥ n+1$ [BFN08].
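A direct Python transcription of this definition (the function name is ours):

```python
def lotz(x):
    """LeadingOnesTrailingZeroes value of a bit string x (list of 0/1 values)."""
    n = len(x)
    # number of contiguous leading ones / trailing zeros
    leading_ones = next((i for i, b in enumerate(x) if b == 0), n)
    trailing_zeros = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return (leading_ones, trailing_zeros)

assert lotz([1, 1, 0, 1, 0, 0]) == (2, 2)   # not Pareto optimal: 2 + 2 < 6
assert lotz([1, 1, 1, 0, 0, 0]) == (3, 3)   # Pareto optimal: 3 + 3 = 6
assert lotz([1, 1, 1, 1, 1, 1]) == (6, 0)   # extremal point (1,...,1)
```

The Pareto front consists exactly of the $n+1$ strings $1^k0^{n-k}$, $k∈[0..n]$, with objective values $(k,n-k)$.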
Similar to OneMinMax, we can show that when the population size is large enough, an objective value on the Pareto front stays in the population from the point on when it is discovered.
**Lemma 7**
*Consider one iteration of the NSGA-II with population size $N≥ 4(n+1)$ optimizing the LeadingOnesTrailingZeroes function. Assume that in some iteration $t$ the combined parent and offspring population $R_t=P_t∪ Q_t$ contains a solution $x$ with rank one. Then also the next parent population $P_{t+1}$ contains an individual $y$ with $f(y)=f(x)$. In particular, once the parent population contains an individual with objective value $(k,n-k)$, it will do so in all future generations.*
* Proof*
Let $F_1$ be the set of solutions of rank one, that is, the set of solutions in $R_t$ that are not dominated by any other individual in $R_t$. By definition of dominance, for each $v_1∈\{f_1(x)\mid x∈ F_1\}$, there exists a unique $v_2$ such that $(v_1,v_2)∈\{f(x)\mid x∈ F_1\}$. Therefore, $|f(F_1)|$ is at most $n+1$. We now reuse the argument from the proof of Lemma 1 for OneMinMax that for each objective value, there are at most $4$ individuals with this objective value and positive crowding distance. Thus the number of individuals in $F_1$ with positive crowding distance is at most $4(n+1)≤ N$. Since the NSGA-II keeps the $N$ individuals with smallest rank, preferring larger crowding distance in case of a tie, the individuals with rank one and positive crowding distance will all be kept. This shows the first claim. For the second claim, let $x∈ P_t$ with $f(x)=(k,n-k)$ for some $k$. Since $x$ lies on the Pareto front of LeadingOnesTrailingZeroes, the rank of $x$ in $R_t$ is necessarily one. Hence by the first claim, a $y$ with $f(y)=f(x)$ is contained in $P_{t+1}$. A simple induction extends this finding to all future generations. ∎
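The rank-one front $F_1$ used in this proof is just the set of solutions not strictly dominated within $R_t$. A minimal sketch (names ours), here applied directly to objective vectors to be maximized:

```python
def rank_one(population, f):
    """Return the individuals of rank one, i.e., those whose objective
    vector is not strictly dominated by that of any other individual."""
    def dominates(u, v):
        # u weakly better in every objective and strictly better in one
        return all(a >= b for a, b in zip(u, v)) and u != v
    values = [f(x) for x in population]
    return [x for x, fx in zip(population, values)
            if not any(dominates(fy, fx) for fy in values)]

# (2,2) is dominated by (3,3); the other three values are incomparable.
front = rank_one([(2, 2), (3, 3), (4, 1), (1, 4)], lambda v: v)
assert set(front) == {(3, 3), (4, 1), (1, 4)}
```

As used in the proof, among rank-one solutions each $f_1$-value occurs with a unique $f_2$-value, so $|f(F_1)|≤ n+1$.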
Since not all individuals lie on the Pareto front, the runtime analysis for the LeadingOnesTrailingZeroes function is slightly more complex than for OneMinMax. We analyze the process in two stages: the first stage lasts until we have found both extremal solutions of the Pareto front. In this phase, we argue that the maximum first (resp. second) objective value increases by one in an expected number of $O(n)$ iterations. Consequently, after an expected number of $O(n^2)$ iterations, the population contains an individual $x$ with $f_1(x)=n$ (resp. $f_2(x)=n$), which are the desired extremal individuals. The second stage, where we complete the Pareto front from the existing Pareto-optimal solutions, can be analyzed in a similar manner as for OneMinMax in Theorem 2, noting of course the different probabilities to generate a new solution on the Pareto front. We start with the two easier parent selections and discuss tournament selection separately in Theorem 9.
**Theorem 8**
*Consider optimizing the LeadingOnesTrailingZeroes function via the NSGA-II with one of the following four ways to generate the offspring population in Step 3 in Algorithm 3, namely applying one-bit mutation or standard bit-wise mutation once to each parent, or $N$ times choosing a parent uniformly at random and applying one-bit mutation or standard bit-wise mutation to it. If the population size $N$ is at least $4(n+1)$, then the expected runtime is at most $\frac{2e^2}{e-1}n^2$ iterations and at most $\frac{2e^2}{e-1}Nn^2$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front. Then
$$
\Pr\left[T≥\frac{2e^2(1+δ)}{e-1}n^2\right]≤\exp\left(-\frac{δ^2}{2(1+δ)}(2n-1)\right)
$$
holds for any $δ≥ 0$ .*
* Proof*
Consider one iteration $t$ of the first stage, that is, assume $P_t^p:=\{x∈ P_t\mid f_1(x)+f_2(x)=n\}=∅$. Let $v_t=\max\{f_1(x)\mid x∈ P_t\}$ and let $P_t^*=\{x∈ P_t\mid f_1(x)=v_t\}$. Note that by Lemma 7, $v_t$ is non-decreasing over time. Let $x∈ P_t^*$. Let $p$ denote the probability that $x$ is chosen as a parent to be mutated (note that this probability is independent of $x$ for the two selection schemes regarded here). Conditional on this, let $p^*$ be a lower bound (independent of $x$) on the probability that $x$ generates a solution with a larger $f_1$-value. Then the expected number of iterations to obtain a solution with a better $f_1$-value is at most $\frac{1}{pp^*}$. Consequently, the expected number of iterations to obtain an $f_1$-value of $n$, and thus a solution on the Pareto front, is at most $(n-k_0)\frac{1}{pp^*}≤ n\frac{1}{pp^*}$, where $k_0$ is the maximum LeadingOnes value in the initial population. For the second stage, let $x∈ P_t^p$ be such that a neighbor of $f(x)$ on the front is not yet covered by $P_t$. Let $p'$ denote the probability that $x$ is chosen as a parent to be mutated. Conditional on this, let $p^{**}$ denote a lower bound (independent of $x$) on the probability to generate a particular neighbor of $x$ on the front. Consequently, the probability that $R_t$ covers an extra element of the Pareto front is at least $p'p^{**}$. Since Lemma 7 implies that any Pareto-front objective value present in the population is kept in all following iterations, the expected number of iterations for this progress is at most $\frac{1}{p'p^{**}}$. Since $n$ such progresses suffice to cover the full Pareto front, the expected number of iterations to cover the whole Pareto front is at most $n\frac{1}{p'p^{**}}$. Therefore, the expected total runtime is at most $n\frac{1}{pp^*}+n\frac{1}{p'p^{**}}$ iterations.
We recall from Theorem 2 that we have $p=p'=1$ when selecting each parent once, and $p=p'=1-(1-\frac{1}{N})^N≥ 1-\frac{1}{e}$ when choosing parents randomly. To estimate $p^*$ and $p^{**}$, we note that the desired progress can always be obtained by flipping one particular bit. Hence for one-bit mutation, we have $p^*=p^{**}=\frac{1}{n}$. For standard bit-wise mutation, $\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ is a valid lower bound for $p^*$ and $p^{**}$. With these estimates, we obtain in all cases an expected runtime of at most
$$
n\frac{1}{pp^*}+n\frac{1}{p'p^{**}}≤ n\frac{1}{(1-\frac{1}{e})\frac{1}{en}}+n\frac{1}{(1-\frac{1}{e})\frac{1}{en}}=\frac{2e^2n^2}{e-1}
$$
iterations, hence at most $\frac{2e^2}{e-1}Nn^2$ fitness evaluations. Now we prove the concentration result. The time to cover the full Pareto front is divided into two stages as discussed before. The first stage ends when a Pareto optimum is reached for the first time, and its duration is stochastically dominated by a sum of $n$ independent geometric random variables with success probability $(1-\frac{1}{e})\frac{1}{en}$. The second stage covers the full Pareto front, and its duration is dominated by a sum of another $n$ such independent geometric random variables. Formally, let $X_1,\dots,X_{2n}$ be independent geometric random variables with success probability $(1-\frac{1}{e})\frac{1}{en}$, and let $T$ be the number of iterations to cover the full Pareto front. Then $Z:=∑_{i=1}^{2n}X_i$ stochastically dominates $T$, and $E[Z]=\frac{2e^2n^2}{e-1}$. From a Chernoff bound for sums of independent identically distributed geometric random variables [Doe20, (1.10.46) in Theorem 1.10.32], we have that for any $δ≥ 0$,
$$
\Pr\left[Z≥(1+δ)\frac{2e^2n^2}{e-1}\right]≤\exp\left(-\frac{δ^2}{2}\cdot\frac{2n-1}{1+δ}\right).
$$
Since $Z$ dominates $T$ , we have proven this theorem. ∎
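The estimate $p^*≥\frac{1}{en}$ for standard bit-wise mutation used above rests on the elementary inequality $(1-\frac{1}{n})^{n-1}≥\frac{1}{e}$. A quick numeric sketch (the function name is ours):

```python
import math

def prob_flip_exactly_one_bit(n):
    # probability that standard bit-wise mutation (rate 1/n) flips one
    # designated bit and leaves the other n-1 bits unchanged
    return (1 / n) * (1 - 1 / n) ** (n - 1)

for n in range(2, 1000):
    assert prob_flip_exactly_one_bit(n) >= 1 / (math.e * n)
```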
We now study the runtime of the NSGA-II using binary tournament selection. Compared to OneMinMax, we face the additional difficulty that now rank-one solutions can exist that are not on the Pareto front. Due to their low rank, they may perform well in the selection, but being possibly far from the front, they are not interesting as parents. We need a slightly different general proof outline to nevertheless argue that sufficiently often a parent on the Pareto front generates a new neighbor on the front. Also, since not all individuals are on the Pareto front, we no longer have the property that the difference between the maximum and minimum value is the same for both objectives. We overcome this by first showing that the NSGA-II finds the two extremal points of the Pareto front in reasonable time (from then on, the maximum values are both $n$ and the minimum values are both $0$).
**Theorem 9**
*Consider optimizing the LeadingOnesTrailingZeroes function via the NSGA-II. Assume that the parents for variation are chosen either via $N$ independent random tournaments between different individuals or via the two-permutation implementation of binary tournaments. Assume that these parents are mutated via one-bit or standard bit-wise mutation. If the population size $N$ is at least $4(n+1)$, then the expected runtime is at most $15en^2$ iterations and at most $15eNn^2$ fitness evaluations. Besides, let $T$ be the number of iterations to reach the full Pareto front. Then
$$
\Pr\left[T≥\frac{(1+δ)100e}{3}n^2\right]≤\exp\left(-\frac{δ^2}{2(1+δ)}(3n-1)\right)
$$
holds for any $δ≥ 0$ .*
* Proof*
We first argue that, regardless of the initial state of the NSGA-II, it takes $O(n^2)$ iterations until the extremal point $(1,\dots,1)$, which is the unique maximum of $f_1$, is in $P_t$. To this aim, let $X_t:=\max\{f_1(x)\mid x∈ P_t\}$ denote the maximum $f_1$ value in the parent population. We note that any $x∈ P_t$ with $f_1(x)=X_t$ lies on the first front $F_1$ and that there is a $y∈ P_t$ with infinite crowding distance and $f(y)=f(x)$, in particular, $f_1(y)=X_t$. If parents are chosen via independent tournaments, such a $y$ has a $\frac{2}{N}$ chance of being one of the two individuals of a fixed tournament. It then wins the tournament with probability at least $\frac{1}{2}$ (the factor $\frac{1}{2}$ is needed only in the rare case that the other individual also lies on the first front and has an infinite crowding distance). Hence the probability that this $y$ is chosen at least once as a parent to be mutated is at least $p=1-(1-\frac{1}{2}\frac{2}{N})^N≥ 1-\frac{1}{e}$. When the two-permutation implementation of tournament selection is used, then $y$ appears in both permutations and has a random partner in each. Again, this partner wins the tournament with probability at most $\frac{1}{2}$. Hence the probability that $y$ is selected as a parent at least once is at least $p=1-(\frac{1}{2})^2=\frac{3}{4}$. Conditional on $y$ being chosen at least once, let us regard a fixed mutation step in which $y$ was selected as a parent. To mutate $y$ into an individual with higher $f_1$ value, it suffices to flip a particular single bit (namely the first zero after the initial contiguous segment of ones). The probability for this is $p^*=\frac{1}{n}$ for one-bit mutation and $p^*=\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ for standard bit-wise mutation. Denoting by $Y_t:=\max\{f_1(x)\mid x∈ R_t\}$ the maximum $f_1$ value in the combined parent and offspring population, we have just shown that $\Pr[Y_t≥ X_t+1]≥ pp^*=Ω(1/n)$ whenever $X_t<n$.
We note that any $x∈ R_t$ with $f_1(x)=Y_t$ lies on the first front $F_1$ of $R_t$ and that there is a $y∈ R_t$ with infinite crowding distance and $f(y)=f(x)$, in particular, $f_1(y)=Y_t$. Consequently, such a $y$ will be kept in the next parent population $P_{t+1}$ (note that there are at most $4$ individuals in $F_1$ with infinite crowding distance; since $N≥ 4$, they will all be included in $P_{t+1}$). This shows that we also have $\Pr[X_{t+1}≥ X_t+1]≥ pp^*$ whenever $X_t<n$. By adding the expected waiting times for an increase of the $X_t$ value, we see that the expected time to have $X_t=n$, that is, to have $(1,\dots,1)∈ P_t$, is at most
$$
\frac{n}{pp^*}≤\frac{n}{(1-\frac{1}{e})\frac{1}{en}}=\frac{e^2n^2}{e-1}
$$
iterations. By a symmetric argument, we see that after another at most $\frac{e^2}{e-1}n^2$ iterations, also the other extremal point $(0,\dots,0)$ is in the population (and remains there forever by Lemma 7). With both extremal points of the Pareto front covered, we analyze the remaining time until the Pareto front is fully covered. We note that by Lemma 7, the number of covered Pareto front points cannot decrease. Hence it suffices to prove a lower bound for the probability that the coverage increases in one iteration, which is what we do now. Assume that the Pareto front is not yet fully covered. Since we have some Pareto optimal individuals, there also is a Pareto optimal individual $x∈ P_t$ such that $f(x)$ is a neighbor of a point $v$ on the Pareto front that is not covered. Since both extremal points of the Pareto front are in the population, the differences between the maximum and minimum value are the same for both objectives (namely $n$). Consequently, in the same way as in the proof of Lemma 3, we see that there is also a $y∈ P_t$ with $f(y)=f(x)$ and with crowding distance at least $\frac{2}{n}$. We estimate the number of individuals in $P_t∖\{y\}$ that could win a tournament against this $y$. Clearly, these can only be individuals in the first front $F_1$ of the non-dominated sorting. Assume first that $|f(F_1)|≤ 0.8(n+1)$. We note that, just by the definition of the crowding distance and in a similar fashion as in the proof of Lemma 1, for each $v∈ f(F_1)$ there are at most four individuals with $f$-value equal to $v$ and positive crowding distance. All other individuals in $F_1$ have a crowding distance of zero (and thus lose the tournament against $y$), as do all individuals not in $F_1$. Consequently, there are at least $N_0=N-4· 0.8(n+1)$ individuals other than $y$ that would lose a tournament against $y$. Assume now that $m:=|f(F_1)|>0.8(n+1)$.
Since $F_1$ consists of pairwise incomparable solutions (or solutions with identical objective value, which we may ignore for the following argument), we have $|f_1(F_1)|=|f_2(F_1)|=m$. For any $v=(v_1,v_2)$, let $v^+:=(v_1+1,v_2-1)$ and $v^-:=(v_1-1,v_2+1)$. Then we divide $f(F_1)$ into the two disjoint sets $U_1=\{v∈ f(F_1)∩[1..n-1]^2\mid v^+∉ f(F_1)\text{ or }v^-∉ f(F_1)\}$ and $U_2=f(F_1)∖ U_1$. Since both $f_1(F_1)$ and $f_2(F_1)$ are subsets of $[0..n]$, which has $n+1$ elements, we see that fewer than $0.2(n+1)$ of the values in $[0..n]$ are missing in $f_1(F_1)$, and analogously in $f_2(F_1)$. Since each value missing in $f_1(F_1)$ or $f_2(F_1)$ leads to at most two values in $U_1$, we have $|U_1|<4· 0.2(n+1)=0.8(n+1)$. For the values in $U_1$, we use the blunt estimate from above that at most $4$ individuals with this objective value and positive crowding distance exist. For the values $v∈ U_2$, we are in the same situation as in Lemma 4, and thus there are at most two individuals $x∈ F_1$ with $f(x)=v$ and crowding distance at least $\frac{2}{n}$ (this was not formally proven in Lemma 4 for the case that $v∈\{(0,n),(n,0)\}$ and the unique neighbor of $v$ is in $f(F_1)$, but it is easy to see that in this case only the at most two $x$ with $f(x)=v$ and infinite crowding distance can have a crowding distance of at least $\frac{2}{n}$). Consequently, there are more than
$$
N-4|U_1|-2|U_2|=N-4|U_1|-2(m-|U_1|)=N-2|U_1|-2m
$$
individuals in $P_t∖\{y\}$ that would lose against $y$; with $|U_1|<0.8(n+1)$ and $m≤ n+1$, this number is more than $N-3.6(n+1)$. Note that this bound is weaker than the bound $N-3.2(n+1)$ from the first case, so it is valid in both cases. From this, we now estimate the probability that $y$ is selected as a parent at least once. We first regard the case of independent tournaments. The probability that $y$ is the winner of a fixed tournament is at least the probability that it is chosen as the first contestant times the probability that one of the at least $N-3.6(n+1)$ sure losers is chosen as the second contestant. This probability is at least $\frac{1}{N}·\frac{N-3.6(n+1)}{N-1}≥\frac{1}{N}·\frac{0.4(n+1)}{4n+3}≥\frac{0.1}{N}$. Hence the probability $p$ that $y$ is chosen at least once as a parent for mutation is at least $p≥ 1-(1-\frac{0.1}{N})^N≥ 1-\exp(-0.1)≥ 0.09$. For the two-permutation implementation of tournament selection, $y$ appears in both permutations and has a random partner in each of them. Hence the probability that $y$ wins at least one of these two tournaments is at least $p≥ 1-(1-\frac{N-3.6(n+1)}{N-1})^2≥ 1-(1-0.1)^2=0.19$. Conditional on $y$ being selected at least once, we regard a mutation step in which $y$ is selected. The probability $p^*$ that the Pareto optimal $y$ is mutated into the unique Pareto optimal bit string $z$ with $f(z)=v$ is $p^*=\frac{1}{n}$ for one-bit mutation and $p^*=\frac{1}{n}(1-\frac{1}{n})^{n-1}≥\frac{1}{en}$ for standard bit-wise mutation. Consequently, the probability that one iteration generates the missing Pareto front value $v$ is at least $pp^*$, the expected waiting time for this is at most $\frac{1}{pp^*}$ iterations, and the expected time to create all missing Pareto front values is at most
$$
\frac{n}{pp^*}≤\frac{n}{0.09\frac{1}{en}}=\frac{100en^2}{9}
$$
iterations. Hence, the runtime for the full coverage of the Pareto front starting from the initial population is at most
$$
\tfrac{e^2}{e-1}n^2+\tfrac{e^2}{e-1}n^2+\tfrac{100e}{9}n^2<15en^2
$$
iterations, which is at most $15eNn^2$ fitness evaluations. Now we prove the concentration result. Note that in this proof, we consider three phases: the first phase to reach the extremal point $(1,\dots,1)$, the second phase to reach $(0,\dots,0)$, and the third phase to cover the full Pareto front. The runtime of each phase is stochastically dominated by a sum of $n$ independent geometric random variables with success probability $\frac{0.09}{en}$. Formally, let $X_1,\dots,X_{3n}$ be independent geometric random variables with success probability $\frac{0.09}{en}$, and let $T$ be the number of iterations to cover the full Pareto front. Then $Z:=∑_{i=1}^{3n}X_i$ stochastically dominates $T$, and $E[Z]=\frac{3en^2}{0.09}=\frac{100e}{3}n^2$. From the Chernoff bound [Doe20, (1.10.46) in Theorem 1.10.32], we have that for any $δ≥ 0$,
$$
\Pr\left[Z≥(1+δ)\frac{100e}{3}n^2\right]≤\exp\left(-\frac{δ^2}{2}\cdot\frac{3n-1}{1+δ}\right).
$$
Since $Z$ dominates $T$ , this shows the theorem. ∎
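The constants in this proof combine as claimed; the following sketch checks the final numeric step and the expectation of the dominating sum (variable names are ours):

```python
import math

e = math.e
extremal_phases = 2 * e**2 / (e - 1)   # reaching (1,...,1) and (0,...,0)
covering_phase = 100 * e / 9           # filling in the rest of the front
assert extremal_phases + covering_phase < 15 * e

# E[Z] for 3n geometric variables with success probability 0.09/(e n):
# 3n * (e n / 0.09) = (100 e / 3) n^2, matching the scale of the tail bound.
assert abs(3 / 0.09 - 100 / 3) < 1e-9
```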
## 5 An Exponential Lower Bound for Small Population Size
In this section, we prove a lower bound for a small population size. Since lower bound proofs can be quite complicated – recall for example that there are matching upper and lower bounds for the runtime of the SEMO (using one-bit mutation) on OneMinMax and LeadingOnesTrailingZeroes, but not for the GSEMO (using bit-wise mutation) – we restrict ourselves to the simplest variant, which uses each parent once to generate one offspring via one-bit mutation. From the proofs, though, we are optimistic that our results, with different implicit constants, can also be shown for all other variants of the NSGA-II regarded in this work. Our experiments support this belief, see Figure 3 in Section 6.
Our main result is that this NSGA-II takes an exponential time to find the whole Pareto front (of size $n+1$ ) of OneMinMax when the population size is $n+1$ . This is different from the SEMO and GSEMO algorithms (which have no fixed population size, but which never store a population larger than $n+1$ when optimizing OneMinMax) and the $(μ+1)$ SIBEA with population size $μ=n+1$ . Even stronger, we show that there is a constant $ε>0$ such that when the current population $P_t$ covers at least $|f(P_t)|≥(1-ε)(n+1)$ points on the Pareto front of OneMinMax, then with probability $1-\exp(-Θ(n))$ , the next population $P_{t+1}$ will cover at most $|f(P_{t+1})|≤(1-ε)(n+1)$ points on the front. Hence when a population covers a large fraction of the Pareto front, then with very high probability the next population will cover fewer points on the front. When the coverage is smaller, that is, $|f(P_t)|≤(1-ε)(n+1)$ , then with probability $1-\exp(-Θ(n))$ the combined parent and offspring population $R_t$ will miss a constant fraction of the Pareto front. From these two statements, it is easy to see that there is a constant $δ$ such that with probability $1-\exp(-Ω(n))$ , in none of the first $\exp(Ω(n))$ iterations does the combined parent and offspring population cover more than $(1-δ)(n+1)$ points of the Pareto front.
Since it is the technically easier one, we start with proving the latter statement that a constant fraction of the front not covered by $P_t$ implies also a constant fraction not covered by $R_t$ . Before stating the formal result and proof, let us explain the reason behind this result. With a constant fraction of the front not covered by $P_t$ , also a constant fraction that is $Ω(n)$ away from the boundary points $(0,n)$ and $(n,0)$ is not covered. These values have the property that from an individual corresponding to either of their neighboring positions, an individual with this objective value can only be generated with constant probability via one-bit mutation. Again a constant fraction of these values have only a constant number of individuals on neighboring positions. These values thus have a (small) constant probability of not being generated in this iteration. This shows that in expectation, we are still missing a constant fraction of the Pareto front in $R_t$ . Via the method of bounded differences (exploiting that each mutation operation can change the number of missing elements by at most one), we turn this expectation into a bound that holds with probability $1-\exp(-Ω(n))$ .
**Lemma 10**
*Let $ε∈(0,1)$ be a sufficiently small constant. Consider optimizing the OneMinMax benchmark via the NSGA-II applying one-bit mutation once to each parent individual. Let the population size be $N=n+1$ . Assume that $|f(P_t)|≤(1-ε)(n+1)$ . Then with probability at least $1-\exp(-Ω(n))$ , we have $|f(R_t)|≤(1-\frac{1}{10}ε(\tfrac{1}{5}ε-\tfrac{2}{n})^{5/ε})(n+1)$ .*
*Proof*
Let $F=\{(v,n-v)\mid v∈[0..n]\}$ be the Pareto front of OneMinMax. For a value $(v,n-v)∈ F$ , we say that $(v-1,n-v+1)$ and $(v+1,n-v-1)$ are neighbors of $(v,n-v)$ provided that they are in $[0..n]^2$ . We write $(a,b)∼(u,v)$ to denote that $(a,b)$ and $(u,v)$ are neighbors. Let $Δ=\lceil\frac{5}{ε}\rceil-1$ and let $F^\prime$ be the set of values in $F$ such that more than $Δ$ individuals in $P_t$ have a function value that is a neighbor of this value, that is,
$$
F^\prime=\left\{(v,n-v)∈ F \,\middle|\, |\{x∈ P_t \mid f(x)∼(v,n-v)\}|≥Δ+1\right\}.
$$
Then $|F^\prime|≤\frac{2}{Δ+1}(n+1)≤\frac{2}{5}ε(n+1)$ as otherwise the number of individuals in our population could be bounded from below by
$$
|F^\prime|·\tfrac{1}{2}(Δ+1)>\tfrac{2}{Δ+1}(n+1)·\tfrac{1}{2}(Δ+1)=n+1,
$$
which contradicts our assumption $N=n+1$ (note that the factor of $\tfrac{1}{2}$ accounts for the fact that we may count each individual twice). Let $M=F∖ f(P_t)$ be the set of Pareto front values not covered by the current population. By assumption, $|M|≥ε(n+1)$ . Let
$$
M_1=\left\{(v,n-v)∈ M \,\middle|\, v∈\left[\lfloor\tfrac{1}{5}ε(n+1)\rfloor\,..\,n-\lfloor\tfrac{1}{5}ε(n+1)\rfloor\right]\right\}∖ F^\prime.
$$
Then $|M_1|≥|M|-2\lfloor\tfrac{1}{5}ε(n+1)\rfloor-|F^\prime|≥ \tfrac{1}{5}ε(n+1)$ . We now argue that a constant fraction of the values in $M_1$ is not generated in the current generation. We note that via one-bit mutation, a given $(v,n-v)∈ F$ can only be generated from an individual $x$ with $f(x)∼(v,n-v)$ . Let $(v,n-v)∈ M_1$ . Since $v∈[\lfloor\tfrac{1}{5}ε(n+1)\rfloor..n-\lfloor\tfrac{1}{5} ε(n+1)\rfloor]$ , the probability that a given parent $x$ is mutated to some individual $y$ with $f(y)=(v,n-v)$ is at most
$$
\frac{n-\lfloor\tfrac{1}{5}ε(n+1)\rfloor+1}{n}≤ 1-\frac{1}{5}ε+\frac{2}{n}
$$
since there are at most $n-\lfloor\tfrac{1}{5}ε(n+1)\rfloor+1$ bit positions such that flipping them creates the desired value. Since $(v,n-v)∉ F^\prime$ , the probability that $Q_t$ (and thus $R_t$ ) contains no individual $y$ with $f(y)=(v,n-v)$ is at least
$$
\left(1-\left(1-\tfrac{1}{5}ε+\tfrac{2}{n}\right)\right)^{Δ}≥\left(\tfrac{1}{5}ε-\tfrac{2}{n}\right)^{5/ε}=:p.
$$
Let $X=|F∖ f(R_t)|$ denote the number of Pareto front values not covered by $R_t$ . We have $E[X]≥|M_1|p≥\frac{1}{5}ε p(n+1)$ . The random variable $X$ is functionally dependent on the $N=n+1$ random decisions of the $N$ mutation operations, which are stochastically independent. Changing the outcome of a single mutation operation changes $X$ by at most $1$ . Consequently, $X$ satisfies the assumptions of the method of bounded differences [McD89] (also to be found in [Doe20, Theorem 1.10.27]). Hence the classic additive Chernoff bound applies to $X$ as if it was a sum of $N$ independent random variables taking values in an interval of length $1$ . In particular, the probability that $X≤\frac{1}{10}ε p(n+1)≤\frac{1}{2}E[X]$ is at most $\exp(-Ω(n))$ . ∎
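A fact used repeatedly in this proof — that under one-bit mutation a parent with $v-1$ ones creates an offspring with $v$ ones with probability exactly $\frac{n-(v-1)}{n}$ — can be checked by a small simulation (an illustrative Python sketch of ours, not the implementation used in the experiments):

```python
import random

def one_bit_mutation(x, rng):
    """Flip exactly one uniformly chosen bit of the bit string x."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] ^= 1
    return y

rng = random.Random(0)
n, v = 20, 8
parent = [1] * (v - 1) + [0] * (n - v + 1)  # an individual with v-1 ones
trials = 200_000
# count offspring that land on v ones, i.e. on the Pareto front point (n-v, v)
hits = sum(sum(one_bit_mutation(parent, rng)) == v for _ in range(trials))
empirical = hits / trials
theoretical = (n - (v - 1)) / n  # one of the n-(v-1) zeros must be flipped
```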
We now turn to the other main argument, which is that when the current population covers the Pareto front to a large extent, then the selection procedure of the NSGA-II will remove individuals in such a way from $R_t$ that at least some constant fraction of the Pareto front is not covered by $P_{t+1}$ . The key arguments to show this claim are the following. When a large part of the front is covered by $P_t$ , then many points are only covered by a single individual (since the population size equals the size of the front). With some careful counting, we derive from this that close to two thirds of the positions on the front are covered exactly twice in the combined parent and offspring population $R_t$ and that the corresponding individuals have the same crowding distance. Since these are roughly $\frac{4}{3}(n+1)$ individuals appearing equally preferable in the selection, a random set of at least roughly $\frac{1}{3}(n+1)$ of them will be removed in the selection step. In expectation, this will remove both individuals from a constant fraction of the points on the Pareto front. Again, the method of bounded differences turns this expectation into a statement with probability $1-\exp(-Ω(n))$ .
**Lemma 11**
*Let $ε>0$ be a sufficiently small constant. Consider optimizing the OneMinMax benchmark via the NSGA-II applying one-bit mutation once to each individual. Let the population size be $N=n+1$ . Assume that the current population $P_t$ covers $|f(P_t)|≥(1-ε)(n+1)$ elements of the Pareto front. Then with probability at least $1-\exp(-Ω(n))$ , the next population $P_{t+1}$ covers less than $(1-0.01)(n+1)$ elements of the Pareto front.*
*Proof*
Let $U$ be the set of Pareto front values that have exactly one corresponding individual in $P_t$ , that is, for any $(v,n-v)∈ U$ , there exists only one $x∈ P_t$ with $f(x)=(v,n-v)$ . We first note that $|U|≥(1-2ε)(n+1)$ as otherwise there would be at least
$$
2\left(|f(P_t)|-|U|\right)+|U|=2|f(P_t)|-|U|>2(1-ε)(n+1)-(1-2ε)(n+1)=n+1
$$
individuals in $P_t$ , which contradicts our assumption $N=n+1$ . Let $U^\prime$ denote the set of values in $U$ which have all their neighbors also in $U$ . Since each value not in $U$ can prevent at most two values in $U$ from being in $U^\prime$ , we have
$$
\begin{split}|U^\prime|&≥|U|-2(n+1-|U|)=3|U|-2(n+1)\\
&≥ 3(1-2ε)(n+1)-2(n+1)=\left(1-6ε\right)(n+1).\end{split}
$$
We say that $(v,n-v)$ is double-covered by $R_t$ if there are exactly two individuals in $R_t$ with function value $(v,n-v)$ . Noting that via one-bit mutation a certain function value can only be generated from the individuals corresponding to the neighbors of this function value, we see that a given $(v,n-v)∈ U^\prime$ with $v∈[1..n-1]$ is double-covered by $R_t$ with probability exactly
$$
p_v=\frac{n-(v-1)}{n}+\frac{v+1}{n}-2·\frac{n-(v-1)}{n}·\frac{v+1}{n}=1-\frac{2v}{n}+2\,\frac{v^2-1}{n^2}.
$$
Thus the expected number of double-coverages in $U^\prime$ is at least
$$
\begin{split}∑_{\substack{v∈[1..n-1]:\\(v,n-v)∈ U^\prime}}p_v&=\left(∑_{v=1}^{n-1}p_v\right)-\left(∑_{\substack{v∈[1..n-1]:\\(v,n-v)∉ U^\prime}}p_v\right)\\
&≥\left(∑_{v=1}^{n-1}\left(1-\frac{2v}{n}+2\,\frac{v^2-1}{n^2}\right)\right)-\left(∑_{\substack{v∈[1..n-1]:\\(v,n-v)∉ U^\prime}}1\right)\\
&≥(n-1)-\frac{2}{n}·\frac{(n-1)n}{2}+\frac{2}{n^2}\left(\frac{(n-1)n(2(n-1)+1)}{6}-(n-1)\right)-6ε(n+1)\\
&=\frac{n-1}{n^2}·\frac{2n^2-n-6}{3}-6ε(n+1)=(\tfrac{2}{3}-6ε)(n+1)-O(1).\end{split}\tag{1}
$$
Denote by $U^{\prime\prime}$ the set of values in $U^\prime$ that are double-covered by $R_t$ and note that we have just shown $E[|U^{\prime\prime}|]≥(\frac{2}{3}-6ε)(n+1)-O(1)$ . The number $m:=|U^{\prime\prime}|$ of double-covered elements is functionally dependent on the random decisions taken (independently) in the $N$ mutation operations. Each mutation operation determines one offspring and thus can change the number of double-covered values by at most $2$ . Consequently, we can use the method of bounded differences [McD89] and obtain that $|U^{\prime\prime}|$ is at least $\tilde{m}:=\lceil(\tfrac{2}{3}-8ε)(n+1)\rceil$ with probability at least $1-\exp(-Ω(n))$ . We condition on this in the remainder. Our next argument is that these double-coverages correspond to approximately $\frac{4}{3}(n+1)$ individuals in $R_t$ that have the same crowding distance. Consequently, the selection procedure has to discard at least roughly $\frac{1}{3}(n+1)$ of them, randomly chosen, and this will lead to a decent number of values in $U^{\prime\prime}$ that are not covered anymore by $P_{t+1}$ . To make this precise, let $R^{\prime\prime}$ denote the set of individuals $x$ in $R_t$ such that $f(x)∈ U^{\prime\prime}$ . By construction, there are exactly two such individuals for each value in $U^{\prime\prime}$ , hence $|R^{\prime\prime}|=2m$ . Further, for each value in $U^{\prime\prime}$ , both neighboring values are also present in $f(R_t)$ . Consequently, each $x∈ R^{\prime\prime}$ has crowding distance (in $R_t$ ) exactly $d=\frac{1}{v_1^{\max}-v_1^{\min}}+\frac{1}{v_2^{\max}-v_2^{\min}}$ . We recall that the selection procedure (since all ranks are equal to one) first discards all individuals with crowding distance less than $d$ ; these are at most $|R_t|-|R^{\prime\prime}|≤ 2(n+1-\tilde{m})=(2-\frac{4}{3}+16ε)(n+1)+O(1)$ many, which is less than $N$ for $n$ large and $ε$ small enough. Then, randomly, the selection procedure discards a further number of individuals from all individuals with crowding distance exactly $d$ so that exactly $N$ individuals remain.
For $N$ individuals to remain, we need that at least $k:=|R^\prime\prime|-N$ individuals from $R^\prime\prime$ are discarded. To ease the calculation, we first reduce the problem to the case that $|R^\prime\prime|=2\tilde{m}$ . Indeed, let $U^\prime\prime\prime$ be any subset of $U^\prime\prime$ having cardinality exactly $\tilde{m}$ and let $R^\prime\prime\prime$ be the set of individuals $x∈ R_t$ with $f(x)∈ U^\prime\prime\prime$ . Then $R^\prime\prime\prime⊆ R^\prime\prime$ and $|R^\prime\prime\prime|=2\tilde{m}$ . With the same argument as in the previous paragraph we see that the selection procedure has to remove at least $\tilde{k}:=2\tilde{m}-N$ elements from $R^\prime\prime\prime$ . We thus analyze the number of elements of $U^\prime\prime\prime$ that become uncovered when we remove a random set of $\tilde{k}$ individuals from $R^\prime\prime\prime$ , knowing that this is a lower bound for the number of elements uncovered in $U^\prime\prime$ , both because the number of individuals removed from $R^\prime\prime\prime$ can be higher than $\tilde{k}$ and because the removal of elements in $R^\prime\prime∖ R^\prime\prime\prime$ can also lead to uncovered elements in $U^\prime\prime$ . We take a final pessimistic simplification, and this is that we select $\tilde{k}$ elements from $R^\prime\prime\prime$ with replacement and remove these individuals from $R^\prime\prime\prime$ . Clearly, this can only lower the number of removed elements, hence our estimate for the number of uncovered elements is also valid for the random experiment without replacement (where we choose exactly $\tilde{k}$ elements to be removed). For this random experiment the probability for uncovering a position in $U^\prime\prime\prime$ is at least
$$
1-2\left(1-\frac{1}{2\tilde{m}}\right)^{\tilde{k}}+\left(1-\frac{1}{2\tilde{m}}\right)^{2\tilde{k}}=:p.
$$
To estimate $p$ , we use that $\frac{\tilde{k}}{2\tilde{m}}=1-\frac{n+1}{2\tilde{m}}=1-\frac{3}{4}·\frac{1}{1-12ε}+o(1)$ and that $\tilde{m}=Θ(n)$ . Let $Y$ denote the number of elements of $U^{\prime\prime\prime}$ uncovered in our random experiment. We note that $1-2\exp(-1/4)+\exp(-1/2)≥ 0.04892$ . Hence when $n$ is large enough and $ε$ was chosen as a sufficiently small constant, then
$$
E[Y]≥ p\tilde{m}≥ 0.02(n+1).
$$
The random variable $Y$ is functionally dependent on the $\tilde{k}$ selected individuals, which are stochastically independent. Changing the outcome of a single selected individual changes $Y$ by at most $1$ . Consequently, $Y$ satisfies the assumptions of the method of bounded differences [McD89]. The classic additive Chernoff bound thus applies to $Y$ as if it were a sum of $\tilde{k}=Ω(n)$ independent random variables taking values in an interval of length $1$ . In particular, the probability that $Y≤ 0.01(n+1)≤\frac{1}{2}E[Y]$ is at most $\exp(-Ω(n))$ . ∎
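Two computations in this proof can be verified mechanically: the algebraic simplification of $p_v$ (checked here in exact rational arithmetic) and the numeric constant $1-2\exp(-1/4)+\exp(-1/2)$ . The following check is ours and purely illustrative:

```python
import math
from fractions import Fraction

# exact check of p_v = (n-(v-1))/n + (v+1)/n - 2 (n-(v-1))(v+1)/n^2
#                    = 1 - 2v/n + 2(v^2-1)/n^2   for all v in [1..n-1]
n = 50
identity_holds = all(
    Fraction(n - (v - 1), n) + Fraction(v + 1, n)
    - 2 * Fraction((n - (v - 1)) * (v + 1), n * n)
    == 1 - Fraction(2 * v, n) + 2 * Fraction(v * v - 1, n * n)
    for v in range(1, n)
)
# the constant used to bound E[Y] from below
c = 1 - 2 * math.exp(-1 / 4) + math.exp(-1 / 2)
```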
Combining Lemmas 10 and 11, we have the following exponential runtime result.
**Theorem 12**
*Consider optimizing OneMinMax via the NSGA-II applying one-bit mutation once to each individual. Let the population size be $N=n+1$ . There are a positive constant $γ$ and a time $T=\exp(Ω(n))$ such that with probability $1-\exp(-Ω(n))$ , in each of the first $T$ iterations at most a fraction of $1-γ$ of the Pareto front is covered by $P_t$ .*
*Proof*
Let $ε$ be a small constant rendering the claims of Lemmas 10 and 11 valid. Assume that $n$ is sufficiently large. Let $\tilde{ε}=(\frac{1}{10}ε)^{5/ε+1}$ . By a simple Chernoff bound, we note that a random initial individual $x$ satisfies $\frac{1}{4}n≤ f_1(x)≤\frac{3}{4}n$ with probability $1-\exp(-Ω(n))$ . Taking a union bound over the $n+1$ initial individuals, we see that the initial population $P_0$ with probability $1-\exp(-Ω(n))$ covers at most half of the Pareto front. Let $t$ be some iteration. If $|f(P_t)|≥(1-ε)(n+1)$ , then by Lemma 11 with probability $1-\exp(-Ω(n))$ the next population $P_{t+1}$ covers less than $(1-0.01)(n+1)$ values of the Pareto front. If $|f(P_t)|≤(1-ε)(n+1)$ , then by Lemma 10 with probability $1-\exp(-Ω(n))$ we have $n+1-|f(P_{t+1})|≥\frac{1}{10}ε(\frac{1}{5}ε-\tfrac{2}{n})^{5/ε}(n+1)≥\tilde{ε}(n+1)$ , where the last estimate holds when $n$ is sufficiently large. Consequently, for each generation $t$ , the probability that $P_t$ covers more than $(1-\min\{\tilde{ε},0.01\})(n+1)$ values of the Pareto front is only $\exp(-Ω(n))$ . In particular, a union bound shows that for $T=\exp(Θ(n))$ suitably chosen, with probability $1-\exp(-Ω(n))$ , in all of the first $T$ iterations the population covers at most $(1-\min\{\tilde{ε},0.01\})(n+1)$ values of the Pareto front. ∎
## 6 Experiments
To complement our asymptotic results with runtime data for concrete problem sizes, we conducted the following experiments.
### 6.1 Settings
We use, in principle, the version of the NSGA-II given by Deb (Revision 1.1.6), available at [Deb], except that, as in our theoretical analysis, we do not use crossover. We re-implemented the algorithm in Matlab (R2016b). When a sorting procedure is used, we use the one provided by Matlab (and not randomized Quicksort as in Deb’s implementation). The code is available at [Zhe].
Our theoretical analysis above covers four parent selection strategies and two mutation operators. In the interest of brevity, with the exception of the data presented in Figure 3 we concentrate in our experiments on one variant of the algorithm, namely we use two-permutation binary tournament selection (as proposed in [DPAM02]) and standard bit-wise mutation with mutation rate $\frac{1}{n}$ (which is the most common mutation operator in evolutionary computation). We use the following experimental settings.
- Problem size $n$ : $100,200,300,$ and $400$ for OneMinMax, and $30,60,90,$ and $120$ for LeadingOnesTrailingZeroes.
- Population size $N$ : Our theoretical analyses (Theorems 6 and 9) showed that the NSGA-II covers the full Pareto fronts of OneMinMax and LeadingOnesTrailingZeroes efficiently for population sizes of at least $N^*=4(n+1)$ . We use this value also in the experiments. We further use the value $N=2N^*$ , for which our theoretical results also apply, but with runtime guarantees twice as large as for $N^*$ (when making the implicit constants in the results visible). We also use the smaller population sizes $2(n+1)$ and $1.5(n+1)$ for OneMinMax and $2(n+1)$ for LeadingOnesTrailingZeroes. For these values, we have no proven result, but it is not uncommon that mathematical runtime analyses cannot cover all efficient parameter settings, and in fact, we shall observe a good performance in these experiments as well (we do not display results for $N=1.5(n+1)$ for LeadingOnesTrailingZeroes because there the algorithm indeed was not effective anymore). Finally, we conduct experiments with the population size $N=n+1$ , which is large enough to represent the full Pareto front, but for which we have proven the NSGA-II to be ineffective (on OneMinMax and when letting each parent create an offspring via one-bit mutation).
- Number of independent runs: $50$ for the efficient population sizes in Section 6.2 and $20$ for the more time-consuming experiments with inefficient population sizes in Sections 6.3 and 6.4. These numbers of runs already gave well-concentrated results.
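For concreteness, the two benchmarks and the mutation operator used in these experiments can be sketched as follows. This is a minimal Python illustration with names of our choosing; the actual experiments use the Matlab re-implementation described above.

```python
import random

def one_min_max(x):
    """OneMinMax: maximize both the number of zeros and the number of ones."""
    ones = sum(x)
    return (len(x) - ones, ones)

def leading_ones_trailing_zeros(x):
    """LeadingOnesTrailingZeroes: maximize the number of leading ones
    and the number of trailing zeros."""
    n = len(x)
    leading = next((i for i, b in enumerate(x) if b == 0), n)
    trailing = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return (leading, trailing)

def bitwise_mutation(x, rng):
    """Standard bit-wise mutation: flip each bit independently with prob 1/n."""
    n = len(x)
    return [b ^ (rng.random() < 1 / n) for b in x]
```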
### 6.2 Efficient Population Sizes
Figure 1 displays the runtime (that is, the number of fitness evaluations until the full Pareto front is covered) of the NSGA-II with population sizes large enough to allow an efficient optimization, together with the runtime of the (parameter-less) GSEMO.
(Image x1.png: line chart "Runtime for solving OneMinMax"; $x$-axis $n∈\{100,200,300,400\}$, $y$-axis fitness evaluations on a logarithmic scale; series GSEMO and NSGA-II with $N∈\{1.5(n+1),2(n+1),4(n+1),8(n+1)\}$, each with quartile error bars.)
(a)
(Image x2.png: line chart "Runtime for solving LOTZ"; $x$-axis $n∈\{30,60,90,120\}$, $y$-axis fitness evaluations on a logarithmic scale; series GSEMO and NSGA-II with $N∈\{2(n+1),4(n+1),8(n+1)\}$, each with quartile error bars.)
(b)
Figure 1: The number of function evaluations for the NSGA-II (binary tournament selection, standard bit-wise mutation) with different population sizes and for the GSEMO optimizing OneMinMax (1a) and LeadingOnesTrailingZeroes (1b). Displayed are the median (with $1$ st and $3$ rd quartiles) in 50 independent runs.
This data confirms that the NSGA-II can efficiently cover the Pareto front of OneMinMax and LeadingOnesTrailingZeroes when using a population size of at least $N^*$ . The runtimes for $N=2N^*$ are clearly larger than for $N^*$ , but by a factor slightly less than $2$ for both problems. The data for the population sizes smaller than $N^*$ indicates that also for these parameter settings the NSGA-II performs very well.
Comparing the NSGA-II to the GSEMO, we observe that the NSGA-II with a proper choice of the population size shows a better performance. This is interesting and somewhat unexpected, in particular for simple problems like OneMinMax and LeadingOnesTrailingZeroes. It is clear that the NSGA-II using tournament selection chooses extremal parents at a higher rate. More precisely, each individual takes part in exactly two tournaments. For an extremal value on the Pareto front, at least one individual has an infinite crowding distance, making it the winner of its tournaments almost surely (except in the rare case that a tournament partner has infinite crowding distance as well). Consequently, for each extremal objective value, the NSGA-II mutates at least $2-o(1)$ individuals per iteration in expectation. This is twice the average rate. In contrast, the GSEMO treats all individuals equally. This advantage of the NSGA-II comes at the price of a larger population, hence a larger cost per iteration. We note that the NSGA-II throughout the run works with a population of size $N$ , whereas the GSEMO only keeps non-dominated individuals in its population. Consequently, in particular in the early stages of the optimization process, each of its iterations takes significantly fewer fitness evaluations.
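The two-permutation tournament scheme discussed above (each individual entering exactly two tournaments) can be sketched as follows. This is an illustrative Python rendering of ours, assuming an even population size and the crowded comparison (lower rank wins, ties broken by larger crowding distance); identifiers are not taken from any implementation.

```python
import random

def two_permutation_tournaments(pop, rank, crowd, rng):
    """Build N binary tournaments from two random permutations of the
    population (as in Deb's scheme) and return the N tournament winners."""
    n = len(pop)  # assumed even
    parents = []
    for _ in range(2):
        perm = rng.sample(range(n), n)  # one random permutation per pass
        for i in range(0, n, 2):
            a, b = perm[i], perm[i + 1]
            # crowded comparison: smaller rank first, then larger crowding distance
            winner = a if (rank[a], -crowd[a]) <= (rank[b], -crowd[b]) else b
            parents.append(pop[winner])
    return parents
```

Note that each individual appears in exactly one pair per permutation, so every individual enters exactly two tournaments and exactly $N$ parents are returned.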
### 6.3 Inefficient Population Sizes
When the population size is small, we do not have the result that points on the front cannot be lost (Lemmas 1 and 7) and the proof of Theorem 12 shows that indeed we can easily lose points on the front, leading to a runtime at least exponential in $n$ when $N=n+1$ . In this subsection, we analyze this phenomenon experimentally. As discussed earlier, we first concentrate on the NSGA-II with two-permutation tournament selection and standard bit-wise mutation.
Since it is hard to show an exponential runtime experimentally, we do not run the algorithm until it has found the full Pareto front (this would be possible only for very small problem sizes), but conduct a slightly different experiment for reasonable problem sizes that also strongly indicates that the NSGA-II has enormous difficulties in finding the full front. We ran the NSGA-II for $3000$ generations on OneMinMax and $5000$ generations on LeadingOnesTrailingZeroes and measured for each generation the ratio by which the Pareto front is covered. This data is displayed in Figure 2. We see clearly that the coverage of the Pareto front increases steeply at first, but then stagnates at a constant fraction clearly below one (around $80\%$ for OneMinMax and between $50\%$ and $60\%$ for LeadingOnesTrailingZeroes), and does so in a very concentrated manner. From this data, there is no indication that the Pareto front will be covered anytime soon.
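The coverage measurement itself is simple: for OneMinMax, the Pareto front consists of one objective value per number of one-bits in $\{0,\ldots,n\}$, so it suffices to count the distinct OneMax values in the population. A minimal sketch (bit strings represented as lists of 0/1, our own convention) is:

```python
def omm_coverage(pop, n):
    """Fraction of the OneMinMax Pareto front covered by the
    population: the front has one point per number of one-bits in
    {0, ..., n}, so we count distinct OneMax values."""
    return len({sum(x) for x in pop}) / (n + 1)
```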
[x3.png: line chart of the ratio by which the current population covers the Pareto front against the number of generations for OneMinMax, $n \in \{100, 200, 300, 400\}$; all curves rise steeply and then plateau at roughly $0.80$.]
(a)
[x4.png: line chart of the ratio by which the current population covers the Pareto front against the number of generations for LeadingOnesTrailingZeroes, $n \in \{30, 60, 90, 120\}$; the curves plateau between roughly $0.52$ and $0.60$, with smaller plateau values for larger $n$.]
(b)
Figure 2: Ratio of the coverage of the Pareto front by the current population of the NSGA-II (binary tournament selection, standard bit-wise mutation) with population size $N=n+1$ for solving OneMinMax (2a) and LeadingOnesTrailingZeroes (2b). Displayed are the medians (with first and third quartiles) of 20 independent runs.
We said in Section 5 that we were optimistic that our negative result for small population sizes also holds for all other variants of the NSGA-II. To experimentally support this claim, we now run all variants of the NSGA-II discussed in this work on OneMinMax with problem size $n=200$ , $20$ times for $3000$ iterations each. In Figure 3, we see the ratios of the coverage of the Pareto front by the populations in the $20$ runs and in iterations $[2001..3000]$ (that is, we regard together $20 \cdot 1000$ populations). We see that all variants fail to cover a constant fraction of the Pareto front; the precise constant differs between the variants. Most notably, we observe that the variants using standard bit-wise mutation cover the Pareto front to a lesser extent than those building on one-bit mutation. We do not have a clear explanation for this phenomenon, but we speculate that standard bit-wise mutation is harmed by its constant fraction of mutations that just create a copy of the parent. We would, however, not interpret the results in this figure as a suggestion to prefer one-bit mutation. As shown in [DQ23a], with high probability the NSGA-II using one-bit mutation fails to find the Pareto front of the OneJumpZeroJump benchmark, regardless of the runtime allowed.
[x5.png: scatter plot with quartile bars of the Pareto front coverage ratios of the eight NSGA-II variants Aa, Ab, Ba, Bb, Ca, Cb, Da, Db; the one-bit mutation variants (suffix $a$) reach ratios of roughly $0.83$ to $0.86$, the standard bit-wise mutation variants (suffix $b$) roughly $0.79$ to $0.81$.]
Figure 3: Ratios of the coverage of the Pareto front by the population of the different NSGA-II variants (using $A$ (selecting each individual as a parent once), $B$ ( $N$ times choosing a parent uniformly at random), $C$ (independent binary tournaments), or $D$ (two-permutation binary tournaments) as the mating selection strategy, and using $a$ (one-bit mutation) or $b$ (standard bit-wise mutation) as the mutation strategy) with population size $N=n+1$ on OneMinMax with problem size $n=200$ . Displayed are the medians (with first and third quartiles) over 20 independent runs and generations $[2001..3000]$ .
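The four mating selection strategies $A$ to $D$ named in the caption could be sketched as follows. This is our own minimal Python rendering, not the reference implementation; `better` stands for a comparator deciding a binary tournament, e.g. by rank and crowding distance.

```python
import random

def parents_A(pop):
    """A: every individual becomes a parent exactly once."""
    return list(pop)

def parents_B(pop):
    """B: N parents drawn uniformly at random with replacement."""
    return [random.choice(pop) for _ in pop]

def parents_C(pop, better):
    """C: N independent binary tournaments."""
    return [better(*random.sample(pop, 2)) for _ in pop]

def parents_D(pop, better):
    """D: two-permutation binary tournaments: concatenate two random
    permutations of the population and hold one tournament per
    consecutive pair of the concatenation."""
    idx = list(range(len(pop)))
    order = random.sample(idx, len(idx)) + random.sample(idx, len(idx))
    return [better(pop[order[2 * k]], pop[order[2 * k + 1]])
            for k in range(len(pop))]
```

Note that in strategy $D$ (with even $N$) each individual takes part in exactly one tournament per permutation, so the best individual is selected exactly twice per generation.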
### 6.4 Optimization With Small Population Sizes
In the previous subsection, we showed that the NSGA-II with population size equal to the size of the Pareto front cannot cover the full Pareto front in reasonable time. On the positive side, however, a large fraction of the Pareto front was still covered, e.g., around $80\%$ for the OneMinMax problem. This could indicate that the NSGA-II is an interesting algorithm also with smaller population sizes. This is what we briefly discuss now. We shall not explore this question in full detail, but only to the point of obtaining a good indication that the NSGA-II performs well also with small population sizes. We note that the subsequent work [ZD22a] took up this research question and discussed it in detail.
To understand how well the NSGA-II performs with the small population size $N=n+1$ , we first regard how fast its population spreads out on the Pareto front. From the data in Figure 4, we see that also with this small population size, the NSGA-II quickly finds the two extremal points $(0,n)$ and $(n,0)$ of the Pareto front. This fits our understanding of the algorithm: since the two outermost individuals in the population have infinite crowding distance, and since there are at most four individuals with infinite crowding distance, these individuals will never be lost, even if the population size is relatively small.
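The crowding distance computation behind this argument can be sketched as follows for a bi-objective front given by its objective vectors (a minimal version in our own notation, not Deb's code). Each objective marks its two boundary points with distance infinity, so at most four individuals per front receive an infinite crowding distance.

```python
import math

def crowding_distance(front):
    """Crowding distances of the points of a front, given as a list of
    (f1, f2) objective vectors.  Per objective, the two boundary points
    get distance infinity; interior points accumulate the normalized
    distance between their two neighbors in sorted order."""
    n = len(front)
    dist = [0.0] * n
    for m in range(2):  # the two objectives
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = math.inf
        span = front[order[-1]][m] - front[order[0]][m]
        if span == 0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m]
                               - front[order[k - 1]][m]) / span
    return dist
```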
More interesting is the question of how evenly the population is distributed on the Pareto front once the two extremal points are found. To this end, we display in Figure 5 the function values of the populations after a moderate runtime in a run of the NSGA-II. In all eight datasets, the complete Pareto front was not found (as expected). However, the plots also show that in all cases, the front is well approximated by the population. Also, we note that the population contains only individuals on the Pareto front (which is trivially satisfied for OneMinMax, but not so for LeadingOnesTrailingZeroes). We note that the data from the two individual runs displayed in the figure is representative: in all runs, we never encountered an interval of uncovered points of length more than $6$ for OneMinMax and $4$ for LeadingOnesTrailingZeroes, respectively.
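The gap statistic used here can be computed as follows, again for OneMinMax, where the covered front points are identified by their number of one-bits. This is a sketch under our own conventions (bit strings as lists of 0/1).

```python
def longest_uncovered_interval(pop, n):
    """Length of the longest interval of consecutive uncovered values
    among the Pareto front values {0, ..., n} of OneMinMax."""
    covered = sorted({sum(x) for x in pop})
    gaps = [b - a - 1 for a, b in zip(covered, covered[1:])]
    gaps += [covered[0], n - covered[-1]]  # uncovered ends of the front
    return max(gaps)
```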
[x6.png: box plot of the number of generations until both $(0,n)$ and $(n,0)$ are reached by the NSGA-II with $N=n+1$ on OneMinMax, for $n \in \{100, 200, 300, 400\}$; the median grows roughly linearly with $n$.]
(a)
[x7.png: box plot of the number of generations until both $(0,n)$ and $(n,0)$ are reached by the NSGA-II with $N=n+1$ on LeadingOnesTrailingZeroes, for $n \in \{30, 60, 90, 120\}$.]
(b)
Figure 4: First generation when both extreme function values $(0,n)$ and $(n,0)$ were contained in the population of the NSGA-II (binary tournament selection, standard bit-wise mutation, population size $N=n+1$ ) for OneMinMax (4a) and LeadingOnesTrailingZeroes (4b).
[x8.png: objective values $(f_1, f_2)$ of the population of the NSGA-II with $N=n+1$ on OneMinMax, for $n \in \{100, 200, 300, 400\}$; in each case the points lie on the Pareto front $f_1 + f_2 = n$.]
* **Notable Patterns/Anomalies:** The perfect linearity and equal spacing are highly structured. This is characteristic of benchmark problems like OneMinMax, where the Pareto front is known to be a straight line. The presence of multiple, parallel fronts suggests the algorithm has found solutions at different convergence levels or that the problem has a layered structure of optimal solutions. The chart effectively shows the algorithm's ability to discover the entire spectrum of trade-offs, from the worst (red) to the best (black) found fronts.
</details>
(a)
<details>
<summary>x9.png Details</summary>

### Visual Description
Scatter plot titled "NSGA-II with N=n+1 on LOTZ". Both axes, $f_1$ (horizontal) and $f_2$ (vertical), range linearly from 0 to 120 with tick marks every 20 units. Four parallel series of evenly spaced points with slope $-1$ are shown, one per problem size $n$: red near $f_1+f_2=120$, green near $f_1+f_2=90$, yellow near $f_1+f_2=60$, and black near $f_1+f_2=30$. Each series lies on the Pareto front $\{(i,n-i)\mid i\in[0..n]\}$ of the respective problem size and spans it from one axis to the other, approximating it well without covering every point.
</details>
(b)
Figure 5: The function values of the population $P_t$ for $t=3000$ when optimizing OneMinMax (5a) and for $t=5000$ when optimizing LeadingOnesTrailingZeroes (5b) via the NSGA-II (binary tournament selection, standard bit-wise mutation, population size $N=n+1$) in one typical run. Both plots show that this population size is not sufficient to completely cover the Pareto front, but it suffices to approximate the front very well. Different colors are used for different problem sizes $n$, with $n\in\{100,200,300,400\}$ for OneMinMax and $n\in\{30,60,90,120\}$ for LeadingOnesTrailingZeroes. Also note that the Pareto front is $\{(i,n-i)\mid i\in[0..n]\}$.
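The straight-line structure in both plots follows directly from the benchmark definitions: every OneMinMax objective value, and every Pareto-optimal LeadingOnesTrailingZeroes value, satisfies $f_1+f_2=n$. The following is a minimal Python sketch of the two objectives (bit strings represented as lists; the definitions follow the standard formulations from the literature, not the paper's experimental code):

```python
def one_min_max(x):
    """OneMinMax: maximize the number of zeros (f1) and of ones (f2).

    Every bit string is Pareto-optimal, and f1 + f2 = n always holds,
    so the Pareto front is {(i, n - i) | i in 0..n}.
    """
    n = len(x)
    ones = sum(x)
    return (n - ones, ones)


def lotz(x):
    """LeadingOnesTrailingZeroes: maximize the number of leading ones (f1)
    and the number of trailing zeros (f2).

    The Pareto-optimal strings are exactly 1^i 0^(n-i), which again give
    the front {(i, n - i) | i in 0..n}.
    """
    n = len(x)
    leading_ones = 0
    while leading_ones < n and x[leading_ones] == 1:
        leading_ones += 1
    trailing_zeros = 0
    while trailing_zeros < n and x[n - 1 - trailing_zeros] == 0:
        trailing_zeros += 1
    return (leading_ones, trailing_zeros)
```

For instance, `one_min_max([1, 0, 1, 1])` returns `(1, 3)`, which lies on the line $f_1+f_2=4$; a non-optimal LOTZ point such as `lotz([1, 0, 1, 0])` returns `(1, 1)`, strictly below that line.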
## 7 Conclusion
In this work, we conducted the first mathematical runtime analysis of the NSGA-II, which is the predominant framework in real-world multi-objective optimization. We proved that with a suitable population size, all variants of the NSGA-II regarded in this work satisfy the same asymptotic runtime guarantees as the much simpler SEMO, GSEMO, and $(\mu+1)$ SIBEA algorithms studied previously when optimizing the two benchmarks OneMinMax and LeadingOnesTrailingZeroes. The choice of the population size is important: we proved an exponential runtime when the population size equals the size of the Pareto front.
On the technical side, this paper shows that mathematical runtime analyses are feasible also for the NSGA-II. We provided a number of arguments to cope with the challenges imposed by this algorithm, in particular the facts that points on the Pareto front can be lost and that parents are selected via binary tournaments based on rank and crowding distance. We are optimistic that these tools will aid future analyses of the NSGA-II (and in fact, they have already been used several times in subsequent work, see the discussion in the introduction).
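The crowding distance mentioned above, introduced by Deb et al. [DPAM02], can be sketched as follows. This is a minimal illustration for a single front of objective vectors, not the implementation used in the paper's experiments: for each objective, the front is sorted, boundary points receive infinite distance, and each interior point accumulates the normalized gap between its two neighbors in that sorted order.

```python
def crowding_distance(values):
    """Crowding distance in the style of Deb et al. (2002) for one front.

    `values` is a list of objective vectors (tuples or lists), all of the
    same length. Returns one distance per input point.
    """
    n = len(values)
    m = len(values[0])
    dist = [0.0] * n
    for k in range(m):
        # Indices of the front sorted by the k-th objective.
        order = sorted(range(n), key=lambda i: values[i][k])
        fmin = values[order[0]][k]
        fmax = values[order[-1]][k]
        # Boundary points are always kept attractive.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if fmax == fmin:
            continue  # objective is constant on this front
        for pos in range(1, n - 1):
            gap = values[order[pos + 1]][k] - values[order[pos - 1]][k]
            dist[order[pos]] += gap / (fmax - fmin)
    return dist
```

On the OneMinMax front for $n=4$, i.e. `[(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]`, the two extreme points receive infinite distance and each interior point receives $0.5$ per objective, illustrating why, with a large enough population, no Pareto-front point is ever removed in the survival selection.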
## Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62306086), the Science, Technology and Innovation Commission of Shenzhen Municipality (Grant No. GXWD20220818191018001), and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177).
This work was also supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH.
## References
- [AD11] Anne Auger and Benjamin Doerr, editors. Theory of Randomized Search Heuristics. World Scientific Publishing, 2011.
- [BFN08] Dimo Brockhoff, Tobias Friedrich, and Frank Neumann. Analyzing hypervolume indicator based algorithms. In Parallel Problem Solving from Nature, PPSN 2008, pages 651–660. Springer, 2008.
- [BFQY20] Chao Bian, Chao Feng, Chao Qian, and Yang Yu. An efficient evolutionary algorithm for subset selection with general cost constraints. In Conference on Artificial Intelligence, AAAI 2020, pages 3267–3274. AAAI Press, 2020.
- [BNE07] Nicola Beume, Boris Naujoks, and Michael Emmerich. SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181:1653–1669, 2007.
- [BQ22] Chao Bian and Chao Qian. Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection. In Parallel Problem Solving From Nature, PPSN 2022, pages 428–441. Springer, 2022.
- [BQT18] Chao Bian, Chao Qian, and Ke Tang. A general approach to running time analysis of multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 1405–1411. IJCAI, 2018.
- [BZLQ23] Chao Bian, Yawen Zhou, Miqing Li, and Chao Qian. Stochastic population update can provably be helpful in multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5513–5521. ijcai.org, 2023.
- [CDH ${}^+$ 23] Sacha Cerf, Benjamin Doerr, Benjamin Hebras, Jakob Kahane, and Simon Wietheger. The first proven performance guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a combinatorial optimization problem. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5522–5530. ijcai.org, 2023.
- [COGNS20] Edgar Covantes Osuna, Wanru Gao, Frank Neumann, and Dirk Sudholt. Design and analysis of diversity-based parent selection schemes for speeding up evolutionary multi-objective optimisation. Theoretical Computer Science, 832:123–142, 2020.
- [Cra19] Victoria G. Crawford. An efficient evolutionary algorithm for minimum cost submodular cover. In International Joint Conference on Artificial Intelligence, IJCAI 2019, pages 1227–1233. ijcai.org, 2019.
- [Cra21] Victoria G. Crawford. Faster guarantees of evolutionary algorithms for maximization of monotone submodular functions. In International Joint Conference on Artificial Intelligence, IJCAI 2021, pages 1661–1667. ijcai.org, 2021.
- [DD18] Benjamin Doerr and Carola Doerr. Optimal static and self-adjusting parameter choices for the ${(1+(λ,λ))}$ genetic algorithm. Algorithmica, 80:1658–1709, 2018.
- [DDN ${}^+$ 20] Benjamin Doerr, Carola Doerr, Aneta Neumann, Frank Neumann, and Andrew M. Sutton. Optimization of chance-constrained submodular functions. In Conference on Artificial Intelligence, AAAI 2020, pages 1460–1467. AAAI Press, 2020.
- [Deb] Kalyanmoy Deb’s implementation of the NSGA-II. https://www.egr.msu.edu/~kdeb/codes.shtml.
- [DGN16] Benjamin Doerr, Wanru Gao, and Frank Neumann. Runtime analysis of evolutionary diversity maximization for OneMinMax. In Genetic and Evolutionary Computation Conference, GECCO 2016, pages 557–564. ACM, 2016.
- [DN20] Benjamin Doerr and Frank Neumann, editors. Theory of Evolutionary Computation—Recent Developments in Discrete Optimization. Springer, 2020. Also available at http://www.lix.polytechnique.fr/Labo/Benjamin.Doerr/doerr_neumann_book.html.
- [Doe19] Benjamin Doerr. Analyzing randomized search heuristics via stochastic domination. Theoretical Computer Science, 773:115–137, 2019.
- [Doe20] Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Benjamin Doerr and Frank Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 1–87. Springer, 2020. Also available at https://arxiv.org/abs/1801.06733.
- [DOSS23a] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. Analysing the robustness of NSGA-II under noise. In Genetic and Evolutionary Computation Conference, GECCO 2023, pages 642–651. ACM, 2023.
- [DOSS23b] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. A proof that using crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation. In Conference on Artificial Intelligence, AAAI 2023, pages 12390–12398. AAAI Press, 2023.
- [DPAM02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002.
- [DQ23a] Benjamin Doerr and Zhongdi Qu. A first runtime analysis of the NSGA-II on a multimodal problem. IEEE Transactions on Evolutionary Computation, 2023. https://doi.org/10.1109/TEVC.2023.3250552.
- [DQ23b] Benjamin Doerr and Zhongdi Qu. From understanding the population dynamics of the NSGA-II to the first proven lower bounds. In Conference on Artificial Intelligence, AAAI 2023, pages 12408–12416. AAAI Press, 2023.
- [DQ23c] Benjamin Doerr and Zhongdi Qu. Runtime analysis for the NSGA-II: provable speed-ups from crossover. In Conference on Artificial Intelligence, AAAI 2023, pages 12399–12407. AAAI Press, 2023.
- [DZ21] Benjamin Doerr and Weijie Zheng. Theoretical analyses of multi-objective evolutionary algorithms on multi-modal objectives. In Conference on Artificial Intelligence, AAAI 2021, pages 12293–12301. AAAI Press, 2021.
- [FHH ${}^+$ 10] Tobias Friedrich, Jun He, Nils Hebbinghaus, Frank Neumann, and Carsten Witt. Approximating covering problems by randomized search heuristics using multi-objective models. Evolutionary Computation, 18:617–633, 2010.
- [FN15] Tobias Friedrich and Frank Neumann. Maximizing submodular functions under matroid constraints by evolutionary algorithms. Evolutionary Computation, 23:543–558, 2015.
- [GD90] David E. Goldberg and Kalyanmoy Deb. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, FOGA 1990, pages 69–93. Morgan Kaufmann, 1990.
- [Gie03] Oliver Giel. Expected runtimes of a simple multi-objective evolutionary algorithm. In Congress on Evolutionary Computation, CEC 2003, pages 1918–1925. IEEE, 2003.
- [GL10] Oliver Giel and Per Kristian Lehre. On the effect of populations in evolutionary multi-objective optimisation. Evolutionary Computation, 18:335–356, 2010.
- [HZ20] Zhengxin Huang and Yuren Zhou. Runtime analysis of somatic contiguous hypermutation operators in MOEA/D framework. In Conference on Artificial Intelligence, AAAI 2020, pages 2359–2366. AAAI Press, 2020.
- [HZCH19] Zhengxin Huang, Yuren Zhou, Zefeng Chen, and Xiaoyu He. Running time analysis of MOEA/D with crossover on discrete optimization problem. In Conference on Artificial Intelligence, AAAI 2019, pages 2296–2303. AAAI Press, 2019.
- [Jan13] Thomas Jansen. Analyzing Evolutionary Algorithms – The Computer Science Perspective. Springer, 2013.
- [KD06] Saku Kukkonen and Kalyanmoy Deb. Improved pruning of non-dominated solutions based on crowding distance for bi-objective optimization problems. In Conference on Evolutionary Computation, CEC 2006, pages 1179–1186. IEEE, 2006.
- [LTZ ${}^+$ 02] Marco Laumanns, Lothar Thiele, Eckart Zitzler, Emo Welzl, and Kalyanmoy Deb. Running time analysis of multi-objective evolutionary algorithms on a simple discrete optimization problem. In Parallel Problem Solving from Nature, PPSN 2002, pages 44–53. Springer, 2002.
- [LTZ04] Marco Laumanns, Lothar Thiele, and Eckart Zitzler. Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions. IEEE Transactions on Evolutionary Computation, 8:170–182, 2004.
- [LZZZ16] Yuan-Long Li, Yu-Ren Zhou, Zhi-Hui Zhan, and Jun Zhang. A primary theoretical study on decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 20:563–576, 2016.
- [McD89] Colin McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 48–118. Cambridge Univ. Press, 1989.
- [Neu07] Frank Neumann. Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem. European Journal of Operational Research, 181:1620–1629, 2007.
- [NRS11] Frank Neumann, Joachim Reichel, and Martin Skutella. Computing minimum cuts by randomized search heuristics. Algorithmica, 59:323–342, 2011.
- [NSN15] Anh Quang Nguyen, Andrew M. Sutton, and Frank Neumann. Population size matters: rigorous runtime results for maximizing the hypervolume indicator. Theoretical Computer Science, 561:24–36, 2015.
- [NT10] Frank Neumann and Madeleine Theile. How crossover speeds up evolutionary algorithms for the multi-criteria all-pairs-shortest-path problem. In Parallel Problem Solving from Nature, PPSN 2010, Part I, pages 667–676. Springer, 2010.
- [NW06] Frank Neumann and Carsten Witt. Runtime analysis of a simple ant colony optimization algorithm. In Algorithms and Computation, ISAAC 2006, pages 618–627. Springer, 2006.
- [NW10] Frank Neumann and Carsten Witt. Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity. Springer, 2010.
- [NW22] Frank Neumann and Carsten Witt. Runtime analysis of single- and multi-objective evolutionary algorithms for chance constrained optimization problems with normally distributed random variables. In International Joint Conference on Artificial Intelligence, IJCAI 2022, pages 4800–4806. ijcai.org, 2022.
- [QBF20] Chao Qian, Chao Bian, and Chao Feng. Subset selection by Pareto optimization with recombination. In Conference on Artificial Intelligence, AAAI 2020, pages 2408–2415. AAAI Press, 2020.
- [QSYT17] Chao Qian, Jing-Cheng Shi, Yang Yu, and Ke Tang. On subset selection with general cost constraints. In International Joint Conference on Artificial Intelligence, IJCAI 2017, pages 2613–2619. ijcai.org, 2017.
- [QYT ${}^+$ 19] Chao Qian, Yang Yu, Ke Tang, Xin Yao, and Zhi-Hua Zhou. Maximizing submodular or monotone approximately submodular functions by multi-objective evolutionary algorithms. Artificial Intelligence, 275:279–294, 2019.
- [QYZ13] Chao Qian, Yang Yu, and Zhi-Hua Zhou. An analysis on recombination in multi-objective evolutionary optimization. Artificial Intelligence, 204:99–119, 2013.
- [QYZ15] Chao Qian, Yang Yu, and Zhi-Hua Zhou. On constrained Boolean Pareto optimization. In International Joint Conference on Artificial Intelligence, IJCAI 2015, pages 389–395. AAAI Press, 2015.
- [RNNF19] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich. Pareto optimization for subset selection with dynamic cost constraints. In Conference on Artificial Intelligence, AAAI 2019, pages 2354–2361. AAAI Press, 2019.
- [Rud98] Günter Rudolph. Evolutionary search for minimal elements in partially ordered finite sets. In Evolutionary Programming, EP 1998, pages 345–353. Springer, 1998.
- [WD23] Simon Wietheger and Benjamin Doerr. A mathematical runtime analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III). In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5657–5665. ijcai.org, 2023.
- [ZD22a] Weijie Zheng and Benjamin Doerr. Better approximation guarantees for the NSGA-II by using the current crowding distance. In Genetic and Evolutionary Computation Conference, GECCO 2022, pages 611–619. ACM, 2022.
- [ZD22b] Weijie Zheng and Benjamin Doerr. Runtime analysis for the NSGA-II: proving, quantifying, and explaining the inefficiency for three or more objectives. CoRR, abs/2211.13084, 2022.
- [Zhe] Implementation of the NSGA-II in this paper. https://github.com/zhengwj13/NSGA_II_Clean.
- [ZLD22] Weijie Zheng, Yufei Liu, and Benjamin Doerr. A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In Conference on Artificial Intelligence, AAAI 2022, pages 10408–10416. AAAI Press, 2022.
- [ZQL ${}^+$ 11] Aimin Zhou, Bo-Yang Qu, Hui Li, Shi-Zheng Zhao, Ponnuthurai Nagaratnam Suganthan, and Qingfu Zhang. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1:32–49, 2011.